In my introduction to parser combinators, we developed the following module for generating common combinators and used them to construct a simple parser for addition expressions. I give all the code in full here, since in the preceding articles it appears in incremental chunks with various implementations of the same method given, and if we’re going to discuss performance we need to be unambiguous about what code we’re using.

```
module ParserCombinators
State = Struct.new(:string, :offset) do
def peek(n)
string[offset ... offset + n]
end
def read(n)
State.new(string, offset + n)
end
def complete?
offset == string.size
end
end
def parse(string)
node, state = root.call(State.new(string, 0))
if state and state.complete?
node
end
end
def str(string, &action)
-> state {
chunk = state.peek(string.size)
if chunk == string
node = [:str, chunk]
node = action.call(node) if action
[node, state.read(string.size)]
end
}
end
def chr(pattern, &action)
-> state {
chunk = state.peek(1)
if chunk =~ %r{[#{pattern}]}
node = [:chr, chunk]
node = action.call(node) if action
[node, state.read(1)]
end
}
end
def seq(*parsers, &action)
-> state {
matches = []
parsers.each do |parser|
node, state = state && parser.call(state)
matches << node if state
end
if state
node = [:seq, *matches]
node = action.call(node) if action
[node, state]
end
}
end
def rep(parser, n, &action)
-> state {
matches = []
last_state = nil
while state
last_state = state
node, state = parser.call(state)
matches << node if state
end
if matches.size >= n
node = [:rep, *matches]
node = action.call(action) if action
[node, last_state]
end
}
end
def alt(*parsers, &action)
-> state {
parsers.each do |parser|
node, new_state = parser.call(state)
if new_state
node = action.call(node) if action
return [node, new_state]
end
end
nil
}
end
def ref(name)
-> state {
__send__(name).call(state)
}
end
end
```

Then we went on to look at precedence and associativity and developed the following richer language for composing addition and multiplication. Although our previous implementation included blocks for transforming the original syntax nodes, I am omitting this for brevity and because it does not have a major impact on performance.

```
class Arithmetic
include ParserCombinators
def root
expression
end
def expression
alt(ref(:addition), ref(:term))
end
def addition
binary(:term, str("+"))
end
def term
alt(ref(:multiplication), ref(:factor))
end
def multiplication
binary(:factor, str("*"))
end
def factor
alt(ref(:number), ref(:paren_expression))
end
def number
alt(str("0"), seq(chr("1-9"), rep(chr("0-9"), 0)))
end
def paren_expression
seq(str("("), ref(:_), ref(:expression), ref(:_), str(")"))
end
def binary(arg, op)
seq(ref(arg), rep(seq(ref(:_), op, ref(:_), ref(arg)), 1))
end
def _
rep(str(" "), 0)
end
end
```

Now, this parser may be correct in that it recognises and parses the strings we
give to it into the trees we expect. However, to be practical we generally want
parsers to be fast, which usually means the time they take to run is
proportional to the length of the input. We say such a parser *runs in linear
time*. A naive recursive descent parser typically won’t do that, for several
reasons. The most common are that, every time it has to make a choice it tries
every alternative, producing a large tree of possible code paths, or, that it
repeatedly tries to match the same rule at the same offset, effectively
recomputing a result that’s already known. Both these effects contribute to
simple implementations running in *exponential time* on the length of the input.

Mitigating the first kind of problem, that of spending too long finding out which choice to make, can be done using predictive parsing. The parser compares the next character of input to the first element of each alternative, only pursuing those alternatives it knows to begin with that character. Unfortunately, this works best when each alternative expression begins with a terminal – a string literal, rather than a reference to another rule – and this means our grammar must be restructured into a less natural form. It also restricts the set of languages such parsers are able to recognise.

However, the second kind of problem, that of performing redundant work for
already-known results can be verily easily fixed with some simple and localised
changes to our implementation. All that is required is for the results of each
parsing rule to be cached, so they are not re-computed. This technique is called
*memoisation*.

Fortunately, we can make some very simple and localised changes to our
implementation to stop such things happening. Each use of a rule in the grammar
is implemented by a called to `ref(name)`

, and that call gives us an opportunity
to insert a cache based on the name of the rule we’re invoking.

In our implementation, calling a rule is actually a two-stage process. In `ref`

,
we use `__send__(name)`

to invoke the method for a rule, and this uses the
methods in `ParserCombinators`

to construct a parser, which is essentially a set
of nested lambdas. Then, we call `call(state)`

on the result of the first call
to have the newly minted parser process the input. Our first memoisation
improvement comes from recognising that the result of the first call – the
construction of the parsing expression – will always be the same, since the
rules don’t change between invocations. Therefore we can keep a cache of these
parsing expressions and have `ref`

populate the cache the first time it
encounters each rule.

```
class RefCachedArithmetic
def ref(name)
@ref_cache ||= {}
-> state {
@ref_cache[name] ||= __send__(name)
@ref_cache[name].call(state)
}
end
end
```

To test this, we’ll try parsing the following randomly generated expression:

```
399 + 422 * (778 * (851 * 867 * 454 * 599 + 408 * 300 +
988 * 610 * 164 * 315 + 930 * 14) * (608 + ((849))) *
(81) + 102 * 768 + (50) + 967 * 263 + 251) + (744)
```

I’m using the benchmark-ips library to measure how many times per second each parser can execute:

```
Benchmark.ips do |bm|
bm.report "combinators" do
Arithmetic.new.parse(expression)
end
bm.report "ref cache" do
RefCachedArithmetic.new.parse(expression)
end
end
```

The results show that just caching the generated parsers gives roughly a 50% performance improvement over the original implementation, on this input.

```
combinators 213.980 (± 3.3%) i/s - 1.080k in 5.052557s
ref cache 333.491 (± 4.8%) i/s - 1.664k in 5.001283s
```

This parser-caching trick is only a partial solution though. The second step in
`ref`

is to call the generated parser with our input, and this is where much of
the redundant work happens in these parsers. Because of the way we need to
structure grammars to express precedence and repetition, we end up trying the
same rule at the same position multiple times.

Say we’re given just the input `42`

to parse. We begin in the `expression`

rule,
which tries to match `addition / term`

. We try `addition`

first, which has the
form `term (_ "+" _ term)+`

, so the first thing `addition`

tries to match is a
`term`

. A `term`

is `multiplication / factor`

so we try `multiplication`

first,
which is `factor (_ "*" _ factor)+`

, so now we try to match `factor`

. `factor`

is `number / paren_expression`

so we try `number`

first, and finally we reach a
rule with terminal expressions. To reach this point we have the following rules
on the stack:

```
expression <- addition / term
addition <- term (_ "+" _ term)+
term <- multiplication / factor
multiplication <- factor (_ "*" _ factor)+
factor <- number / paren_expression
number <- "0" / [1-9] [0-9]*
```

`number`

matches the input `42`

at offset `0`

and moves the offset to `2`

, the
end of the string. Since `number`

succeeded, we pop it off the stack and also
`factor`

which called it, leaving us partway through the `multiplication`

expression:

```
expression <- addition / term
addition <- term (_ "+" _ term)+
term <- multiplication / factor
multiplication <- factor (_ "*" _ factor)+
^
```

`multiplication`

now wants to see zero or more spaces followed by a `*`

character, but we have reached the end of the input and so no further matching
is possible. Therefore `multiplication`

fails and we pop back up to the next
alternative in `term`

:

```
expression <- addition / term
addition <- term (_ "+" _ term)+
term <- multiplication / factor
^
```

`term`

began by attempting to match `multiplication`

at offset `0`

, so it now
tries to match `factor`

at the same position. But we’ve already done this work
when we tried to match `multiplication`

: we tried `factor`

which in turn tried
`number`

which yielded a match and moved the offset to the end of the input. So
we know `factor`

succeeds here, and so we can pop `term`

off the stack and
return to trying to match `addition`

:

```
expression <- addition / term
addition <- term (_ "+" _ term)+
^
```

Now the same thing happens with `addition`

that we saw in `multiplication`

: the
rule expects more characters but we’ve reached the end of the input. So,
`addition`

fails and we try the other alternative in `expression`

:

```
expression <- addition / term
^
```

We’re left trying to match `term`

at position `0`

, and again this is work we’ve
already done: `term`

tries to match `multiplication`

, which tries `factor`

,
which tries `number`

, which matches, and so `factor`

succeeds, then we pop back
up into `multiplication`

and find it fails on the next character, so we go back
to `term`

’s next alternative, which is `factor`

, which tries `number`

, which
succeeds, and now we can unwind all the way back to the top and discover that
`expression`

matched a `number`

.

All in all, we’ve called `number`

four times in order to match the input `42`

.
Each call to `term`

resulted in `factor`

being called twice: once as the first
item in `multiplication`

and once as the second alternative in `term`

. Calling
`expression`

similarly resulted in calling `term`

twice. We can visualise the
above using the following call tree:

```
expression
/ \
/ \
/ \
/ \
addition term
/ / \
term multiplication factor
/ \ / \
multiplication factor factor number
/ \ /
factor number number
/
number
```

Rule structures like this, where one alternative is tried, followed by a fallback with the same beginning item, are very common in programming languages and this call pattern results in the parser taking an exponentially longer amount of time as the grammar grows in structure.

Although this seems like a terrible situation, the retrying of the same rule at
the same offset provides an excellent opportunity to cache things. We can
augment our `ref`

function with a cache based on the rule name and input offset,
such that the named parsing function is only invoked if the cache does not
already contain an entry for the current rule and offset. Since each parsing
function returns both a parse tree and the new state, the cached values will
also make the parser continue from the correct offset when they are used.

```
class NodeCachedArithmetic
def ref(name)
@node_cache ||= {}
@ref_cache ||= {}
-> state {
key = [name, state.offset]
unless @node_cache.has_key?(key)
@ref_cache[name] ||= __send__(name)
@node_cache[key] = @ref_cache[name].call(state)
end
@node_cache[key]
}
end
end
```

Now, what happens when we parse `42`

is much simpler. As before `expression`

tries `addition`

, which tries `term`

, which tries `multiplication`

, which tries
`factor`

, which calls `number`

, and that succeeds. So we store a cache result
for `(number, 0)`

. `factor`

has succeeded so we also store the same result in
the cache for `(factor, 0)`

. Then `multiplication`

fails and `term`

tries
`factor`

, but now the result for `factor`

can be plucked out of the cache. This
in turn makes `term`

succeed so we put the same result in for `(term, 0)`

and so
on up the stack. When `addition`

fails and we fall back on `term`

this is again
just a cache lookup away. This eliminates most of the function calls in the
original implementation, since no function is ever called twice at the same
offset:

```
expression
/
addition
/
term
/
multiplication
/
factor
/
number
```

This modification makes a substantial improvement to performance on our test input:

```
combinators 213.625 (± 4.7%) i/s - 1.080k in 5.066901s
ref cache 343.193 (± 2.9%) i/s - 1.736k in 5.062538s
node cache 518.276 (± 3.3%) i/s - 2.600k in 5.022265s
```

In general, this simple caching modification means inputs can be parsed in
linear time with arbitrary amounts of lookahead. The technique is known as
*packrat* parsing, and is explained in a brief paper by Bryan Ford, the
inventor of parsing expression grammars, and explored more fully in his
master’s thesis.

While this makes a big improvement to how the runtime of the parser scales with the input size, the method of implementation itself can still be changed to yield large constant-factor improvements. For example, we can take our grammar and transcribe it using a parser generator, which will generate code for us. I maintain such a tool, called Canopy, in which the grammar would be expressed as follows:

```
grammar CanopyArithmetic
expression <- addition / term
addition <- term (_ "+" _ term)+
term <- multiplication / factor
multiplication <- factor (_ "*" _ factor)+
factor <- number / paren_expression
number <- "0" / [1-9] [0-9]*
paren_expression <- "(" _ expression _ ")"
_ <- " "*
```

Canopy can convert this grammar into a Ruby class, and we can benchmark that against our previous implementations:

```
combinators 211.650 (± 4.3%) i/s - 1.064k in 5.036935s
ref cache 331.979 (± 4.2%) i/s - 1.683k in 5.078639s
node cache 509.117 (± 3.7%) i/s - 2.550k in 5.015619s
canopy 1.546k (± 4.7%) i/s - 7.752k in 5.024497s
```

Canopy makes roughly a 200% improvement over our previous best attempt. Although in an abstract sense it’s doing the exact same computation as our combinator-based implementation, it avoids a lot of the cost the combinators carry in Ruby and some other languages. To run all those nested lambdas means making a lot of function calls, which can be expensive since a stack frame must be allocated and control must jump to the new function and return again. In contrast, Canopy generates all the code for implementing each parsing expression inline within each parsing function. This results in a lot of verbose impenetrable code, but it runs much faster.

Still, despite these improvements, there is still much room for the parser
writer to affect things. For example, rather than implementing an if-statement
with optional `else`

as:

```
if_statement <- "if" expression "then" expression "else" expression
/ "if" expression "then" expression
```

We might want to avoid re-matching the `if`

and `then`

by putting the whole
shared section in its own rule, to take advantage of caching:

```
if_statement <- if_then "else" expression
/ if_then
if_then <- "if" expression "then" expression
```

Or we could use the `?`

operator to match the `else`

section if present, without
having to re-match the first section at all.

```
if_statement <- "if" expression "then" expression ("else" expression)?
```

There are usually many ways of expressing the same language, and it’s up to the parser writer to make an acceptable trade-off between the clarity of the grammar, the resulting parse tree, and the runtime performance given their chosen parsing strategy. Indeed sometimes, caching does not actually help on common inputs – the cache lookup itself comes with a cost and that can outweigh the benefit if your grammar does not often retry rules at the same offsets. The parser generator PEG.js has an option to turn off caching in its generated parsers for this reason.

Still, despite the performance problems, parser combinators, PEG generators and other top-down techniques remain a great tool for prototyping since a naive but correct parser can be implemented with relatively little machinery. Expressing the grammar using declarative rules makes it easier to switch to a different tool later without having to rewrite too much code.