We have many tools in our parsing toolbox. Today let’s look at how three different parsing techniques handle ambiguity caused by choice.
First, yacc. yacc handles ambiguities in your grammar with, if you’re lucky, complaints about shift/reduce conflicts. If you’re not so lucky you’ll get reduce/reduce conflicts. Either way, you will spend ages trying to figure out (a) how the conflict arose and (b) how to beat your grammar into the shape that yacc demands. In other words, yacc forces you to resolve any ambiguities yourself, through the application of pain. Life is too short.
Second, PEG parsers. We’ve seen these before on this blog. PEGs are, by definition, unambiguous. “By definition” because PEGs don’t use BNF-style “or” constructs, but prioritised choice. That means that given
a / b
, either
a
will succeed or if it fails then
b
will get a chance at parsing the stream.
The problem is that the grammar you define might not be the grammar you meant to define. Suppose you have a grammar
expr = "foo" | "foobar" | "bar"
and you parse
foobar
. You might think that the result of that parse would be the greedy match,
"foobar"
. Alas, not.
"foo"
wil succeed, leaving
"bar"
on the stream. The
"foobar"
rule will not have a chance to run. So the general rule is to make the greediest rules run first. In a highly factored grammar it might not be obvious how to do this!
And finally, the new kid on the block. When you parse with derivatives, you create parse trees for all possible parse trees. Given the toy grammar above, you will thus have
{"foo", "foobar"}
as the set of possible parses. So again, there’s ambiguity, and you have to solve it yourself. Or not, of course: you might want all possible parses because you really are parsing something ambiguous, or maybe you don’t care and just want the first parse.