Detecting Left Recursion

Left recursion is an issue that plagues recursive-descent parsing. Whenever a parser (eventually) performs itself as the first thing it does, this will infinite loop and the parser will not terminate. While parsley has been designed in a way that aims to statically rule out the more obvious left-recursive definitions, it is still possible to indirectly write these parsers. The parsley-debug library provides functionality to help track down the sources of such issues at runtime when provided with an input known to loop.

Please see the @parsley.debuggable page first, as this is used to provide meaningful names for this functionality.

The combinator we are interested in is called parsley.debug.combinator.detectDivergence, which wraps the top-level parser.

import parsley.quick.*
//import parsley.debuggable
import parsley.debug.combinator.detectDivergence

@debuggable
object parsers {
    lazy val top: Parsley[Unit] = p.void // simplifies the cycle, see #247
    lazy val p: Parsley[Unit] = char('a').void | s ~> q
    lazy val q: Parsley[Unit] = sepBy1(r, string(",")).void
    lazy val r: Parsley[Unit] = many(p).void
    lazy val s: Parsley[Unit] = unit
}

While parsers.top.parse("a") would work fine, consider what happens if we pass the empty string:

detectDivergence(parsers.top).parse("")
// parsley.exceptions.ParsleyException: 
// Left-recursion has been detected in the given parser. The trace is as follows:
// 
// p
// q
// r
// p
// 
// For readability, all non-named combinators have been stripped out -- to see more,
// use the `named` combinator to tag parts of the parser you want to see appear in
// the trace.
// 
// 	at repl.MdocSession$MdocApp$$anonfun$1.apply(recursion-detection.md:40)
// 	at repl.MdocSession$MdocApp$$anonfun$1.apply(recursion-detection.md:40)

Thankfully, the detectDivergence combinator can detect that this will cause an issue, and provides some insight into what that might be. If you follow that trace out carefully, you'll see that these four calls form a cycle where no input is consumed before you get back to the start again: this is what causes left-recursion.

Looping vs Recursion

Divergence is a property that is more general than just left-recursion. In fact, some combinators can more accurately iteratively "loop" forever. One such case is as follows:

detectDivergence(many(unit)).parse("")
// parsley.exceptions.ParsleyException: 
// `many` is looping unproductively as `unit` can succeed having not consumed input.
// 
// More precise names for the body and the loop can be sourced using @parsley.debuggable
// or the `named` combinator.
// 	at repl.MdocSession$MdocApp$$anonfun$2.apply(recursion-detection.md:50)
// 	at repl.MdocSession$MdocApp$$anonfun$2.apply(recursion-detection.md:50)

The many combinator reads zero-or-more things. When given something that doesn't consume input, it will succeed reading that parser, then loop back round and try to read it again. This is, as you might expect, exactly where it left off last iteration. As such, detectDivergence can also detect this kind of looping as well. While parsley implements many and friends more efficiently, in theory this is still left-recursion, it is just called hidden left-recursion:

def many(p: Parsley[A]): Parsley[List[A]] = {
    lazy val rec: Parsley[List[A]] = p <::> rec | pure(Nil)
    rec
}

When p does not succeed having consumed input, rec will perform itself again from the same input point -- this is left-recursion, even if p is "in the way".

Stateful Parsers

It is "easy" to detect that a parser is looping if the only state in the parser is the input -- if we get back to the same place with no input consumed we will most certainly repeat that forever. However, parsley.state gives us access to mutable pieces of state threaded through a parser. This adds an additional complication that makes static analysis of parser divergence more intractible. However, the detectDivergence combinator is a runtime analysis, and so has access to the state to track what's going on. This means it is able to detect more complex looping that involves Ref:

import parsley.quick.*
//import parsley.debuggable
import parsley.debug.combinator.detectDivergence
import parsley.state.*

@debuggable
object stateful {
    val toggle = Ref.make[Boolean]
    val counter = Ref.make[Int]
    lazy val p: Parsley[Unit] = counter.update(_ + 1) ~> ifS(counter.gets(_ == 10), counter.set(0) ~> q, p)
    lazy val q: Parsley[Unit] = toggle.update(!_) ~> p
    lazy val top = toggle.set(false) ~> counter.set(0) ~> p
}

This parser is a contrived example, admittedly, but illustrates what detectDivergence is capable of. The parser top runs in the presence of two Refs: p will loop until the counter reaches 10, when it will reset it and execute q one time. q flips the toggle, but then does p. With some thought, you might be able to identify the cycle that will happen here:

detectDivergence(stateful.top).parse("")
// parsley.exceptions.ParsleyException: 
// Left-recursion has been detected in the given parser. The trace is as follows:
// 
// p (with state (0, false))
// p (with state (1, false))
// p (with state (2, false))
// p (with state (3, false))
// p (with state (4, false))
// p (with state (5, false))
// p (with state (6, false))
// p (with state (7, false))
// p (with state (8, false))
// p (with state (9, false))
// q (with state (0, false))
// p (with state (0, true))
// p (with state (1, true))
// p (with state (2, true))
// p (with state (3, true))
// p (with state (4, true))
// p (with state (5, true))
// p (with state (6, true))
// p (with state (7, true))
// p (with state (8, true))
// p (with state (9, true))
// q (with state (0, true))
// p (with state (0, false))
// 
// For readability, all non-named combinators have been stripped out -- to see more,
// use the `named` combinator to tag parts of the parser you want to see appear in
// the trace.
// 
// 	at repl.MdocSession$MdocApp$$anonfun$3.apply(recursion-detection.md:88)
// 	at repl.MdocSession$MdocApp$$anonfun$3.apply(recursion-detection.md:88)

Unfortunately, at the moment there is no way to name the references, but this is future work. Regardless, this time, the cycle shows the state snapshots that lead to the loop in the parser; namely, when the counter hits 0 again and the toggle is reset to false, we enter our initial state, and the parser is killed.

Obviously, if this were just an infinite loop of the form lazy val p: Parsley[Unit] = counter.update(_ + 1) ~> p, then detectDivergence cannot detect this until counter overflows! The system isn't fool proof, obviously, since this would mean solving the general halting problem, but it works for any non-stateful parser and stateful parsers which repeat the same stateful behaviour.

Limitations

As you may have noticed in the above example with many, sometimes there may be a lack of information about the naming of certain parsers (which can help pin-point where a problem arose from). Obviously, the @parsley.debuggable annotation is meant to help with this, but it does not necessarily have visibility over all parsers! Let's see where this might go wrong:

@debuggable
object nameless {
    val top = false.makeRef { r =>
        lazy val p: Parsley[Unit] = r.update(!_) ~> p
        p
    }
}

While we might expect this to work out much like the stateful example above, in reality we see this:

detectDivergence(nameless.top).parse("")
// parsley.exceptions.ParsleyException: 
// Left-recursion has been detected in the given parser; however, there is not
// enough information to determine the cycle. To get the full cycle diagnostic,
// please use the `parsley.debuggable` annotation to populate the name information.
// 
// For example, if your parsers are exposed (publically) in an object called
// `foo`, you should write:
// 
// > @parsley.debuggable object foo { ... }
// 
// Alternatively, you can give individual parser fragments names by using the
// `named` combinator, which will cause them to appear along the path.
// 
// 	at repl.MdocSession$MdocApp$$anonfun$5.apply(recursion-detection.md:116)
// 	at repl.MdocSession$MdocApp$$anonfun$5.apply(recursion-detection.md:116)

Unfortunately, the annotation cannot see into nested definitions, only top-level ones. In this case, we can use the named combinator from parsley.debug.combinator to explicitly name p (via named(p, "p")), or you may be able to get away with something like:

@debuggable
class Nested(r: Ref[Boolean]) {
    lazy val p: Parsley[Unit] = r.update(!_) ~> p
}

@debuggable
object nameful {
    val top = false.makeRef(r => new Nested(r).p)
}

However, this is remarkably clumsy to use in practice (please don't)!