Advanced Error Messages

Previously, we saw the most basic approach to improving error messages: .label and .explain. However, the other tools I listed in the post are valuable in their own right, but can be slightly less common. However, most of the time, their use is abstracted by other higher-level combinators, which will be the focus of this page. The API Guide page on Error Message Combinators covers the core combinators more thoroughly.

1 A Statement Language

In this section, we'll set the stage for the discussion of the error messages. We'll start by defining an evaluator for a langauge (as opposed to an AST), and then write a parser that generates partially evaluated expressions and collapse it to form an interpreter. The language supports conditionals, assignment, arithmetic, and boolean expressions; variables must be integers.

1.1 A Stateful Evaluator

The language will be evaluated by producing a value of type Eval[Unit], a monad that threads a variable environment through the program, handling any out-of-scope variables. Just for fun, I'll use the cats functional programming library to do this, as it makes it very easy to express. The environment will be carried around in a StateT state monad, and errors contains within an Either[String, _] type. Don't worry if you don't understand how it works; our main focus is obviously the parser:

import cats.syntax.all._
object eval {
    import cats.data.StateT
    import cats.Monad
    type Error[A] = Either[String, A]
    type Eval[A] = StateT[Error, Map[String, Int], A]

    def number(x: Int): Eval[Int] = Monad[Eval].pure(x)
    def bool(b: Boolean): Eval[Boolean] = Monad[Eval].pure(b)

    def negate(mx: Eval[Int]): Eval[Int] = mx.map(0 - _)
    def add(mx: Eval[Int], my: Eval[Int]): Eval[Int] = (mx, my).mapN(_ + _)
    def sub(mx: Eval[Int], my: Eval[Int]): Eval[Int] = (mx, my).mapN(_ - _)
    def mul(mx: Eval[Int], my: Eval[Int]): Eval[Int] = (mx, my).mapN(_ * _)

    def less(mx: Eval[Int], my: Eval[Int]): Eval[Boolean] = (mx, my).mapN(_ < _)
    def equal(mx: Eval[Int], my: Eval[Int]): Eval[Boolean] = (mx, my).mapN(_ == _)

    def and(mx: Eval[Boolean], my: Eval[Boolean]): Eval[Boolean] = cond(mx, my, bool(false))
    def or(mx: Eval[Boolean], my: Eval[Boolean]): Eval[Boolean] = cond(mx, bool(true), my)
    def not(mx: Eval[Boolean]): Eval[Boolean] = mx.map(!_)

    def ask(v: String): Eval[Int] =
        StateT.inspectF(_.get(v).toRight(s"variable $v out of scope"))
    def store(v: String, mx: Eval[Int]): Eval[Unit] =
        mx.flatMap(x => StateT.modify(_.updated(v, x)))

    def cond[A](b: Eval[Boolean], t: Eval[A], e: Eval[A]): Eval[A] =
        Monad[Eval].ifM(b)(t, e)
}

The ask and store operations above allow for interaction with the environment, and the other operations are just "lifting" the relevant operations into our monad. Just so you can see how this might be stitched together, here are some example programs:

import eval._
store("x", add(number(5), ask("v"))).runS(Map("v" -> 3))
// res0: Error[Map[String, Int]] = Right(Map(v -> 3, x -> 8))
store("x", add(number(5), ask("v"))).runS(Map.empty)
// res1: Error[Map[String, Int]] = Left(variable v out of scope)

List(
    store("v", number(3)),                // v = 3;
    store("x", add(number(5), ask("v"))), // x = 5 + v;
    cond(less(ask("x"), ask("v")),        // if x < v {
        store("v", ask("x")),             //   v = x } else {
        store("v", number(6))),           //   v = 6 };
).sequence.runS(Map.empty)
// res2: Either[String, Map[String, Int]] = Right(Map(v -> 6, x -> 8))

If we start off with an initial variable assignment of {v = 3}, then the Eval value representing x = 5 + v will finish with a final environment of {v = 3, x = 8}. If we don't provide an initial value to v, the evaluation errors with an "out of scope" error. The sequence method can be used to compose multiple statements in the langauge.

1.2 A Parser

We'll be using the same lexer as we've been accustomed to recently (with some extra keywords and operators), so let's see what the parser is like:

import parsley.Parsley.atomic
import lexer.implicits.implicitSymbol
import lexer.{number, identifier, fully}
import parsley.combinator.sepEndBy
import parsley.syntax.zipped._
import parsley.expr.{Prefix, InfixR, InfixL, precedence, Ops}

def infixN[A, B](p: Parsley[A])(op: Parsley[(A, A) => B]): Parsley[B] =
    (p, op, p).zipped((x, f, y) => f(x, y))

lazy val atom: Parsley[Eval[Int]] =
    "(" ~> expr <~ ")" | number.map(eval.number) | identifier.map(ask)
lazy val expr = precedence[Eval[Int]](atom)(
    Ops(Prefix)("negate" as negate),
    Ops(InfixL)("*" as mul),
    Ops(InfixL)("+" as add, "-" as sub))

lazy val blit: Parsley[Eval[Boolean]] = "true".as(bool(true)) | "false".as(bool(false))
lazy val comp: Parsley[Eval[Boolean]] = infixN(expr)("<".as(less) | "==".as(equal))
lazy val btom: Parsley[Eval[Boolean]] = atomic("(" ~> pred) <~ ")" | blit | comp
lazy val pred = precedence[Eval[Boolean]](btom)(
    Ops(Prefix)("not" as not),
    Ops(InfixR)("&&" as and),
    Ops(InfixR)("||" as or))

def braces[A](p: =>Parsley[A]) = "{" ~> p <~ "}"

lazy val asgnStmt: Parsley[Eval[Unit]] = (identifier, "=" ~> expr).zipped(store)
lazy val ifStmt: Parsley[Eval[Unit]] =
    ("if" ~> pred, braces(stmts), "else" ~> braces(stmts)).zipped(cond[Unit])
lazy val stmt: Parsley[Eval[Unit]] = asgnStmt | ifStmt
lazy val stmts: Parsley[Eval[Unit]] = sepEndBy(stmt, ";").map(_.sequence.void)

val parser = fully(stmts)

I'm not going to say too much about this, since all of the ideas have been covered in previous pages (and the Haskell example!). The evaluator and the parser can be stitched together to form an interpreter:

def interpret(input: String)(ctx: Map[String, Int]) = {
    parser.parse(input).toEither.flatMap(_.runS(ctx))
}

Let's run an example through it and get used to the syntax:

interpret(
    """x = 7;
      |if x < v && 5 < v {
      |  y = 1;
      |}
      |else {
      |  y = 0;
      |};
      |x = 0;
    """.stripMargin)(Map("v" -> 8))
// res3: Either[String, Map[String, Int]] = Right(Map(v -> 8, x -> 0, y -> 1))

The |s here are being used by Scala's stripMargin method to remove the leading whitespace on our multi-line strings. We can see that the syntax of this language makes use of semi-colons for delimiting, and if statements require a semi-colon after the else. In addition, no parentheses are required for the if, but braces are mandated. This syntax is a little unorthodox, which is great for us because it'll give us a lot of opportunities to test out our new tools! I've neglected to add any .labels or .explains here, but obviously there are plenty of opportunities.

1.3 Motivation

Let's start by seeing what happens if we accidentally write a ; before an else:

interpret("if true {}; else {};")(Map.empty)
// res4: Either[String, Map[String, Int]] = Left((line 1, column 11):
//   unexpected ";"
//   expected else
//   >if true {}; else {};
//              ^)

This is what we'd expect, since at this point we'd expect an else. We could detail this issue for the user with an .explain, and explain that semi-colons are not something that work in this position:

import parsley.errors.combinator._
lazy val ifStmt: Parsley[Eval[Unit]] =
    ( "if" ~> pred
    , braces(stmts)
    , "else".explain("semi-colons cannot be written after ifs") ~> braces(stmts)
    ).zipped(cond[Unit])

What effect will this have?

interpret("if true {}; else {};")(Map.empty)
// res5: Either[String, Map[String, Int]] = Left((line 1, column 11):
//   unexpected ";"
//   expected else
//   semi-colons cannot be written after ifs
//   >if true {}; else {};
//              ^)

Ok, this is better! But what if I wrote something else there instead?

interpret("if true {}a else {};")(Map.empty)
// res6: Either[String, Map[String, Int]] = Left((line 1, column 11):
//   unexpected "a"
//   expected else
//   semi-colons cannot be written after ifs
//   >if true {}a else {};
//              ^)

Ah, right, not so good anymore. We could go back and make the explain a bit more general, of course, but that means we've lost out on the helpful prompt to the user about our language's syntax. So, what can we do here? This is where the Verified Errors pattern comes into play.

2 Verified Errors

The problem with using explain to write about a specific instance of a problem is that there is no guarantee that the problem actually occurred. This is what we observed above, where an a was misreported as a semi-colon! The Verified Errors pattern tells us the following:

Parse bad input to verify that contextual obligations are met before raising an error.

Basically, if we want to report an error about a spurious semi-colon, we better check that it is actually there first! There are a few properties we'd expect of these errors and there are a few ways of formulating them. One way is using amend, hide, empty, and so on, but most of the time this pattern follows a regular enough structure that it has been encoded into a family of combinators in parsley.errors.patterns.VerifiedErrors:

import parsley.character.char
import parsley.errors.patterns.VerifiedErrors

val _semiCheck =
    char(';').verifiedExplain("semi-colons cannot be written between `if` and `else`")

lazy val ifStmt: Parsley[Eval[Unit]] =
    ( "if" ~> pred
    , braces(stmts)
    , ("else" | _semiCheck) ~> braces(stmts)
    ).zipped(cond[Unit])

Here, if we can't read an else, we immediately try to parse _semiCheck, which reads a semi-colon (and is careful to hide it from the errors, otherwise we might see expected else or ";"). If it succeeds, then we fail generate a reason for the failure. If it didn't successfully parse, nothing happens, because the contextual obligation for the error was not met. A parser like _semiCheck is known as an "error widget": prefix them with _ to make them more immediately identifiable.

With this we have:

interpret("if true {}; else {};")(Map.empty)
// res7: Either[String, Map[String, Int]] = Left((line 1, column 11):
//   unexpected ";"
//   expected else
//   semi-colons cannot be written between `if` and `else`
//   >if true {}; else {};
//              ^)

interpret("if true {}a else {};")(Map.empty)
// res8: Either[String, Map[String, Int]] = Left((line 1, column 11):
//   unexpected "a"
//   expected else
//   >if true {}a else {};
//              ^)

This is exactly what we wanted! So, where else can we apply this technique? Let's see what happens if we miss out a closing brace:

interpret("if true {} else {")(Map.empty)
// res9: Either[String, Map[String, Int]] = Left((line 1, column 18):
//   unexpected end of input
//   expected "}", identifier, or if
//   >if true {} else {
//                     ^)

We could, again, give the user a helping hand here, and point out that they have an unclosed something that they need to close. Again, we could start by using the .explain combinator directly, on the }. Let's see what effect this will have:

def braces[A](p: =>Parsley[A]) = "{" ~> p <~ "}".explain("unclosed `if` or `else`")

This will give us the more helpful error:

interpret("if true {} else {")(Map.empty)
// res10: Either[String, Map[String, Int]] = Left((line 1, column 18):
//   unexpected end of input
//   expected "}", identifier, or if
//   unclosed `if` or `else`
//   >if true {} else {
//                     ^)

This time, adding extra input won't cause a problem, so is this fine? Well, what about this input:

interpret(
    """if true {}
      |else {
      | x = 7a
      |}""".stripMargin)(Map.empty)
// res11: Either[String, Map[String, Int]] = Left((line 3, column 7):
//   unexpected "a"
//   expected ";", "}", *, +, -, or digit
//   unclosed `if` or `else`
//   >else {
//   > x = 7a
//          ^
//   >})

Argh! The else is closed this time, but since } is a valid continuation character we've triggered our explain message. Again, we can fix this by using a verified error (on eof)

import parsley.Parsley.eof
val _eofCheck = eof.verifiedExplain("unclosed `if` or `else`")
def braces[A](p: =>Parsley[A]) = "{" ~> p <~ ("}" | _eofCheck)

This time we've latched onto whether or not there is any input left at all. This will work fine!

interpret(
    """if true {}
      |else {
      | x = 7a
      |}""".stripMargin)(Map.empty)
// res12: Either[String, Map[String, Int]] = Left((line 3, column 7):
//   unexpected "a"
//   expected "}"
//   >else {
//   > x = 7a
//          ^
//   >})

interpret(
    """if true {}
      |else {
      | x = 7
      |""".stripMargin)(Map.empty)
// res13: Either[String, Map[String, Int]] = Left((line 4, column 1):
//   unexpected end of input
//   expected ";", "}", *, +, or -
//   unclosed `if` or `else`
//   > x = 7
//   >
//    ^)

Perfect 🙂. What now? Well, another area where the user might trip up is thinking that you can assign booleans to variables! Let's see what the errors are:

interpret("x = true")(Map.empty)
// res14: Either[String, Map[String, Int]] = Left((line 1, column 5):
//   unexpected keyword true
//   expected "(", identifier, negate, or number
//   >x = true
//        ^^^^)

interpret("x = 10 < 9")(Map.empty)
// res15: Either[String, Map[String, Int]] = Left((line 1, column 8):
//   unexpected "<"
//   expected ";", *, +, -, or end of input
//   >x = 10 < 9
//           ^)

interpret("x = not true")(Map.empty)
// res16: Either[String, Map[String, Int]] = Left((line 1, column 5):
//   unexpected keyword not
//   expected "(", identifier, negate, or number
//   >x = not true
//        ^^^)

Now, there is a cheap way of dealing with this and an expensive one. Let's start cheap and see what needs to be done and how effective it is. The first thing we can recognise is that we can special case not, true, and false using the same strategy as before. We can choose to attach a new widget to the expr inside the asgn, let's start by defining the widget (using some more generalised machinery):

import parsley.combinator.choice
val _boolCheck = choice(
        lexer.nonLexemeSymbol("true"),
        lexer.nonLexemeSymbol("false"),
        lexer.nonLexemeSymbol("not"),
    ).verifiedExplain("booleans cannot be assigned to variables")

The lexer.nonLexemeSymbol combinator here is allowing for the parsing of keywords without reading trailing whitespace, which would make our error messages wider. Now, we can add this to the asgn:

lazy val asgnStmt: Parsley[Eval[Unit]] =
    (identifier, "=" ~> (expr | _boolCheck)).zipped(store)

And now for the errors:

interpret("x = true")(Map.empty)
// res17: Either[String, Map[String, Int]] = Left((line 1, column 5):
//   unexpected keyword true
//   expected "(", identifier, negate, or number
//   booleans cannot be assigned to variables
//   >x = true
//        ^^^^)

interpret("x = 10 < 9")(Map.empty)
// res18: Either[String, Map[String, Int]] = Left((line 1, column 8):
//   unexpected "<"
//   expected ";", *, +, -, or end of input
//   >x = 10 < 9
//           ^)

interpret("x = not true")(Map.empty)
// res19: Either[String, Map[String, Int]] = Left((line 1, column 5):
//   unexpected keyword not
//   expected "(", identifier, negate, or number
//   booleans cannot be assigned to variables
//   >x = not true
//        ^^^)

So, we've cracked the leading edge of the booleans, but we are still no closer to managing to deal with <, ==. These are significantly trickier to handle with our current approach, because they occur after some input has already been read. We would need to insert them as alternatives at every point where they could be "valid" predictions. This is far from ideal. Note that we don't really need to worry about && and ||, since we are going to have to have already found one of <, ==, not, true, or false before we reach it anyway. Instead, we need to look towards a different technique, the Preventative Error.

3 Preventative Errors

The problem with verified errors is that they have to follow the valid alternatives of a parse, which may be disparate and scattered. Not to mention that moving the widget to a shared parser may actually invalidate its context anyway! Instead, a preventative error allows us to proactively rule out bad input. This means trying to parse it before trying the "working" alternatives. Compared with verified errors, which are a last resort, preventative errors are by nature more costly. Again, there are a few properties we'd expect of these errors and there are a few ways of formulating them and parsley.errors.patterns.PreventativeErrors contains a family of combinators for the most common formulation:

import parsley.errors.patterns.PreventativeErrors

val _noCompCheck =
    choice(lexer.nonLexemeSymbol("<"), lexer.nonLexemeSymbol("==")).preventativeExplain(
        reason = "booleans cannot be assigned to variables",
        labels = "end of arithmetic expression"
    )

The idea here is to try and parse one of < or == and fail immediately if this is possible, otherwise, the combinator succeeds, and we can start parsing something else. Because raising the error happens more eagerly, it is important to provide error labels to describe what valid things can come next since these parsers won't actually get executed and contribute themselves! It should be placed at a point where it is suspected the < and == could be read, namely after an expression in the asgn. So, let's see what the effect of the error will be:

lazy val asgnStmt: Parsley[Eval[Unit]] =
    (identifier, "=" ~> ((expr <~ _noCompCheck) | _boolCheck)).zipped(store)

interpret("x = 10 < 9")(Map.empty)
// res20: Either[String, Map[String, Int]] = Left((line 1, column 8):
//   unexpected "<"
//   expected end of arithmetic expression
//   booleans cannot be assigned to variables
//   >x = 10 < 9
//           ^)

This looks pretty good! However, it's now looking a little messier, and could do with some abstraction:

def _noBool[A](p: Parsley[A]): Parsley[A] = (p <~ _noCompCheck) | _boolCheck

lazy val asgnStmt: Parsley[Eval[Unit]] = (identifier, "=" ~> _noBool(expr)).zipped(store)

This is a bit tidier! When I said that there was a cheap way and an expensive way of implementing these improved errors earlier, the expensive way would have been using a preventative error from the outset, trying to parse an entire boolean expression up-front. This may be much more expensive than the reasonably small <7 character checks we've been doing. However, it is instructive to see what this might have looked like, now just by changing _noBool:

import parsley.errors.VanillaGen

val _noBoolCheck = pred.void.preventWith(
    err = new VanillaGen[Unit] {
        override def reason(x: Unit) = Some("booleans cannot be assigned to variables")
        override def unexpected(x: Unit) = VanillaGen.NamedItem("boolean expression")
    },
    labels = "arithmetic expression"
)


def _noBool[A](p: Parsley[A]): Parsley[A] = _noBoolCheck ~> p

This version makes use of the VanillaGen, which can allow for a bit more fine-grained configuration, including changing the unexpected message within the error. By parsing a full pred, we are certain to rule out all problematic inputs immediately, but this could be much more costly - there is a trade-off to be had! The errors are all as follows:

interpret("x = true")(Map.empty)
// res21: Either[String, Map[String, Int]] = Left((line 1, column 5):
//   unexpected boolean expression
//   expected arithmetic expression
//   booleans cannot be assigned to variables
//   >x = true
//        ^^^^)

interpret("x = 10 < 9")(Map.empty)
// res22: Either[String, Map[String, Int]] = Left((line 1, column 5):
//   unexpected boolean expression
//   expected arithmetic expression
//   booleans cannot be assigned to variables
//   >x = 10 < 9
//        ^^^^^^)

interpret("x = not true")(Map.empty)
// res23: Either[String, Map[String, Int]] = Left((line 1, column 5):
//   unexpected boolean expression
//   expected arithmetic expression
//   booleans cannot be assigned to variables
//   >x = not true
//        ^^^^^^^^)

The flip side is that these errors are just about as good as it gets for this particular problem: notice that the error caret covers the entire bad expression, and reports it under a concise naming. It can also catch errors with bracketed expressions, which may be missed by the previous implementations (though it is still possible to catch these with some work).

4 Conclusion

The main point of this page is to demonstrate how much of an important technique "parsing bad inputs" can be to generating informative and precise error messages. Of course, there are other formulations of these patterns, and they can address some of the shortcomings of the above implementations. In future, I may add a discussion of how these patterns are implemented in practice, to show how more bespoke ones can be useful.