Customising Error Messages

Previously, in Effective Lexing we saw how we could extend our parsers to handle whitespace and lexing. In this wiki post we'll finally address error messages. Throughout all the other entries in this series I have neglected to talk about error messages at all, but they are a very important part of a parser.

1 Adjusting Error Content

I'm going to start with the parser from last time, but before we introduced the Lexer class. The reason for this is that the Lexer functionality has error messages baked into it, which means this post would be even shorter! It isn't perfect, but it does produce good error messages for your basic lexemes, and there is nothing stopping you from using the techniques here to change those messages if you wish. Simply put: the original grammar gives us more room for exploration.

import parsley.Parsley, Parsley.atomic

object lexer {
    import parsley.Parsley.{eof, many}
    import parsley.character.{digit, whitespace, string, item, endOfLine}
    import parsley.combinator.manyTill

    private def symbol(str: String) = atomic(string(str)).void
    private implicit def implicitSymbol(tok: String) = symbol(tok)

    private val lineComment = "//" ~> manyTill(item, endOfLine).void
    private val multiComment = "/*" ~> manyTill(item, "*/").void
    private val comment = lineComment | multiComment
    private val skipWhitespace = many(whitespace.void | comment).void

    private def lexeme[A](p: =>Parsley[A]) = p <~ skipWhitespace
    private def token[A](p: =>Parsley[A]) = lexeme(atomic(p))
    def fully[A](p: =>Parsley[A]): Parsley[A] = skipWhitespace ~> p <~ eof

    val number: Parsley[BigInt] =
        token(digit.foldLeft1[BigInt](0)((n, d) => n * 10 + d.asDigit))

    object implicits {
        implicit def implicitSymbol(s: String): Parsley[Unit] = lexeme(symbol(s))
    }
}

object expressions {
    import parsley.expr.{precedence, Ops, InfixL}

    import lexer.implicits.implicitSymbol
    import lexer.{number, fully}

    private lazy val atom: Parsley[BigInt] = "(" ~> expr <~ ")" | number
    private lazy val expr = precedence[BigInt](atom)(
        Ops(InfixL)("*" as (_ * _)),
        Ops(InfixL)("+" as (_ + _), "-" as (_ - _)))

    val parser = fully(expr)
    def parse(input: String) = parser.parse(input)
}

So, before, we saw how this ran on successful cases. Let's now start to see how it works on bad input.

expressions.parse("5d")
// res0: parsley.Result[String, BigInt] = Failure((line 1, column 2):
//   unexpected "d"
//   expected "*", "+", "-", "/*", "//", digit, end of input, or whitespace
//   >5d
//     ^)

Let's start by breaking this error down and understanding what its components are and why this information has appeared. The first line of the error reports the line and column number of the message (in Parsley hard tabs are treated as aligning to the nearest 4th column). If you are using parseFromFile then this will also display the filename. The last two lines always show the location at which the error occurred. This is going to be the point at which the error that eventually ended up being reported occurred, not necessarily where the parser ended up. This can be improved in the future. Next you can see the unexpected and expected clauses. The unexpected "d" here is telling us roughly what we already knew. The expected clause, on the other hand, tells us all the things we could have used to fix our problem. There is definitely a lot of noise here though.

First let's just make sure we understand where each of these alternatives came from. Firstly, it's clear that since the last thing we read was a 5, a good way of carrying on would be reading another digit to make the number bigger. We could also read a space or start a comment as a way of making more progress too. Of course, another way we could make progress would have been using one of the operators and in the process continued our expression. Finally we could simply remove the d and it would run perfectly fine. Notice how ( and ) are not suggested as alternatives despite appearing in the parser: 5( or 5) makes no sense either. As another small example, let's see what happens to the error if we add a space between the 5 and the d.

expressions.parse("5 d")
// res1: parsley.Result[String, BigInt] = Failure((line 1, column 3):
//   unexpected "d"
//   expected "*", "+", "-", "/*", "//", end of input, or whitespace
//   >5 d
//      ^)

Neat, so this time round digit is no longer a valid alternative: clearly the number has come to an end because we wrote a space. But the other possibilities from before are still valid. So, how can we start making improvements? There are seven core combinators available to us for this purpose:

All of these can be found in the parsley.errors.combinator module.

1.1 Using label

For this section, we are only going to be using label and hide, as they are by far the most useful and effective of these methods. That being said, explain can be very useful, but we'll find there are no compelling use-cases for it in this example. Let's start off by giving a label to comment and see what happens:

import parsley.errors.combinator._

val comment = (lineComment | multiComment).label("comment")

Now let's run our parser from before:

expressions.parse("5d")
// res3: parsley.Result[String, BigInt] = Failure((line 1, column 2):
//   unexpected "d"
//   expected "*", "+", "-", comment, digit, end of input, or whitespace
//   >5d
//     ^)

Nice! So, if you compare the two, you'll notice that "/*" and "//" both disappeared from the message, but comment was added. You can tell when label is being used because there are no quotes surrounding the items. Knowing this, you can probably guess that digit, eof, and whitespace all have error labels of their own.

1.1.1 Using hide to trim away junk

This is a good start, but whitespace suggestions in an error message are usually just noise: of course we expect to be able to write whitespace in places, but it's rarely the solution to someone's problem. This makes it a good candidate for the hide combinator:

import parsley.errors.combinator._

val skipWhitespace = many(whitespace.void | comment).void.hide

Now let's check again:

expressions.parse("5d")
// res4: parsley.Result[String, BigInt] = Failure((line 1, column 2):
//   unexpected "d"
//   expected "*", "+", "-", digit, or end of input
//   >5d
//     ^)

Great! The hide combinator has removed the information from the error message, and now it's looking a lot cleaner. But what if we started writing a comment, what would happen then?

expressions.parse("5/*")
// res5: parsley.Result[String, BigInt] = Failure((line 1, column 4):
//   unexpected end of input
//   expected "*/" or any character
//   >5/*
//       ^)

So, as I mentioned earlier, hide is just a label, and label will not relabel something if it fails and consumes input. That means, by opening our comment but not finishing it, we can see some different suggestions. In this case, end of input is not allowed, and any character will work to extend the comment, but clearly */ is a way to properly end it. Let's add a label to that, however, to make it a bit friendlier:

val lineComment = "//" ~> manyTill(item, endOfLine.label("end of comment")).void
val multiComment = "/*" ~> manyTill(item, "*/".label("end of comment")).void

Now we get a more informative error message of:

expressions.parse("5/*")
// res6: parsley.Result[String, BigInt] = Failure((line 1, column 4):
//   unexpected end of input
//   expected any character or end of comment
//   >5/*
//       ^)

Great! Now let's turn our attention back to expressions and not whitespace.
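As an aside on the rule used above, that label leaves a parser alone once it has failed after consuming input, here is a minimal sketch. The pair parser and its "ab pair" label are hypothetical names made up for illustration; only char, ~>, and label come from Parsley.

```scala
import parsley.character.char
import parsley.errors.combinator._

// Hypothetical parser: the letter 'a' followed by the letter 'b', labelled as a whole.
val pair = (char('a') ~> char('b')).label("ab pair")

// Fails at column 1 without consuming input, so the label can be applied.
val atStart = pair.parse("c")

// Fails at column 2 after consuming 'a', so the label should no longer apply
// and the expectation comes from char('b') itself.
val afterA = pair.parse("ac")
```

Printing the two results side by side is a quick way to check whether a label sits at the right level of a parser: a label that never shows up is often wrapped around something that consumes input before failing.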

1.1.2 Labelling our numbers

Let's take a look at a very simple bad input and see how we can improve on it:

expressions.parse("d")
// res7: parsley.Result[String, BigInt] = Failure((line 1, column 1):
//   unexpected "d"
//   expected "(" or digit
//   >d
//    ^)

So this time, we can see two possible ways of resolving this error are opening brackets, or a digit. Now digit is really a poor name here, what we really mean is integer or number:

val number =
    token(digit.foldLeft1[BigInt](0)((n, d) => n * 10 + d.asDigit)).label("number")

Now we get the following, nicer error message:

expressions.parse("d")
// res8: parsley.Result[String, BigInt] = Failure((line 1, column 1):
//   unexpected "d"
//   expected "(" or number
//   >d
//    ^)

expressions.parse("5x")
// res9: parsley.Result[String, BigInt] = Failure((line 1, column 2):
//   unexpected "x"
//   expected "*", "+", "-", digit, or end of input
//   >5x
//     ^)

But notice that in the second error message we have again been given digit and not number as our alternative. This is good: once we've started reading a number by reading 5, it would be inappropriate to suggest a number as a good next step. But digit here is not particularly descriptive and we can do better still:

val number =
    token(
        digit.label("end of number").foldLeft1[BigInt](0)((n, d) => n * 10 + d.asDigit)
    ).label("number")

This gives us, again, a much nicer message:

expressions.parse("5x")
// res10: parsley.Result[String, BigInt] = Failure((line 1, column 2):
//   unexpected "x"
//   expected "*", "+", "-", end of input, or end of number
//   >5x
//     ^)

1.1.3 Merging multiple labels

With an example grammar as small as this, I think we are almost done here! The last thing we could improve is the repetition of "*", "+", and "-". Really, we know that there is nothing special about any of them individually, so we could more concisely replace them with arithmetic operator or, since we only have arithmetic operators here, operator will do. We don't need to do anything special here: when multiple labels are encountered with the same name, they will only appear once!

lazy val expr = precedence[BigInt](atom)(
  Ops(InfixL)("*".label("operator") as (_ * _)),
  Ops(InfixL)("+".label("operator") as (_ + _), "-".label("operator") as (_ - _)))

Now we arrive at our final form:

expressions.parse("5x")
// res11: parsley.Result[String, BigInt] = Failure((line 1, column 2):
//   unexpected "x"
//   expected end of input, end of number, or operator
//   >5x
//     ^)

expressions.parse(" 67 + ")
// res12: parsley.Result[String, BigInt] = Failure((line 1, column 7):
//   unexpected end of input
//   expected "(" or number
//   > 67 + 
//          ^)

Great! Now obviously you could take this even further and make "(" become opening parenthesis or something, but I don't really feel that adds much.
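If repeating .label("operator") on every symbol feels noisy, one option is to route the symbols through a small helper. The sketch below is standalone and uses plain char parsers rather than the lexer above; the operator helper and anyOperator are hypothetical names, not part of Parsley.

```scala
import parsley.Parsley
import parsley.character.char
import parsley.errors.combinator._

// Hypothetical helper: every operator symbol shares the one label.
def operator(sym: Char): Parsley[Char] = char(sym).label("operator")

val anyOperator = operator('*') | operator('+') | operator('-')

// All three alternatives fail without consuming input, so their identical
// labels should be merged into a single expected item.
val merged = anyOperator.parse("x")
```

The design choice here is only about keeping the label text in one place, so that renaming it later to something like arithmetic operator is a one-line change.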

1.2 Wrapping up the Expression Example

Hopefully, you get a sense of how subjective writing good error messages is, and how much of an art form it can be. Parsley does provide decent error messages out of the box (now based on megaparsec's error messages from Haskell), and improving them doesn't have to be hard, so just play around and see what feels right. I would say, however, there is an interesting phenomenon in the programming languages and compilers community: compiler writers write error messages that are tailored for compiler writers. It's an interesting problem when you think about it: the person who writes error messages is a compiler expert, and so they often rely on the concepts they understand. That means they are more prone to including the names of stuff in the grammar to describe syntax problems, and so on. While this is great for experts and compiler writers, it seemingly forgets people who are new to programming or this "grammar" in particular. That can make error messages needlessly intimidating for the average Joe. The takeaway from this is to try and avoid labelling expr with .label("expression"), because that just ends up making something that is no longer useful or informative:

expressions.parse("")
// res13: parsley.Result[String, BigInt] = Failure((line 1, column 1):
//   unexpected end of input
//   expected expression
//   >
//    ^)

What use is that to anybody? The same idea applies to statements, and various other abstract grammatical notions. Something like "expected if statement, while loop, for loop, variable declaration, or assignment" is so much more meaningful than "expected statement". I would ask that you keep that in mind 🙂. To conclude our work with this parser, here is the full code of the finished product. Obviously, with the Lexer, some of this work is already done, but you can still apply the lessons learnt here to the wider parser!

import parsley.Parsley, Parsley.{atomic, eof, many}
import parsley.errors.combinator._

object lexer {
    import parsley.character.{digit, whitespace, string, item, endOfLine}
    import parsley.combinator.manyTill

    private def symbol(str: String) = atomic(string(str)).void
    private implicit def implicitSymbol(tok: String) = symbol(tok)

    // The label goes on the terminator, as in section 1.1.1, not on the whole comment body
    private val lineComment = "//" ~> manyTill(item, endOfLine.label("end of comment")).void
    private val multiComment = "/*" ~> manyTill(item, "*/".label("end of comment")).void
    private val comment = (lineComment | multiComment).label("comment")
    private val skipWhitespace = many(whitespace.void | comment).void.hide

    private def lexeme[A](p: =>Parsley[A]) = p <~ skipWhitespace
    private def token[A](p: =>Parsley[A]) = lexeme(atomic(p))
    def fully[A](p: =>Parsley[A]): Parsley[A] = skipWhitespace ~> p <~ eof

    val number: Parsley[BigInt] = token {
        digit.label("end of number").foldLeft1[BigInt](0)((n, d) => n * 10 + d.asDigit)
    }.label("number")

    object implicits {
        implicit def implicitSymbol(s: String): Parsley[Unit] = lexeme(symbol(s))
    }
}

object expressions {
    import parsley.expr.{precedence, Ops, InfixL}

    import lexer.implicits.implicitSymbol
    import lexer.{number, fully}

    private lazy val atom: Parsley[BigInt] = "(" ~> expr <~ ")" | number
    private lazy val expr = precedence[BigInt](atom)(
        Ops(InfixL)("*".label("operator") as (_ * _)),
        Ops(InfixL)("+".label("operator") as (_ + _), "-".label("operator") as (_ - _)))

    val parser = fully(expr)
    def parse(input: String) = parser.parse(input)
}

1.3 Using explain

So far, we've seen how label can be used to clean up error messages and make them much more presentable and informative. Another way of achieving this is by using the explain combinator. Unlike label this is much more freeform and when used properly can be incredibly effective. Essentially, with explain you are leveraging your own knowledge about the context you are in to provide a much more tailored and hand-crafted message to the user. It can be used to both provide an additional hint in an otherwise poor message or to enrich the error with suggestions for how the error might be fixed.

Using it is just as easy as using label and you can't really go wrong with it: other than being a bit... too descriptive. Again, the Lexer class already makes use of this technique to improve its own error messages, but let's suppose we wanted to write some of its functionality ourselves. Let's cook up a string literal parser, supporting some (limited) escape sequences.

import parsley.Parsley
import parsley.syntax.character.charLift
import parsley.combinator.choice
import parsley.character._
import parsley.errors.combinator._

val escapeChar =
    choice('n' as '\n', 't' as '\t', '\"', '\\')
val stringLetter =
    noneOf('\"', '\\').label("string character") |
    ('\\' ~> escapeChar).label("escape character")

val stringLiteral =
    ('\"' ~> stringOfMany(stringLetter) <~ '\"'.label("end of string")).label("string")

Let's start with something like this. If we run a couple of examples, we can see where it performs well and where it performs less well:

stringLiteral.parse("")
// res16: parsley.Result[String, String] = Failure((line 1, column 1):
//   unexpected end of input
//   expected string
//   >
//    ^)

stringLiteral.parse("\"")
// res17: parsley.Result[String, String] = Failure((line 1, column 2):
//   unexpected end of input
//   expected end of string, escape character, or string character
//   >"
//     ^)

stringLiteral.parse("\"\\a")
// res18: parsley.Result[String, String] = Failure((line 1, column 3):
//   unexpected "a"
//   expected """, "\", "n", or "t"
//   >"\a
//      ^)

So, for the first two cases, the error message performs quite well. But the last message is a bit noisy. One possible approach to improve this could be to label each alternative to give them a slightly clearer name, which would result in something like:

stringLiteral.parse("\"\\a")
// res19: parsley.Result[String, String] = Failure((line 1, column 3):
//   unexpected "a"
//   expected \", \\, \n, or \t
//   >"\a
//      ^)

This is better, but a bit misleading, we don't expect a \! Now, you could instead opt to remove the backslashes, but then that doesn't give much information about why these things are expected. Another option would be to label all alternatives with some common name:

val escapeChar =
    choice('n' as '\n', 't' as '\t', '\"', '\\')
        .label("end of escape sequence")

Which would yield

stringLiteral.parse("\"\\a")
// res20: parsley.Result[String, String] = Failure((line 1, column 3):
//   unexpected "a"
//   expected end of escape sequence
//   >"\a
//      ^)

This is a bit more helpful, in that it does provide a good name for what we expected. But at the same time it doesn't help the user to understand how to fix their problem: "what is an escape sequence?". This is similar to the "statement" problem I described above. In this case (and indeed in the "statement" case), we can add an explain to help the user understand what we mean:

val escapeChar =
    choice('n' as '\n', 't' as '\t', '\"', '\\')
        .label("end of escape sequence")
        .explain("valid escape sequences include \\n, \\t, \\\", or \\\\")

The explain combinator annotates failure messages with an additional reason. These can stack, and are displayed each on their own line in the error message. With this in place, let's see what the new error message is:

stringLiteral.parse("\"\\a")
// res21: parsley.Result[String, String] = Failure((line 1, column 3):
//   unexpected "a"
//   expected end of escape sequence
//   valid escape sequences include \n, \t, \", or \\
//   >"\a
//      ^)

This time, we keep the name of the expected token clear and concise, but we also help the user to understand what this actually means. The error isn't misleading, in the sense that we aren't suggesting that a \n would fix the parse error after the \ we already wrote; instead we have said that we expect the end of the escape, as well as demonstrated what that would look like. This is great!

There isn't much more to say about the explain combinator than that really. Hopefully this already gives you a sense of how useful it can be. Like I mentioned before, the poor error problem that compiler writers often suffer from can be nicely solved using explain. For instance, a message like "... expected statement ... valid statements include 'if statements', 'loops', or 'assignments'" is subjectively better than both of the alternatives (namely "expected statement" or the one that lists out every single alternative). This has the benefits of both worlds: for an experienced user, the error message gets straight to the point, and for the newcomer, the error message provides a bit more information that can help them learn the terminology.
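To make the "statement" case concrete, here is a minimal sketch. The three keywords stand in for a real statement grammar and are purely hypothetical; the label and explain calls are the same ones used on escapeChar above.

```scala
import parsley.Parsley
import parsley.character.string
import parsley.errors.combinator._

// Hypothetical statement starters, purely for illustration.
val statement: Parsley[String] =
    (string("if") | string("while") | string("let"))
        .label("statement")
        .explain("valid statements include 'if statements', 'loops', or 'assignments'")

// "foo" shares no first letter with any keyword, so no input is consumed and
// both the label and the reason should appear in the error.
val noStatement = statement.parse("foo")
```

Note that the input was chosen so that nothing is consumed before failing: an input such as "in" would get part-way through "if" first, and the outer label and reason would then not be expected to apply.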

2 Adjusting Error Formatting

As we've seen in this post, the error messages produced by parsley are fairly readable. They are broken into two kinds: "vanilla" errors built up of "expected", "unexpected", and "reason" clauses; and "specialised" errors built up solely of "message" clauses. So far, we have only seen examples of the "vanilla" errors, and we will see the "specialised" errors in the next post. These have so far been formatted using Parsley's default mechanism, which creates an error as a string. This is ok for basic use, but in projects where there is some pre-existing error format, maintaining consistency across error messages is much harder without parsing the resulting String errors to extract their content: this is, frankly, ridiculous to expect! Moreover, suppose you wanted to unit test your parser in both successful and failing cases: performing raw string comparison is really brittle, especially if Parsley adjusts the format slightly!

Luckily, Parsley 3.0.0 introduced an abstraction layer between the error messages that the parsers work with and the final resulting error message. This means that actually, the error message format is not only configurable, but doesn't have to be a String! The final part of this post is dedicated to understanding how to work with this mechanism, using Parsley's own unit test formatter as an example.

Firstly, I want to give examples of both types of format, and annotate the names given to each part of them:

┌───────────────────────────────────────────────────────────────────────┐
│   Vanilla Error                                                       │
│                          ┌────────────────┐◄──────── position         │
│                  source  │                │                           │
│                     │    │   line      col│                           │
│                     ▼    │     │         ││                           │
│                  ┌─────┐ │     ▼         ▼│   end of input            │
│               In foo.txt (line 1, column 5):       │                  │
│                 ┌─────────────────────┐            │                  │
│unexpected ─────►│                     │            │  ┌───── expected │
│                 │          ┌──────────┐ ◄──────────┘  │               │
│                 unexpected end of input               ▼               │
│                 ┌──────────────────────────────────────┐              │
│                 expected "(", "negate", digit, or letter              │
│                          │    └──────┘  └───┘     └────┘ ◄────── named│
│                          │       ▲        └──────────┘ │              │
│                          │       │                     │              │
│                          │      raw                    │              │
│                          └─────────────────┬───────────┘              │
│                 '-' is a binary operator   │                          │
│                 └──────────────────────┘   │                          │
│                ┌──────┐        ▲           │                          │
│                │>3+4- │        │           expected items             │
│                │     ^│        │                                      │
│                └──────┘        └───────────────── reason              │
│                   ▲                                                   │
│                   │                                                   │
│                   line info                                           │
└───────────────────────────────────────────────────────────────────────┘

┌───────────────────────────────────────────────────────────────────────┐
│   Specialised Error                                                   │
│                          ┌────────────────┐◄──────── position         │
│                  source  │                │                           │
│                     │    │   line       col                           │
│                     ▼    │     │         │                            │
│                  ┌─────┐ │     ▼         ▼                            │
│               In foo.txt (line 1, column 5):                          │
│                                                                       │
│           ┌───► something went wrong                                  │
│           │                                                           │
│ message ──┼───► it looks like a binary operator has no argument       │
│           │                                                           │
│           └───► '-' is a binary operator                              │
│                ┌──────┐                                               │
│                │>3+4- │                                               │
│                │     ^│                                               │
│                └──────┘                                               │
│                   ▲                                                   │
│                   │                                                   │
│                   line info                                           │
└───────────────────────────────────────────────────────────────────────┘

As you can see, the content for a specialised error is (ironically) plainer than a vanilla message. This means that the errors are much more customisable from the parser side, but it is less rich in parser generated information than the vanilla is. Hopefully you can see that both error messages still have a very similar shape other than the error info lines themselves. In both cases, and not shown by the diagrams, the main contents of the error -- either unexpected, expected, reasons, and line info; or messages and line info -- are called "error info lines".

For vanilla errors, notice that the unexpected and expected lines make references to raw, named, and end of input: these are collectively known as items. The .label combinator produces named items, the eof combinator produces the end of input item, and unlabelled combinators produce raw items.

Together, all these components are referenced (by these names!) by the ErrorBuilder trait. The way it works is that a concrete ErrorBuilder has to be provided to the .parse method of a parser, and when the parser has finished failing, the builder is used to construct the final error message, converting the internal representation that Parsley uses into the desired output specified by the builder: you can think of it like a conversation. The internals of Parsley take a portion of the information they have and ask the builder how to format it into another intermediate form; they will then feed this new information into another method of the builder, possibly after collecting more. To allow all of this plumbing to fit together while maintaining maximum flexibility for the user, the builder makes use of "associated types". Let's take a look at the definition of ErrorBuilder without all the sub-formatters to understand what I mean:

trait ErrorBuilder[Err] {
    // This is the top level function which takes all the sub-parts
    // and combines them into the final `Err`
    def format(pos: Position, source: Source, lines: ErrorInfoLines): Err

    type Position
    type Source
    type ErrorInfoLines
    type ExpectedItems
    type Messages
    type UnexpectedLine
    type ExpectedLine
    type Message
    type LineInfo
    type Item
    type Raw <: Item
    type Named <: Item
    type EndOfInput <: Item

    ...
}

Wow, that's a lot of types! Essentially, each concrete implementation of this trait must specify what each of those types are. This means that the representation of the error is as flexible as possible. In the format method, you can see that the types Position, Source, and ErrorInfoLines are all referenced. Indeed, you can also see these marked on both diagrams: in other words, format is responsible for the general shape of both types of error message.
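To see why associated types help here, the following is a deliberately tiny, library-free sketch of the same idea. MiniBuilder, its two implementations, TestError, and report are all hypothetical names invented for this illustration and are not part of Parsley; the real trait has many more types and methods.

```scala
object MiniBuilderDemo {
    // A cut-down stand-in for the real trait, with only three associated types.
    trait MiniBuilder[Err] {
        type Position
        type Source
        type ErrorInfoLines

        def format(pos: Position, source: Source, lines: ErrorInfoLines): Err
        def pos(line: Int, col: Int): Position
        def source(sourceName: Option[String]): Source
        def lines(msgs: Seq[String]): ErrorInfoLines
    }

    // One choice of representations: everything ends up as a String.
    object StringMiniBuilder extends MiniBuilder[String] {
        type Position = String
        type Source = Option[String]
        type ErrorInfoLines = Seq[String]

        def pos(line: Int, col: Int): Position = s"(line $line, column $col)"
        def source(sourceName: Option[String]): Source = sourceName
        def lines(msgs: Seq[String]): ErrorInfoLines = msgs.map("  " + _)
        def format(pos: Position, source: Source, lines: ErrorInfoLines): String = {
            val header = source.fold(pos)(name => s"In $name $pos")
            (s"$header:" +: lines).mkString("\n")
        }
    }

    // Another choice: keep the parts structured, which is easier to test against.
    final case class TestError(pos: (Int, Int), source: Option[String], lines: List[String])
    object TestMiniBuilder extends MiniBuilder[TestError] {
        type Position = (Int, Int)
        type Source = Option[String]
        type ErrorInfoLines = List[String]

        def pos(line: Int, col: Int): Position = (line, col)
        def source(sourceName: Option[String]): Source = sourceName
        def lines(msgs: Seq[String]): ErrorInfoLines = msgs.toList
        def format(pos: Position, source: Source, lines: ErrorInfoLines): TestError =
            TestError(pos, source, lines)
    }

    // Plays the role of the library internals: it only ever passes the
    // builder's own intermediate values back to the builder.
    def report[Err](builder: MiniBuilder[Err])
                   (line: Int, col: Int, file: Option[String], msgs: Seq[String]): Err =
        builder.format(builder.pos(line, col), builder.source(file), builder.lines(msgs))
}
```

The report function never learns what Position or Source actually are, which is the point: swapping StringMiniBuilder for TestMiniBuilder changes the result type from String to TestError without touching the code that drives the conversation.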

To understand how these might come about, let's take a step "into" the formatter to find the sources of values for Position, Source, and ErrorInfoLines:

trait ErrorBuilder[Err] {
    ...

    def pos(line: Int, col: Int): Position
    def source(sourceName: Option[String]): Source

    def vanillaError(unexpected: UnexpectedLine, expected: ExpectedLine,
                     reasons: Messages, line: LineInfo): ErrorInfoLines
    def specialisedError(msgs: Messages, line: LineInfo): ErrorInfoLines

    ...
}

Hopefully, you can start to see how this might be structured:

I won't continue traversing deeper and deeper into the system, because it's just going to be the same idea over and over again. But I will note all the "terminal" methods that do take information directly from the parser:

trait ErrorBuilder[Err] {
    ...

    def pos(line: Int, col: Int): Position
    def source(sourceName: Option[String]): Source

    def reason(reason: String): Message
    def message(msg: String): Message
    def lineInfo(line: String, linesBefore: Seq[String],
                 linesAfter: Seq[String], errorPointsAt: Int): LineInfo
    val numLinesBefore: Int
    val numLinesAfter: Int

    def raw(item: String): Raw
    def named(item: String): Named
    val endOfInput: EndOfInput

    def unexpectedToken(cs: Iterable[Char], amountOfInputParserWanted: Int,
                        lexicalError: Boolean): Token
}

The two attributes numLinesBefore and numLinesAfter are used by the Parsley internals to decide how many raw lines of input both before and after the problematic line to provide to lineInfo. In a pinch, overriding these values from DefaultErrorBuilder is a quick way of changing how much of the surrounding input your errors show. The unexpectedToken method is special, but I'll leave a discussion of this for another page. All of the other methods in the ErrorBuilder will make use of the refined results from the methods above.
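As a rough sketch of the "in a pinch" approach, the following overrides the two values and passes the resulting builder to parse explicitly. It assumes a Parsley 4.x release in which DefaultErrorBuilder is abstract and needs a token extractor such as tokenextractors.SingleChar mixed in to supply unexpectedToken; if your version differs, the construction of the builder will need adjusting. The names widerContext and digitsOnly are hypothetical.

```scala
import parsley.Parsley
import parsley.Parsley.{eof, many}
import parsley.character.{digit, newline}
import parsley.errors.{DefaultErrorBuilder, ErrorBuilder, tokenextractors}

// Assumption: Parsley 4.x, where DefaultErrorBuilder is abstract and a token
// extractor has to be mixed in to provide unexpectedToken.
val widerContext: ErrorBuilder[String] =
    new DefaultErrorBuilder with tokenextractors.SingleChar {
        // Ask for two lines of input either side of the offending line.
        override val numLinesBefore: Int = 2
        override val numLinesAfter: Int = 2
    }

// Hypothetical parser: digits and newlines only, so the 'x' below is an error.
val digitsOnly: Parsley[Unit] = many(digit | newline).void <~ eof

// The builder is passed explicitly rather than relying on the implicit default.
val wider = digitsOnly.parse("1\n2\n3\nx\n4\n5")(widerContext)
```

Passing the builder explicitly keeps the change local to one call; marking it implicit instead would make it the builder for every parse call in scope.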

I hope that, by this point, you have a reasonable idea of how this system all ties together. But, if you don't, or you want an example, take a look at how parsley's own unit tests format error messages to be easier to pattern match on and test against; the implementation can be found here.