Generic Bridges (parsley.generic
)
The Parser Bridge pattern is a technique for decoupling semantic actions from the parser itself.
The parsley.generic
module contains 23 classes that allow
you to get started using the technique straight away if you wish.
The Scaladoc for this page can be found at parsley.generic
.
What are Parser Bridges?
Without making use of Parser Bridges, results of parsers are usually combined by using lift
,
map
, or zipped
:
import parsley.syntax.zipped.zippedSyntax2
case class Foo(x: Int, y: Int)
// with px, py of type Parsley[Int]
val p = (px, py).zipped(Foo(_, _))
// p: Parsley[Foo] = parsley.Parsley@3f64eb73
These work fine for the most part, however, there are couple of problems with this:
- In Scala 3,
Foo(_, _)
actually needs to be written asFoo.apply
, which introduces some (minor) noise;zipped
itself is even contributing noise. -
Foo
itself is a simple constructor, if it gets more complex, readability rapidly decreases:- The result produced may require inspection of the data, including pattern matching (see Normalising or Disambiguating Data).
- Additional information may need to be threaded in, like position information.
- Data invariances may need to be enforced (see Enforcing Invariances).
- Constructor application is on the right of the data, which people may find harder to read;
this can be mitigated with
lift
, but that may run into type inference issues.
For some people (1) or (3) may not be an issue, or can be tolerated, but (2) can get out of hand quickly. For larger parsers, properly decoupling these issues can make a huge difference to the maintainability.
How do bridges help?
In short, a bridge is an object that provides an apply
method that takes parsers as arguments,
as opposed to values. This means that they can be used directly in the parser with the logic kept
elsewhere. While you can just define a bridge manually with an apply
method (or even just as a
function), it is more ergonomic to synthesise a bridge in terms of a function that does not interact
with Parsley
values. If we assume that the companion object of Foo
has been turned into such a
bridge (definition below), the above example can be written as:
val q = Foo(px, py)
// q: Parsley[Foo] = parsley.Parsley@3f7ec915
In this version, the act of constructing the Foo
value has been abstracted behind the bridge, Foo
:
this means that the underlying implementation can vary without changing the parser.
What are Generic Bridges?
Generic bridges are the templating mechanism that allow for the synthesis of an apply
method that
works on values of type Parsley
from another that does not. While you can define your own bridge
templates (see the associated tutorial for an explanation),
parsley
provides some basic ones to get you started.
How to use
The parsley.generic
module contains ParserBridge1
through
ParserBridge22
as well as ParserBridge0
; they all extend ParserBridgeSingleton
, which provides
some additional combinators as well as ErrorBridge
, which allows labels and a reason to be attached
to a bridge.
ParserBridge1[-T1, +R]
through ParserBridge22[-T1, .., -T22, +R]
Each of these traits are designed to be implemented ideally by a companion object for a case class
.
For example, the Foo
class above can have its companion object turned into a bridge by extending
ParserBridge2
(which is for two argument bridges):
import parsley.generic.ParserBridge2
object Foo extends ParserBridge2[Int, Int, Foo]
This defines def apply(px: Parsley[Int], py: Parsley[Int]): Parsley[Foo]
, implementing it in
terms of def apply(x: Int, y: Int): Foo
, which is included as part of Scala's automatic case class
implementation. By making use of a companion object, this is all the boilerplate required to
start using the bridge. Of course, it's possible to define standalone bridges as well, so long as
you provide an implementation of apply
, as illustrated by this error:
object Add extends ParserBridge2[Int, Int, Int]
// error: object creation impossible.
// Missing implementation for member of trait ParserBridge2:
// def apply(x1: Int, x2: Int): Int = ??? // implements `def apply(x1: T1, x2: T2): R`
//
// object Add extends ParserBridge2[Int, Int, Int]
// ^^^
Implement that apply
method and it's good to go! Of course, if the traits are mixed into a regular class
,
they can also be parametric:
class Cons[A] extends ParserBridge2[A, List[A], List[A]]
// error: class Cons needs to be abstract.
// Missing implementation for member of trait ParserBridge2:
// def apply(x1: A, x2: List[A]): List[A] = ??? // implements `def apply(x1: T1, x2: T2): R`
//
// class Cons[A] extends ParserBridge2[A, List[A], List[A]]
// ^^^^
ParserSingletonBridge[+T]
All the generic bridges extend the ParserSingletonBridge
trait instantiated to a function type. For
example, trait ParserBridge2[-A, -B, +C] extends ParserSingletonBridge[(A, B) => C]
. This means that
every bridge uniformly gets access to a couple of extra combinators in addition to their apply
:
trait ParserSingletonBridge[+T] {
final def from(op: Parsley[_]): Parsley[T]
final def <#(op: Parsley[_]): Parsley[T] = this.from(op)
}
The implementation of from
is not important, it will be handled by the other ParserBridgeN
s. What these
two combinators give you is the ability to write Foo.from(parser): Parsley[(Int, Int) => Foo]
, for instance.
This can be useful when you want to use a bridge somewhere where the arguments cannot be directly applied,
like in chain or precedence combinators:
import parsley.expr.chain
import parsley.syntax.character.stringLift
val term = chain.left1(px)(Add.from("+")) // or `Add <# "+"`
They are analogous to the as
and #>
combinators respectively.
ParserBridge0[+T]
This trait is a special case for objects that should return themselves. As an example, here is an object which forms part of a larger AST, say:
import parsley.generic.ParserBridge0
trait Expr
// rest of AST
case object NullLit extends Expr with ParserBridge0[Expr]
The NullLit
object is part of the Expr
AST, and it has also mixed in ParserBridge0[Expr]
,
giving it access to from
and <#
only (no apply
for this one!). What this means is that
you can now write the following:
val nullLit = NullLit <# "null"
// nullLit: Parsley[Expr] = parsley.Parsley@2608348e
nullLit.parse("null")
// res2: parsley.Result[String, Expr] = Success(NullLit)
Without any further configuration, notice that the result of parsing "null"
is indeed NullLit
,
and nullLit: Parsley[Expr]
.
Be aware that the type passed to the generic parameter cannot be itself:
case object Bad extends ParserBridge0[Bad.type]
// error: illegal cyclic reference involving object Bad
// case object Bad extends ParserBridge0[Bad.type]
// ^^^
Resolving this will require introducing an extra type, like Expr
in the example with
NullLit
, which breaks the cycle sufficiently.
ErrorBridge
The ParserSingletonBridge
extends the ErrorBridge
trait, which is shaped as follows:
trait ErrorBridge {
def labels: List[String] = Nil
def reason: Option[String] = None
// applies the above labels and reason if applicable
final protected def error[T](p: Parsley[T]): Parsley[T]
}
This allows all derived bridges to specify labels to refer to the parser if it fails, as well as
a reason why it might be needed. This is useful because, anecdotally, I've found that ~70% of
instances of the label
combinator occur attached directly to a bridge application; and this
therefore allows for error annotation to kept away from the main parser description.
Additional Use Cases
Other than the natural decoupling that the bridges provide, there are some more specialised uses that can come out of the generic bridges alone.
Normalising or Disambiguating Data
Occasionally, the shape of an AST can change internally even though the syntax of the
language being parsed does not. Bridges are perfectly suited for handling these internal
changes while masking them from the parser itself. As an example, assume that a Let
AST node was previously defined as follows:
case class Let(bindings: List[Binding], body: Expr)
object Let extends ParserBridge2[List[Binding], Expr, Let]
The parser, therefore, can be expected to produce lists of bindings to feed in. However,
later it was decided that the ordering of the bindings doesn't matter, so a Set
is being
used. The decoupling of the bridge will allow for this change to happen without changing the
parser, so long as the bridge performs the "patching":
case class Let(bindings: Set[Binding], body: Expr)
object Let extends ParserBridge2[List[Binding], Expr, Let] {
def apply(bindings: List[Binding], body: Expr): Let = Let(bindings.toSet, body)
}
By defining the appropriate "forwarding" in its own apply
, the bridge has ensured the
parser will still work.
Handling ambiguity Another use of this kind of bridge is to allow for the disambiguation of two syntactically similar structures. As an example, consider Scala's tuple syntax:
val x = (6)
val xy = (5, 6)
It is clear to us that x
is not a "singleton pair", whatever that would be, but a parenthesised
expression; on the other hand, xy
is clearly a pair. The problem is that the syntax for these
overlap, requiring backtracking to resolve (you can only know if you're parsing a tuple when
you find your first ','
).
In practice, arbitrary backtracking in a parser can impact performance and the quality of error messages. Instead of dealing with this ambiguity by backtracking, it is possible to exploit the shared structure in a Disambiguator Bridge: this is just a bridge that looks at the provided arguments to decide what to make. For example:
import cats.data.NonEmptyList
case class Tuple(exprs: NonEmptyList[Expr]) extends Expr
object TupleOrParens extends ParserBridge1[NonEmptyList[Expr], Expr] {
def apply(exprs: NonEmptyList[Expr]): Expr = exprs match {
case NonEmptyList(expr, Nil) => expr
case exprs => Tuple(exprs)
}
}
In the above example, the parser will parse one or more expressions (signified by the NonEmptyList
from cats
); the bridge will then inspect these expressions, returning a single expression if
only one was parsed, and construct the Tuple
node otherwise. In the parser, this would just look
something like:
// from `parsley-cats`, produces `NonEmptyList` instead of `List`
import parsley.cats.combinator.sepBy1
val tupleOrParens = TupleOrParens("(" ~> sepBy1(nullLit, ",") <~ ")")
// tupleOrParens: Parsley[Expr] = parsley.Parsley@6b2e7db4
tupleOrParens.parse("(null)")
// res4: parsley.Result[String, Expr] = Success(NullLit)
tupleOrParens.parse("(null,null)")
// res5: parsley.Result[String, Expr] = Success(Tuple(NonEmptyList(NullLit, NullLit)))
Enforcing Invariances
So far, the bridges we've seen have been altering the way that the data itself should
be constructed from the results. However, it may also be desirable to override the
templated apply
to perform additional checks. This basically means that you
can add in a filter
-like combinator after the data has been constructed to validate
that the thing you've constructed is actually correct. As an example, it turns out that
Scala 2 only allows tuples with a maximum of 22 elements:
val oops = (1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3)
// error: tuples may not have more than 22 elements, but 23 given
// val oops = (1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3)
// ^
How to adjust the parser to handle this? One possible approach is to use the range
combinator:
def nonEmptyList[A](px: Parsley[A], pxs: Parsley[List[A]]) =
lift2(NonEmptyList(_, _), px, pxs)
val tupleOrParensObtuse =
TupleOrParens("(" ~> nonEmptyList(nullLit, range(0, 21)("," ~> nullLit)) <~ ")")
This works, but it's very obtuse. Not to mention that the error message generated isn't particularly
good. Instead, we can hook some extra behaviour into the generated apply
:
import parsley.errors.combinator._
object TupleOrParens extends ParserBridge1[NonEmptyList[Expr], Expr] {
def apply(exprs: NonEmptyList[Expr]): Expr = exprs match {
case NonEmptyList(expr, Nil) => expr
case exprs => Tuple(exprs)
}
override def apply(exprs: Parsley[NonEmptyList[Expr]]): Parsley[Expr] =
super.apply(exprs).guardAgainst {
case Tuple(exprs) if exprs.size > 22 =>
Seq(s"tuples may not have more than 22 elements, but ${exprs.size} given")
}
}
This bridge invokes the templated apply
with super.apply
first, then after processes it
with the guardAgainst
combinator to generate a bespoke message:
tupleOrParens.parse("(null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null)")
// res7: parsley.Result[String, Expr] = Failure((line 1, column 1):
// tuples may not have more than 22 elements, but 23 given
// >(null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null)
// ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^)
To be clear, this is using the original definition of tupleOrParens
and not tupleOrParensObtuse
.
While this use of bridges does retain the unsaturated application from ParserSingletonBridge
, using the
from
combinator will not perform the additional validation: be careful!
When not to use
Simply put, generic bridges have one major limitation: they cannot interact with additional metadata
that might be required in the parser. One excellent example of this is position information. While
parsley
could take a stance on how this should be done, I'd prefer if the users can make that
decision for themselves. The previously linked tutorial demonstrates how to make templating bridges
from scratch, which you would need to do to support something like position tracking.