Precedence Combinators

The chain combinators in parsley.expr.{chain, infix} are useful for handling isolated instances of expression-like behaviours in a grammar. However, they can be cumbersome to use when there are multiple interacting layers of expression operators. In these circumstances, a precedence table is the more appropriate choice. In parsley, the precedence machinery is very general, which can make the documentation intimidating. This page aims to break down how this works, with examples of different uses. The relevant combinator for turning a precedence table into a parser is the precedence combinator.

Formulating Precedence

In general, a precedence table is formed up of a collection of base "atoms", and then one or more levels of operators. The precise shape these components take depends on how complex the types of the results are.

The simplest shape of precedence directly takes a variadic list of atoms and then a variadic list of levels as arguments: the levels lexically closest to the atoms will be the tightest-binding operators. This shape is discussed in Homogeneous Precedence, below.

More generally, however, the precedence combinator takes a heterogeneous list of levels of operators, where the base case of this structure contains the atoms. This can be used to handle more generic shapes of results, and is discussed in Heterogeneous Precedence.

However, both shapes make use of the Ops type, which requires a Fixity to be provided to it. First, it is good to understand the basics of how that works in a simplified presentation: for now, trust that the types work out. There are five different Fixity values: InfixL, InfixR, Prefix, Postfix, and InfixN. These each denote either binary or unary operators, and where recursion may appear for each of them. InfixN is a non-recursive operator, like < in some languages. At its simplest, Ops first takes a Fixity and then takes a variadic number of operators that should have that fixity and all have the same precedence. This will be explored in more detail in Ops and Fixity.

Throughout this section, the following imports and parsers are assumed:

import parsley.Parsley
import parsley.character.{letter, digit, stringOfSome}
import parsley.syntax.character.stringLift

val int: Parsley[Int] = digit.foldLeft1(0)((n, d) => n * 10 + d.asDigit)
val ident: Parsley[String] = stringOfSome(letter)

Homogeneous Precedence

When the result type of the precedence table is the same throughout the table -- right through to the atoms, this is a homogeneous precedence table. This can be captured by the Ops smart-constructor, and make use of the simple flat structure. As an example:

import parsley.expr.{precedence, Ops, InfixL}

sealed trait Expr
case class Add(x: Expr, y: Expr) extends Expr
case class Sub(x: Expr, y: Expr) extends Expr
case class Mul(x: Expr, y: Expr) extends Expr
case class Num(n: Int) extends Expr
case class Var(v: String) extends Expr

val expr: Parsley[Expr] =
    precedence(ident.map(Var), int.map(Num))(
        Ops(InfixL)("*" as Mul),
        Ops(InfixL)("+" as Add, "-" as Sub)
    )
// expr: Parsley[Expr] = parsley.Parsley@1f2ce0d8

expr.parse("x+5*y")
// res0: parsley.Result[String, Expr] = Success(Add(Var(x),Mul(Num(5),Var(y))))

In the above example, notice that Add and Sub are within the same Ops, and therefore are the same precedence; Mul is closer to the atoms lexically than Add and Sub are, so it binds tighter. All levels have been marked as left associative with InfixL. When using Prefix or Postfix, the type of the parsers in the Ops will have type Parsley[A => A] instead of the above Parsley[(A, A) => A].

Ops and Fixity

The above example of precedence did not elaborate on how the types work to allow the operators to vary in type when the fixity does. Here is the type of the Ops constructor from above:

object Ops {
    def apply[A](fixity: Fixity)(ops: Parsley[fixity.Op[A, A]]*): Ops[A, A]
}

There are a few things to note here. First is that the type returned is Ops[A, A]: this means that the input type to this precedence layer matches the output type (this is because it's homogeneous). Second is that the type that the operators must return is fixity.Op[A, A]: this is known as a path-dependent type. Each object of type Fixity will describe the shape of operators that match that fixity, the two type parameters are the input and output of the operator (again, for homogeneous precedence, these will coincide). This is formulated as follows:

sealed trait Fixity {
    type Op[A, B]
}
case object InfixL  { type Op[A, B] = (B, A) => B }
case object InfixR  { type Op[A, B] = (A, B) => B }
case object InfixN  { type Op[A, B] = (A, A) => B }
case object Prefix  { type Op[A, B] = B => B }
case object Postfix { type Op[A, B] = B => B }

These types align with those seen in parsley.expr.infix, as well as Prefix and Postfix matching the homogeneous shape in parsley.expr.chain.{prefix, postfix}. As such, when a specific fixity is provided to Ops, it also fixes the type of the operators in the next set of parentheses.

When considering heterogeneous combinators, the types A and B can indeed vary, and the more general SOps and GOps will be explored in the next section.

Heterogeneous Precedence

When the results of the precedence table is not the same throughout, it is necessary to generalise the machinery to make use of either SOps or GOps:

object SOps {
    def apply[B, A <: B](fixity: Fixity)(ops: Parsley[fixity.Op[A, B]]*): Ops[A, B]
}

object GOps {
    def apply[A, B](fixity: Fixity)(ops: Parsley[fixity.Op[A, B]]*)
                   (implicit wrap: A => B): Ops[A, B]
}

The SOps object allows the input and output to the layer to vary so long as they are in a sub-type relation: this is the most common form of heterogeneous hierachy that leverages Scala's strengths. Otherwise, GOps handles any arbitrary relationship between the types, so long as there is a known implicit A => B conversion.

Since the types between layers differ, a variadic argument list cannot be used to collect them together. Instead, the Ops are stitched together into a Prec[A] structure:

sealed trait Prec[A] {
    def +:[B](ops: Ops[A, B]): Prec[B]
    def :+[B](ops: Ops[A, B]): Prec[B]
}
case class Atoms[A](atoms: Parsley[A]*) extends Prec[A]

The atoms are placed into Atoms, which is the base case of the list. The +: and :+ operators attach an additional layer onto the table, adapting the existing tables input type into a new output type via the given operators. Again, levels closer to the Atoms will bind tighter. As an example:

import parsley.expr.{precedence, SOps, Atoms, InfixL}

sealed trait Expr
case class Add(x: Expr, y: Term) extends Expr
case class Sub(x: Expr, y: Term) extends Expr
sealed trait Term extends Expr
case class Mul(x: Term, y: Atom) extends Term
sealed trait Atom extends Term
case class Num(n: Int) extends Atom
case class Var(v: String) extends Atom

val expr: Parsley[Expr] =
    precedence {
        Atoms(ident.map(Var), int.map(Num)) :+
        SOps(InfixL)("*" as Mul) :+
        SOps(InfixL)("+"as Add, "-" as Sub)
    }
// expr: Parsley[Expr] = parsley.Parsley@7d44af0f

expr.parse("x+5*y")
// res1: parsley.Result[String, Expr] = Success(Add(Var(x),Mul(Num(5),Var(y))))

Here, the types within the syntax tree are very specific about the fact that all three operators are left associative. To be able to use this type with precedence SOps is required, since the outer layer takes in Terms and produces Exprs, the middle layer takes Atoms and produces Terms, and the inner-most atoms produce Atoms. Notice that, if the layers are incorrectly stitched together, the table does not typecheck:

import parsley.expr.InfixR

val expr: Parsley[Expr] =
    precedence {
        Atoms(ident.map(Var), int.map(Num)) :+
        SOps(InfixR)("*" as Mul) :+
        SOps(InfixL)("+" as Add, "-" as Sub)
    }
// error: type mismatch;
//  found   : parsley.Parsley[Mul.type]
//  required: parsley.Parsley[parsley.expr.InfixR.Op[Term,Term]]
//     (which expands to)  parsley.Parsley[(Term, Term) => Term]
//         SOps(InfixR)("*" as Mul) :+
//                      ^^^^^^^^^^
val expr: Parsley[Expr] =
    precedence {
        Atoms(ident.map(Var), int.map(Num)) :+
        SOps(InfixL)("+" as Add, "-" as Sub) :+
        SOps(InfixL)("*" as Mul)
    }
// error: inferred type arguments [Atom,Term] do not conform to method :+'s type parameter bounds [Aʹ >: Expr,B]
//         SOps(InfixL)("+" as Add, "-" as Sub) :+
//                                              ^^
// error: type mismatch;
//  found   : parsley.expr.Ops[Atom,Term]
//  required: parsley.expr.Ops[Aʹ,B]
//         SOps(InfixL)("*" as Mul)
//         ^^^^^^^^^^^^