Parser Combinator Cheatsheet

Until you're more comfortable using parser combinators and have a sense for how to piece things together, the sheer choice can be daunting. This page is designed to help bridge this gap by bringing attention to some of the more common and useful combinators, as well as a few common idioms for getting stuff done.

Quick Documentation

This section is designed to give a quick reference for some of the most common combinators, their types as well as their use.

Basic Combinators

These are the basic combinators for combining smaller parsers into bigger parsers, or adjusting their results. These combinators are sometimes lazy in their arguments, which is denoted here by the regular by-name (=>A) syntax. If an argument is strict, it means that it will be parsed immediately on entry to the combinator before any input can be consumed.

Combinator Type Use Pronounciation
pure(_) A => Parsley[A] return a value of type A without parsing anything. "pure"
_ *> _
_ ~> _
( Parsley[A]
, =>Parsley[B]
) => Parsley[B]
sequence two parsers, returning the result of the second. "then"
_ <* _
_ <~ _
( Parsley[A]
, =>Parsley[B]
) => Parsley[A]
sequence two parsers, returning the result of the first. "then discard"
_.map(_) ( Parsley[A]
, A => B
) => Parsley[B]
use a function to change the result of a parser. "map"
_ <#> _ ( A => B
, Parsley[A]
) => Parsley[B]
use a function to change the result of a parser.

(Requires import parsley.extension.HaskellStyleMap)
"map"
_ #> _
_.as(_)
( Parsley[A]
, B
) => Parsley[B]
replace the result of a parser with a fixed value. "as"
liftN(_, .., _) ( (A1, A2, .., An) => B
, Parsley[A1]
, =>Parsley[A2]
, ..
, =>Parsley[An]
) => Parsley[B]
use a function to combine the results of n parsers, sequencing them all together. "lift n"
_ <|> _
_ | _
( Parsley[A]
, =>Parsley[A]
) => Parsley[A]
try one parser, and if it fails without consuming input try the second "or"
atomic(_) Parsley[A] => Parsley[A] perform a parser, but roll-back any consumed input if it fails, use in conjunction with <|>. "atomic"
lookAhead(_) Parsley[A] => Parsley[A] execute a parser, and roll-back any consumed input if it succeeded. "look-ahead"
notFollowedBy(_) Parsley[A] => Parsley[Unit] execute a parser, never consuming input: succeed only if the parser fails. "not followed by"
empty Parsley[Nothing] fails when executed. "empty"

Character Combinators

These combinators, found in parsley.character are useful for dealing with actually consuming input.

Combinator Type Use
char(_) Char => Parsley[Char] Reading a single specific character. That character is returned.
string(_) String => Parsley[String] Reading a single specific string. That string is returned.
satisfy(_) (Char => Boolean) => Parsley[Char] Read any single character for which the provided function returns true. The character returned is the one read.
oneOf(_*) Char* => Parsley[Char] Read any one of the provided characters (which are varargs). The character returned is the one read.
noneOf(_*) Char* => Parsley[Char] Read any single character that is not one of the provided characters. The character returned is the one read.

Lifty Combinators

These combinators can all be implemented in terms of lift2 (see liftN above), but are considered useful enough to have their own syntax and name. These combinators are lazy in their arguments (but not receivers),denoted here by the regular by-name (=>A) syntax. As such, the receiver is parsed first and may consume input, which means the argument may contain a recursive position (and must therefore be lazy).

Combinator Type Use Pronounciation
_ <~> _ (Parsley[A], =>Parsley[B]) => Parsley[(A, B)] combine the results using (_, _) "zip"
_ <*> _ (Parsley[A => B], =>Parsley[A]) => Parsley[B] combine the results using (f, x) => f(x). "ap"
_ <**> _ (Parsley[A], =>Parsley[A => B]) => Parsley[B] combine the results using (x, f) => f(x). "reverse ap"
_ <::> _ (Parsley[A], =>Parsley[List[A]]) => Parsley[List[A]] combine the results using _ :: _. "cons"

Composite Combinators

These combinators tackle more common complex tasks. In particular many and some are very important. They are all found in parsley.combinator. These combinators are sometimes lazy in their arguments, which is denoted here by the regular by-name (=>A) syntax. Care should be taken with the combinators with variadic arguments, as they are totally strict, even in normally lazy positions: LazyParsley.unary_~ can be used to restore laziness in these positions.

Combinator Type Use
many(_) Parsley[A] => Parsley[List[A]] run one parser many times until it fails, collecting all the results in a list.
some(_) Parsley[A] => Parsley[List[A]] as above, but the parser must succeed at least once.
eof Parsley[Unit] check if there is any remaining input: it will succeed if there is none.
choice(_*) Parsley[A]* => Parsley[A] try each of the given parsers in turn until one succeeds: uses <|>.
option(_) Parsley[A] => Parsley[Option[A]] try a parser, if it succeeds wrap the result in Some, and if it fails without consuming input return None.
optional(_) Parsley[A] => Parsley[Unit] optionally parse something (but if it fails, it must not consume input).
sepBy1(_, _) (Parsley[A], =>Parsley[_]) => Parsley[List[A]] parse one thing separated by another, collecting all the results. Something like comma-separated arguments in a function call.
endBy1(_, _) (Parsley[A], =>Parsley[_]) => Parsley[List[A]] same as above, but the sequence must be ended by the separator again. Something like semi-colon separated statements in C-like languages.
sepEndBy1(_, _) (Parsley[A], =>Parsley[_]) => Parsley[List[A]] same as above, but the terminal separator is optional. Something like semi-colon separated statements in Scala.

Building Values and ASTs

This section covers the common ways you might build a result value or Abstract Syntax Tree (AST) with your parsers.

The most primitive combinators for reading input all have a tendency to return the thing they parsed, be it a single character or a string. For the most part, this is not the useful output you'd like your parser to have.

Transforming a single value with map

The quickest way to change the result of a parser is by using .map or the #> combinator (see the above quick documentation). This is really useful for changing the result of a single parser, but provides no way of combining multiple.

import parsley.Parsley, Parsley.some
import parsley.character.digit

case class Num(n: Int)

// A preferred method is to use `digit.foldLeft1` to avoid creating a List.
val digits: Parsley[List[Char]] = some(digit)
// `map` here is using a function of type `List[Char] => Int`
val int: Parsley[Int] = digits.map(_.mkString.toInt) // equivalently `digits.map(_.mkString).map(_.toInt)
// `map` here is being used to wrap the `Int` in the `Num` class
val num: Parsley[Num] = int.map(Num)

But when you need to combine the results of two parsers more options open up.

Combining multiple results with lift, <::>, and friends

Let's suppose we want to rule out leading zeros in the above parser. We'll need to read one non-zero digit before we read zero or more digits. In this case, we want the first digit to be added to the list of remaining digits. This task is quite common, so the <::> combinator is designed specially for it:

import parsley.Parsley, Parsley.many
import parsley.character.{digit, oneOf, char}

case class Num(n: Int)

val nonzero = oneOf('1' to '9')

// <::> adds the leading non-zero char onto the other digits
val digits: Parsley[List[Char]] = nonzero <::> many(digit)
// Using #> here to handle the plain ol' zero case
val int: Parsley[Int] = char('0') #> 0 | digits.map(_.mkString.toInt)
val num: Parsley[Num] = int.map(Num)

But more generally, we could reach for the lift functions:

import parsley.Parsley, Parsley.many
import parsley.character.{digit, oneOf, char}
import parsley.lift.lift2

case class Num(n: Int)

val nonzero = oneOf('1' to '9')

val digits: Parsley[List[Char]] = lift2[Char, List[Char], List[Char]](_ :: _, nonzero, many(digit))
// Using #> here to handle the plain ol' zero case
val int: Parsley[Int] = char('0') #> 0 | digits.map(_.mkString.toInt)
val num: Parsley[Num] = int.map(Num)

Sadly, to do this, it's sometimes necessary to specify all the types, in particular for anonymous functions that can have many possible type-instantiations, like _ :: _. The reason is that Scala doesn't infer the types of arguments, only return values, so on its own _ :: _ has no known type. As such, the fix is to let other type-instantations help give the argument types (as above) or to specify the types in the function manually:

lift2((c: Char, cs: List[Char]) => c :: cs, nonzero, many(digit))

Notice that this didn't seem to be a problem with map. This is because the function is type-checked after the receiver of the method: it gets given the right argument type straight away. Parsley has a syntax for leveraging this property:

import parsley.syntax.zipped.zippedSyntax2

(nonzero, many(digit)).zipped(_ :: _)

The zipped syntax, unlike the liftN combinators or lift syntax, is not lazy in any of its arguments, so care may be needed to use LazyParsley.unary_~ to restore laziness to those arguments that need it.

Use this form of lifting when type-inference fails you. Otherwise, for clarity, use a regular liftN, or the syntactic sugar for it:

import parsley.syntax.lift.{liftSyntax1, liftSyntax2}

val charCons = (c: Char, cs: List[Char]) => c :: cs

charCons.lift(nonzero, many(digit))
Num.lift(int)

The lift functions work all the way up to 22 arguments (which is a JVM limit). The same goes for the zipped syntax and lift syntax. Don't forget about <::> as well as its friends <~>, <*>, and <**>! They all provide a concise way of combining things in (common) special cases.

A note for Haskellers

In Scala, curried application is not as favoured as it is in Haskell for performance reasons. The classic f <$> p <*> .. <*> z pattern that is common in Haskell is unfavourable compared to the scala liftN(f, p, .., z). For the latter, f is uncurried, which is the norm, and so it is almost always more efficient. Both <*> and <**> should be, therefore, used sparingly in idiomatic parsley code instead of liberally like in Haskell.

However, it goes without saying that lift2[A => B, A, B]((f, x) => f(x), pf, px) is no more efficient than pf <*> px so the latter is favoured for that use case!