Parsing Characters
The consumption of input is central to parsing. In parsley
, the only way of
consuming input is by consuming characters, as customised token streams are
not supported. However, as parsley
is based on Scala, which relies on
16-bit Basic-Multilingual Plane UTF16 characters, there are two modules for
input consumption: character
, for 16-bit Char
, and unicode
, handling 32-bit Int
s.
The Scaladoc for this page can be found at parsley.character
and parsley.unicode
.
Character Primitives
There are three key input consumption primitives in parsley
: satisfy
, which
takes a predicate and parses any character for which that predicate returns true; char
, which reads a specific character; and string
, which reads a specific
sequence of characters. Technically, char
and string
can be built in terms of
satisfy
, but would then lose their enhanced error message characteristics.
The oneOf
and noneOf
are based on satisfy
but with better error messages,
based on the kind of range of characters they are provided with.
The strings
and trie
combinators can be used similarly to match a set of
strings: they are likely to parse these more efficiently than a manual
construction and may improve further in future.
String Building
It is possible to efficiently construct a String
using the stringOfMany
and
stringOfSome
combinators. This can be used with either a character predicate (which turns out to be called takeWhileP
in megaparsec
), or can be given a
parser that consumes a character instead. The construction of the underlying
string is done efficiently with a StringBuilder
. Note that, with the version
that takes a parser, the characters incorporated into the string are the ones
returned by the parser and not necessarily the ones the parser consumed.
Pre-Built Character Parsers
Some pre-built parsers are bunched into character
and unicode
, which parse
specific sets of letter, with good error messages. Each will document the specific
set of characters they recognise.