Configuring the Lexer (`parsley.token.descriptions`)
The `Lexer` is configured primarily by providing it with a `LexicalDesc`. This is a structure built up of many substructures, each of which configures a specific part of the overall functionality. Many parts of this hierarchy have "sensible defaults" in the form of a `plain` value within their companion objects; these document the choices made in each individual case. Some values are also crafted to adhere to a specific language specification; for instance, `EscapeDesc.haskell` describes escape characters that adhere to the Haskell Report.
This page does not aim to document everything that is configurable within `LexicalDesc`, but it will outline the general design and how things slot together.
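As a rough illustration of how the defaults and the language-specific values combine, the sketch below starts from `LexicalDesc.plain` and swaps in `EscapeDesc.haskell`. It assumes the Parsley 5 package layout, where the description types sit directly in `parsley.token.descriptions` (in Parsley 4 the text-related types live in a `text` subpackage), and it assumes the field names `textDesc` and `escapeSequences`; check both against the API documentation for your version.

```scala
import parsley.token.Lexer
import parsley.token.descriptions.{EscapeDesc, LexicalDesc, TextDesc}

object LexerSetup {
  // Start from the defaults and replace only the escape-sequence rules.
  // Field names `textDesc` and `escapeSequences` are assumptions here.
  val desc: LexicalDesc = LexicalDesc.plain.copy(
    textDesc = TextDesc.plain.copy(escapeSequences = EscapeDesc.haskell)
  )

  // The lexer's behaviour is derived from the description it is given.
  val lexer = new Lexer(desc)
}
```

Because each description is an immutable value, overriding one part leaves every other default untouched.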
Diagram of Dependencies
The hierarchy of types involved with lexical configuration can be daunting. The following diagram illustrates both the "has-a" and "is-a" relationships between the types. For instance, `TextDesc` contains an `EscapeDesc`, and `NumericEscape` may be implemented by either `NumericEscape.Illegal` or `NumericEscape.Supported`.
classDiagram
    direction LR
    LexicalDesc *-- NumericDesc
    LexicalDesc *-- NameDesc
    LexicalDesc *-- SpaceDesc
    LexicalDesc *-- SymbolDesc
    LexicalDesc *-- TextDesc
    TextDesc *-- EscapeDesc
    EscapeDesc *-- NumericEscape
    NumericEscape_Illegal --|> NumericEscape
    NumericEscape_Supported --|> NumericEscape
    NumericEscape_Supported *-- NumberOfDigits
    NumberOfDigits_AtMost --|> NumberOfDigits
    NumberOfDigits_Exactly --|> NumberOfDigits
    NumberOfDigits_Unbounded --|> NumberOfDigits
    ExponentDesc --* NumericDesc
    ExponentDesc_Supported --|> ExponentDesc
    ExponentDesc_NoExponents --|> ExponentDesc
    NumericDesc *-- BreakCharDesc
    ExponentDesc_Supported *-- PlusSignPresence
    BreakCharDesc <|-- BreakCharDesc_Supported
    BreakCharDesc <|-- BreakCharDesc_NoBreakChar
    PlusSignPresence <|-- PlusSignPresence_Illegal
    PlusSignPresence <|-- PlusSignPresence_Optional
    PlusSignPresence <|-- PlusSignPresence_Required
    NumericDesc *-- PlusSignPresence
In the above diagram, an `_` represents a `.`; for example, `NumericEscape_Supported` denotes `NumericEscape.Supported`.
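The two kinds of relationship in the diagram can be sketched in plain Scala. The types below are local stand-ins, not Parsley's own definitions: their field lists are trimmed and assumed, and the helpers `allowsNumericEscapes` and `withNumericEscapes` are hypothetical. An "is-a" edge becomes a subtype of a sealed trait, and a "has-a" edge becomes a field of a case class.

```scala
object Relationships {
  // "is-a": each alternative extends the common sealed trait.
  sealed trait NumericEscape
  object NumericEscape {
    case object Illegal extends NumericEscape
    // Field list trimmed for illustration (an assumption, not the real signature).
    final case class Supported(prefix: Option[Char], maxValue: Int) extends NumericEscape
  }

  // "has-a": each description holds the next one down as a field.
  final case class EscapeDesc(numericEscape: NumericEscape)
  final case class TextDesc(escapeSequences: EscapeDesc)

  // Hypothetical helper: walk the "has-a" chain, then match on the "is-a" alternatives.
  def allowsNumericEscapes(text: TextDesc): Boolean =
    text.escapeSequences.numericEscape match {
      case NumericEscape.Illegal      => false
      case _: NumericEscape.Supported => true
    }

  // Hypothetical helper: replace one nested part, leaving the rest unchanged.
  def withNumericEscapes(text: TextDesc, esc: NumericEscape): TextDesc =
    text.copy(escapeSequences = text.escapeSequences.copy(numericEscape = esc))
}
```

Nested `copy` calls like the one in `withNumericEscapes` are the usual way to adjust one leaf of an immutable hierarchy of this shape.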
The types in the diagram that have alternative implementations are as follows:

- `BreakCharDesc`: describes whether numeric literals can contain meaningless "break characters", like `_`. It can either be `NoBreakChar`, which disallows them; or `Supported`, which specifies the character and whether it may appear after a non-decimal prefix like hexadecimal `0x`.
- `PlusSignPresence`: describes whether a `+` is allowed in numeric literals, where it can appear at the start of a literal and at the start of a floating-point exponent. It can either be `Required`, which means either a `+` or `-` must always be written; `Optional`, which means a `+` can be written; or `Illegal`, which means only a `-` can appear.
- `ExponentDesc`: describes how an exponent is formed for different bases of floating-point literals. It can either be `Supported`, in which case it indicates whether the exponent is compulsory, what characters can start it, what the numeric base of the exponent number itself is, and what the `PlusSignPresence` is, as above; otherwise, it is `NoExponents`, which means that exponent notation is not supported for a specific numeric base.
- `NumericEscape`: describes whether numeric escape sequences are allowed in string and character literals. It can either be `Illegal`, which means there are no numeric escapes; or `Supported`, which means that the prefix, `NumberOfDigits`, and the maximum value of the escape must all be specified.
- `NumberOfDigits`: used by `NumericEscape` above to determine how many digits can appear within a numeric escape literal. It can be one of: `Unbounded`, which means any well-formed number can serve as the escape; `AtMost`, which puts an upper limit on the number of digits that can appear; or `Exactly`, which details one or more exact numbers of digits that could appear. For instance, some languages only allow numeric escapes with exactly 2, 4, or 6 digits.
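To make the three `NumberOfDigits` alternatives concrete, here is a small self-contained model of how each one constrains the digit count of an escape. The types are local stand-ins rather than Parsley's definitions, the `accepts` helper is hypothetical, and the requirement of at least one digit for `Unbounded` and `AtMost` is an assumption made for the sketch.

```scala
object DigitRules {
  // Local stand-ins mirroring the alternatives described above.
  sealed trait NumberOfDigits
  object NumberOfDigits {
    case object Unbounded extends NumberOfDigits
    final case class AtMost(n: Int) extends NumberOfDigits
    // One or more permitted exact lengths, e.g. Exactly(2, 4, 6).
    final case class Exactly(n0: Int, ns: Int*) extends NumberOfDigits
  }

  // Hypothetical helper: would an escape with `count` digits be permitted?
  def accepts(spec: NumberOfDigits, count: Int): Boolean = spec match {
    case NumberOfDigits.Unbounded  => count >= 1               // any non-empty run (assumed)
    case NumberOfDigits.AtMost(n)  => count >= 1 && count <= n // bounded above
    case e: NumberOfDigits.Exactly => count == e.n0 || e.ns.contains(count)
  }
}
```

With this model, `DigitRules.accepts(DigitRules.NumberOfDigits.Exactly(2, 4, 6), 3)` is rejected while a count of 4 is accepted, matching the "exactly 2, 4, or 6 digits" case mentioned in the list.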