License | BSD-3-Clause |
---|---|
Maintainer | Jamie Willis, Gigaparsec Maintainers |
Stability | experimental |
Safe Haskell | Safe |
Language | Haskell2010 |
This module contains the descriptions of various lexical structures to configure the lexer.
Many languages share common lexical tokens, such as numeric and string literals.
Writing lexers that turn these strings into tokens is effectively boilerplate.
A Description encodes how to lex one of these common tokens.
Feeding a LexicalDesc to a Lexer provides many combinators for dealing with these tokens.
Usage
Rather than using the internal constructors, such as NameDesc, one should extend the plain definitions with record field updates.
For example,

    myLexicalDesc = plain { nameDesc = myNameDesc, textDesc = myTextDesc }

will produce a description that overrides the default name and text descriptions with those given.
See plainName, plainSymbol, plainNumeric, plainText and plainSpace for further examples.
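As a slightly fuller illustration of the record-update style, the sketch below describes a small, hypothetical expression language. It assumes these descriptions are imported from gigaparsec's Text.Gigaparsec.Token.Descriptions module and uses only fields named on this page; exprLangDesc and its keyword and operator choices are made up for the example.

```haskell
{-# LANGUAGE OverloadedLists #-}
-- Assumption: this page documents gigaparsec's Text.Gigaparsec.Token.Descriptions.
import Text.Gigaparsec.Token.Descriptions
import Data.Char (isAlpha, isAlphaNum)

-- A hypothetical description: letter-led identifiers plus a few reserved words.
exprLangDesc :: LexicalDesc
exprLangDesc = plain
  { nameDesc = plainName
      { identifierStart = Just isAlpha      -- identifiers begin with a letter
      , identifierLetter = Just isAlphaNum  -- and continue with letters or digits
      }
  , symbolDesc = plainSymbol
      { hardKeywords = ["let", "in", "if", "then", "else"]
      , hardOperators = ["=", "->"]
      , caseSensitive = True
      }
  }
```

The numeric, text and whitespace descriptions are inherited unchanged from plain.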
Since: 0.2.2.0
Synopsis
- data ExponentDesc
- = NoExponents
- | ExponentsSupported {
- compulsory :: !Bool
- chars :: !(Set Char)
- base :: !Int
- expSign :: !PlusSignPresence
- expLeadingZerosAllowd :: !Bool
- data BreakCharDesc
- = NoBreakChar
- | BreakCharSupported { }
- data SpaceDesc = SpaceDesc {}
- data NumericEscape
- = NumericIllegal
- | NumericSupported { }
- data LexicalDesc = LexicalDesc {
- nameDesc :: !NameDesc
- symbolDesc :: !SymbolDesc
- numericDesc :: !NumericDesc
- textDesc :: !TextDesc
- spaceDesc :: !SpaceDesc
- data NumericDesc = NumericDesc {
- literalBreakChar :: !BreakCharDesc
- leadingDotAllowed :: !Bool
- trailingDotAllowed :: !Bool
- leadingZerosAllowed :: !Bool
- positiveSign :: !PlusSignPresence
- integerNumbersCanBeHexadecimal :: !Bool
- integerNumbersCanBeOctal :: !Bool
- integerNumbersCanBeBinary :: !Bool
- realNumbersCanBeHexadecimal :: !Bool
- realNumbersCanBeOctal :: !Bool
- realNumbersCanBeBinary :: !Bool
- hexadecimalLeads :: !(Set Char)
- octalLeads :: !(Set Char)
- binaryLeads :: !(Set Char)
- decimalExponentDesc :: !ExponentDesc
- hexadecimalExponentDesc :: !ExponentDesc
- octalExponentDesc :: !ExponentDesc
- binaryExponentDesc :: !ExponentDesc
- data SymbolDesc = SymbolDesc {
- hardKeywords :: !(Set String)
- hardOperators :: !(Set String)
- caseSensitive :: !Bool
- data NameDesc = NameDesc {}
- type CharPredicate = Maybe (Char -> Bool)
- data PlusSignPresence
- data TextDesc = TextDesc {
- escapeSequences :: !EscapeDesc
- characterLiteralEnd :: !Char
- stringEnds :: !(Set (String, String))
- multiStringEnds :: !(Set (String, String))
- graphicCharacter :: !CharPredicate
- data EscapeDesc = EscapeDesc {
- escBegin :: !Char
- literals :: !(Set Char)
- mapping :: !(Map String Char)
- decimalEscape :: !NumericEscape
- hexadecimalEscape :: !NumericEscape
- octalEscape :: !NumericEscape
- binaryEscape :: !NumericEscape
- emptyEscape :: !(Maybe Char)
- gapsSupported :: !Bool
- data NumberOfDigits
- plain :: LexicalDesc
- plainName :: NameDesc
- plainSymbol :: SymbolDesc
- plainNumeric :: NumericDesc
- plainText :: TextDesc
- plainSpace :: SpaceDesc
- plainEscape :: EscapeDesc
Lexical Descriptions
A lexer is configured by extending the default plain template, producing a LexicalDesc.
Name Descriptions
A NameDesc configures the lexing of name-like tokens, such as variable and function names.
To create a NameDesc, use plainName and configure it to your liking with record updates.
Symbol Descriptions
A SymbolDesc configures the lexing of 'symbols' (textual literals), such as keywords and operators.
To create a SymbolDesc, use plainSymbol and configure it to your liking with record updates.
Numeric Descriptions
A NumericDesc configures the lexing of numeric literals, such as integer and floating-point literals.
To create a NumericDesc, use plainNumeric and configure it to your liking with record updates.
Also see ExponentDesc, BreakCharDesc and PlusSignPresence for further configuration options.
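The fields of a NumericDesc are:
- literalBreakChar, leadingDotAllowed, trailingDotAllowed, leadingZerosAllowed and positiveSign, which govern the overall shape of a literal;
- integerNumbersCanBeHexadecimal, integerNumbersCanBeOctal, integerNumbersCanBeBinary, realNumbersCanBeHexadecimal, realNumbersCanBeOctal and realNumbersCanBeBinary, which select the bases available to integer and real literals;
- hexadecimalLeads, octalLeads and binaryLeads, which give the prefix characters for each base;
- decimalExponentDesc, hexadecimalExponentDesc, octalExponentDesc and binaryExponentDesc, which describe the exponent notation for each base.
The default template is plainNumeric.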
Exponent Descriptions
An ExponentDesc configures scientific exponent notation.
Break-Characters in Numeric Literals
Some languages allow a single numeric literal to be broken up by a 'break' symbol, such as the underscore in 1_000_000.
Numeric Literal Prefix Configuration
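The base-prefix fields (hexadecimalLeads, octalLeads, binaryLeads) hold the characters that announce a non-decimal literal, as in 0x1F. The standalone sketch below models that classification under the assumption that every such prefix is a 0 followed by one lead character; Base and classifyBase are hypothetical names, not part of gigaparsec.

```haskell
import Data.Set (Set)
import qualified Data.Set as Set

-- Hypothetical classification of a literal's base; not gigaparsec's own type.
data Base = Decimal | Hexadecimal | Octal | Binary
  deriving (Eq, Show)

-- | Split a literal into its base and the remaining digits.
-- Assumption: a non-decimal prefix is '0' followed by one of the lead characters.
classifyBase :: Set Char   -- hexadecimal leads, e.g. "xX"
             -> Set Char   -- octal leads, e.g. "oO"
             -> Set Char   -- binary leads, e.g. "bB"
             -> String
             -> (Base, String)
classifyBase hexLeads octLeads binLeads ('0' : lead : digits)
  | lead `Set.member` hexLeads = (Hexadecimal, digits)
  | lead `Set.member` octLeads = (Octal, digits)
  | lead `Set.member` binLeads = (Binary, digits)
-- No recognised prefix (or every guard above failed): the literal is decimal.
classifyBase _ _ _ literal = (Decimal, literal)
```

Because failed guards fall through to the last equation, a literal such as 07 is simply treated as decimal here.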
Text Descriptions
A TextDesc configures the lexing of string and character literals, as well as escaped numeric literals.
To create a TextDesc, use plainText and configure it to your liking with record updates.
See EscapeDesc, NumericEscape and NumberOfDigits for further configuration of escape sequences and escaped numeric literals.
Escape Character Descriptions
Configuration of escape sequences, such as tabs (\t) and newlines (\n), and of escaped numbers, such as hexadecimal (0x...) and binary (0b...).
Numeric Escape Sequences
Configuration of escaped numeric literals, for example hexadecimal ones (0x...).
Whitespace and Comment Descriptions
A SpaceDesc configures the lexing of whitespace and comments.
To create a SpaceDesc, use plainSpace and configure it to your liking with record updates.
data ExponentDesc
Describes how scientific exponent notation can be used within real literals.
A common notation would be 1.6e3 for 1.6 × 10³, which the following ExponentDesc describes:

    {-# LANGUAGE OverloadedLists #-}
    -- OverloadedLists lets us write [a] to generate a singleton Set containing a.

    usualNotation :: ExponentDesc
    usualNotation = ExponentsSupported
      { compulsory = False           -- An exponent may be omitted.
      , chars = ['e']                -- The letter 'e' separates the significand from the exponent.
      , base = 10                    -- The base of the exponent is 10, so 2.3e5 means 2.3 × 10⁵.
      , expSign = PlusOptional       -- A positive exponent does not need a plus sign, but can have one.
      , expLeadingZerosAllowd = True -- We allow leading zeros on exponents, so 1.2e005 is valid.
      }
Constructor | Description |
---|---|
NoExponents | The language does not allow exponent notation. |
ExponentsSupported | The language allows exponent notation, configured by the fields compulsory, chars, base, expSign and expLeadingZerosAllowd. |
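To make the roles of compulsory, chars, base and expLeadingZerosAllowd concrete, here is a standalone model of how an exponent suffix such as e3 or e-05 might be read and applied. It is a sketch, not gigaparsec's implementation: it mirrors ExponentDesc with its own local type, leaves out expSign (behaving as if it were PlusOptional), and exponentPart and scaleBy are hypothetical helpers.

```haskell
import Control.Monad (guard)
import Data.Char (isDigit)
import Data.Set (Set)
import qualified Data.Set as Set

-- Local mirror of ExponentDesc for illustration; not gigaparsec's type.
-- expSign is omitted: this sketch behaves as if expSign = PlusOptional.
data ExponentDesc
  = NoExponents
  | ExponentsSupported
      { compulsory :: Bool
      , chars :: Set Char
      , base :: Int
      , expLeadingZerosAllowd :: Bool
      }

-- | Read the exponent suffix that follows a significand, e.g. "e3" or "e-05".
-- An absent exponent yields Just 0, unless the exponent is compulsory.
exponentPart :: ExponentDesc -> String -> Maybe Integer
exponentPart NoExponents suffix = if null suffix then Just 0 else Nothing
exponentPart desc "" = if compulsory desc then Nothing else Just 0
exponentPart desc (c : rest) = do
  guard (c `Set.member` chars desc)
  let (negative, digits) = case rest of
        '-' : ds -> (True, ds)
        '+' : ds -> (False, ds)
        ds -> (False, ds)
  guard (not (null digits) && all isDigit digits)
  -- Without leading zeros, only "0" itself may start with '0'.
  guard (expLeadingZerosAllowd desc || digits == "0" || take 1 digits /= "0")
  let n = read digits
  pure (if negative then negate n else n)

-- | Scale a significand by base ^ exponent, as in 1.6e3 = 1.6 × 10³.
scaleBy :: Int -> Double -> Integer -> Double
scaleBy b significand e = significand * fromIntegral b ^^ e
```

With the usual notation above, exponentPart returns Just 5 for "e005", while a description with expLeadingZerosAllowd = False rejects that suffix.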
data BreakCharDesc
Prescribes whether or not numeric literals can be broken up by a specific symbol.
For example, can one write 300.2_3?

Constructor | Description |
---|---|
NoBreakChar | Literals cannot be broken. |
BreakCharSupported | Literals can be broken. |
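A standalone sketch of what a supported break character means for the digits of a literal: the separators are dropped, so 1_000_000 denotes 1000000. The placement rule used here, that a break character may only sit directly between two digits, is an assumption made for the example, and removeBreaks is a hypothetical helper, not part of gigaparsec.

```haskell
import Data.Char (isDigit)

-- | Remove break characters from a numeric literal, e.g. '_' in "1_000_000".
-- Assumption for this sketch: a break character is only valid between two digits.
removeBreaks :: Char -> String -> Maybe String
removeBreaks breakChar literal
  | all validBreak neighbourhoods = Just (filter (/= breakChar) literal)
  | otherwise = Nothing
  where
    -- Pad with spaces so that every character has a left and a right neighbour.
    padded = ' ' : literal ++ " "
    neighbourhoods = zip3 padded (drop 1 padded) (drop 2 padded)
    validBreak (left, c, right) = c /= breakChar || (isDigit left && isDigit right)
```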
data SpaceDesc
This type describes how whitespace and comments should be handled lexically.
Its only constructor is SpaceDesc; see plainSpace for the default description.
data NumericEscape
Describes how numeric escape sequences should work for a given base.

Constructor | Description |
---|---|
NumericIllegal | Numeric escape sequences are disallowed for this base. |
NumericSupported | Numeric escape sequences are supported for this base. |
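data LexicalDesc
This type aggregates the sub-configurations needed to lex a specific language: names, symbols, numeric literals, text literals and whitespace.
See the plain template to define a LexicalDesc.
Its only constructor is LexicalDesc, with the fields nameDesc, symbolDesc, numericDesc, textDesc and spaceDesc.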
data NumericDesc
This type describes how numeric literals (integers, decimals, hexadecimals, etc.) should be lexically processed.
Its only constructor is NumericDesc; see plainNumeric for the default description.
data SymbolDesc
This type describes how symbols (textual literals in a BNF) should be processed lexically.
This includes keywords and (hard) operators that are reserved by the language. For example, in Haskell, "data" is a keyword, and "->" is a hard operator.
See the plainSymbol template for how to implement a custom symbol description.
Its only constructor is SymbolDesc.
data NameDesc
This type describes how name-like tokens are lexed.
In particular, it defines which characters constitute identifiers and operators.
See the plainName template for how to implement a custom name description.
Its only constructor is NameDesc.
type CharPredicate = Maybe (Char -> Bool)
An optional predicate on characters: if pred :: CharPredicate and pred = Just p, then the lexer should accept a character x exactly when p x is True.
Nothing means that no character is accepted.
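Examples
- A predicate that only accepts alphanumeric characters (isAlphaNum from Data.Char):

      isAlphaNumPred :: CharPredicate
      isAlphaNumPred = Just isAlphaNum

- A predicate that only accepts capital ASCII letters (isAsciiUpper from Data.Char):

      isCapital :: CharPredicate
      isCapital = Just isAsciiUpper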
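The following standalone sketch spells out how a CharPredicate can be interpreted and combined. CharPredicate is re-declared locally so the block runs on its own; accepts and orPred are hypothetical helpers, not part of gigaparsec, and the sketch assumes that Nothing accepts no characters.

```haskell
import Data.Char (isAlphaNum, isAsciiUpper)

type CharPredicate = Maybe (Char -> Bool)

-- | Does the predicate accept this character? Nothing accepts no characters.
accepts :: CharPredicate -> Char -> Bool
accepts Nothing _ = False
accepts (Just p) c = p c

-- | A character is accepted if either predicate accepts it.
orPred :: CharPredicate -> CharPredicate -> CharPredicate
orPred Nothing q = q
orPred p Nothing = p
orPred (Just p) (Just q) = Just (\c -> p c || q c)

isAlphaNumPred, isCapital :: CharPredicate
isAlphaNumPred = Just isAlphaNum
isCapital = Just isAsciiUpper

-- | Letters, digits or underscores: a common rule for identifier letters.
identifierLetterPred :: CharPredicate
identifierLetterPred = isAlphaNumPred `orPred` Just (== '_')
```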
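data PlusSignPresence
Whether or not a plus sign (+) can prefix a numeric literal.

Constructor | Description |
---|---|
PlusRequired | (+) must be present. |
PlusOptional | (+) may be present, but is not required. |
PlusIllegal | (+) cannot be present. |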
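A small standalone model of the three settings, applied to signed decimal integers. PlusSignPresence is re-declared locally so the block runs on its own, signedInteger is a hypothetical helper, and the sketch assumes that a minus sign is always permitted, since the setting only concerns the plus sign.

```haskell
import Data.Char (isDigit)

-- Local mirror of the documented constructors.
data PlusSignPresence = PlusRequired | PlusOptional | PlusIllegal
  deriving (Eq, Show)

-- | Read a signed decimal integer such as "+42", "42" or "-7".
-- Assumption: '-' is always allowed; only '+' depends on the presence setting.
signedInteger :: PlusSignPresence -> String -> Maybe Integer
signedInteger presence input = case input of
    '-' : ds -> negate <$> digitsOf ds
    '+' : ds
      | presence /= PlusIllegal -> digitsOf ds
      | otherwise -> Nothing
    ds
      | presence /= PlusRequired -> digitsOf ds
      | otherwise -> Nothing
  where
    -- A non-empty run of decimal digits, and nothing else.
    digitsOf ds
      | not (null ds) && all isDigit ds = Just (read ds)
      | otherwise = Nothing
```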
data TextDesc
This type describes how to parse string and character literals.
Its only constructor is TextDesc; see plainText for the default description.
data EscapeDesc
Defines the escape characters and their meanings.
This includes character escapes (e.g. tabs and carriage returns) and numeric escapes, such as binary (usually "0b") and hexadecimal (usually "0x").
Its only constructor is EscapeDesc; see plainEscape for the default description.
data NumberOfDigits
Describes how many digits a numeric escape sequence may contain.

Constructor | Description |
---|---|
Unbounded | There is no limit on the number of digits that may appear in this sequence. |
Exactly !(NonEmpty Word) | The number of digits in the literal must be one of the given values. |
AtMost | There must be at most the given number of digits in the literal. |
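The digit-count rules can be modelled as a simple check that a numeric-escape decoder consults before converting the digits. This is a standalone sketch: NumberOfDigits is re-declared locally (with a Word argument for AtMost, which is an assumption), digitsAllowed and decodeEscape are hypothetical helpers, and requiring at least one digit and rejecting values above U+10FFFF are assumptions of this sketch.

```haskell
import Data.Char (chr, digitToInt, isHexDigit)
import Data.List.NonEmpty (NonEmpty (..))

-- Local mirror of the documented constructors.
data NumberOfDigits = Unbounded | Exactly (NonEmpty Word) | AtMost Word

-- | Is a numeric escape with this many digits allowed?
-- Assumption: at least one digit is always required.
digitsAllowed :: NumberOfDigits -> Word -> Bool
digitsAllowed _ 0 = False
digitsAllowed Unbounded _ = True
digitsAllowed (Exactly counts) n = n `elem` counts
digitsAllowed (AtMost limit) n = n <= limit

-- | Decode the digits of a numeric escape in the given radix (2 to 16),
-- e.g. the "41" of a hexadecimal escape such as \x41.
decodeEscape :: Int -> NumberOfDigits -> String -> Maybe Char
decodeEscape radix numDigits ds
  | digitsAllowed numDigits (fromIntegral (length ds))
  , all validDigit ds
  , value <= 0x10FFFF
  = Just (chr (fromInteger value))
  | otherwise = Nothing
  where
    validDigit d = isHexDigit d && digitToInt d < radix
    value = foldl (\acc d -> acc * toInteger radix + toInteger (digitToInt d)) 0 ds
```

For example, with Exactly (2 :| [4]) a hexadecimal escape accepts "41" (the character 'A') but rejects the three-digit "041".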
plain :: LexicalDesc
This lexical description contains the template plain<...> descriptions defined in this module.
See plainName, plainSymbol, plainNumeric, plainText and plainSpace for how this description configures the lexer.
plainName :: NameDesc
This is a blank name description template, which should be extended to form a custom name description.
In its default state, plainName makes no characters able to be part of an identifier or operator.
To change this, one should use record field updates, for example:

    myNameDesc :: NameDesc
    myNameDesc = plainName
      { identifierStart = myIdentifierStartPredicate
      , identifierLetter = myIdentifierLetterPredicate
      }

myNameDesc will then lex identifiers according to the given predicates.
plainSymbol :: SymbolDesc
This is a blank symbol description template, which should be extended to form a custom symbol description.
In its default state, plainSymbol has no keywords or reserved/hard operators.
To change this, one should use record field updates, for example:

    {-# LANGUAGE OverloadedLists #-}
    -- This lets us write [a,b] to get a Set containing a and b.
    -- If you don't want to use this, just use fromList [a,b] (from Data.Set).

    mySymbolDesc :: SymbolDesc
    mySymbolDesc = plainSymbol
      { hardKeywords = ["data", "where"]
      , hardOperators = ["->"]
      , caseSensitive = True
      }

mySymbolDesc will then treat data and where as keywords, and -> as a reserved operator.
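One plausible reading of caseSensitive is sketched below as a standalone function: with case sensitivity switched off, DATA and Data would also count as the keyword data. isHardKeyword is a hypothetical helper that works on a plain Set String rather than on a SymbolDesc; it is not taken from gigaparsec.

```haskell
import Data.Char (toLower)
import Data.Set (Set)
import qualified Data.Set as Set

-- | Is this word one of the hard keywords?
-- When matching is case-insensitive, both sides are compared in lower case.
isHardKeyword :: Bool -> Set String -> String -> Bool
isHardKeyword caseSensitive keywords word
  | caseSensitive = word `Set.member` keywords
  | otherwise = map toLower word `Set.member` Set.map (map toLower) keywords
```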
plainNumeric :: NumericDesc
This is a blank numeric description template, which should be extended to form a custom numeric description.
In its default state, plainNumeric allows hexadecimal, octal and binary numeric literals, with the standard prefixes.
To change this, one should use record field updates.
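For instance, a language that only supports decimal and lowercase-x hexadecimal integers might adjust plainNumeric as sketched below. This again assumes the Text.Gigaparsec.Token.Descriptions module and uses only fields and constructors listed on this page; myNumericDesc and the particular choices are illustrative.

```haskell
{-# LANGUAGE OverloadedLists #-}
-- Assumption: this page documents gigaparsec's Text.Gigaparsec.Token.Descriptions.
import Text.Gigaparsec.Token.Descriptions

-- A hypothetical numeric description built from plainNumeric.
myNumericDesc :: NumericDesc
myNumericDesc = plainNumeric
  { integerNumbersCanBeOctal = False   -- no octal integer literals
  , integerNumbersCanBeBinary = False  -- no binary integer literals
  , hexadecimalLeads = ['x']           -- only 'x' introduces a hexadecimal literal
  , positiveSign = PlusIllegal         -- a leading '+' is not allowed
  , decimalExponentDesc = ExponentsSupported
      { compulsory = False
      , chars = ['e', 'E']
      , base = 10
      , expSign = PlusOptional
      , expLeadingZerosAllowd = False  -- 1e5 is fine, 1e05 is not
      }
  }
```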
plainText :: TextDesc
This is a blank text description template, which should be extended to form a custom text description.
In its default state, plainText parses character literals as single characters between ' and ', and string literals between " and ".
To change this, one should use record field updates, for example:

    {-# LANGUAGE OverloadedLists #-}
    -- This lets us write [a,b] to get a Set containing a and b.
    -- If you don't want to use this, just use fromList [a,b] (from Data.Set).

    myPlainText :: Char -> String -> String -> TextDesc
    myPlainText a b c = plainText
      { characterLiteralEnd = a
      , stringEnds = [(b, c)]
      }

myPlainText a b c will then parse a character literal as a single character between a and a, and a string literal as characters between b and c.
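The stringEnds field pairs an opening delimiter with a closing one. The standalone sketch below shows how such pairs could be used to cut a simple string literal out of the input; stringLiteral and bodyUntil are hypothetical helpers, escape sequences are deliberately ignored, and when several pairs match, the first in the set's order wins.

```haskell
import Data.List (isPrefixOf, stripPrefix)
import Data.Maybe (listToMaybe)
import Data.Set (Set)
import qualified Data.Set as Set

-- | Read a string literal delimited by one of the (opening, closing) pairs,
-- returning its body and the remaining input. Escapes are not handled.
stringLiteral :: Set (String, String) -> String -> Maybe (String, String)
stringLiteral ends input = listToMaybe
  [ result
  | (open, close) <- Set.toList ends
  , Just rest <- [stripPrefix open input]
  , Just result <- [bodyUntil close rest]
  ]

-- | Collect characters until the closing delimiter; Nothing if it never appears.
bodyUntil :: String -> String -> Maybe (String, String)
bodyUntil close = go []
  where
    go acc s
      | close `isPrefixOf` s = Just (reverse acc, drop (length close) s)
    go _ [] = Nothing
    go acc (x : xs) = go (x : acc) xs
```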
plainEscape :: EscapeDesc
This is a blank escape description template, which should be extended to form a custom escape description.
In its default state, the only escape symbol of plainEscape is a backslash, "\\".
To change this, one should use record field updates, for example:

    {-# LANGUAGE OverloadedLists #-}
    -- This lets us write [a,b] to get a Set containing a and b,
    -- and [(a,b),(c,d)] for a Map which sends a ↦ b and c ↦ d.
    import Data.Set (Set)

    myPlainEscape :: Set Char -> EscapeDesc
    myPlainEscape a = plainEscape
      { literals = a                          -- characters that, once escaped, stand for themselves
      , mapping = [("t", '\t'), ("r", '\r')]  -- \t is a tab (U+0009), \r a carriage return (U+000D)
      -- , hexadecimalEscape = NumericSupported TODO  -- enables hexadecimal escapes once filled in
      }

myPlainEscape a will then read \c as the character c itself for every character c in a, and read \t and \r as a tab and a carriage return respectively.
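To show how escBegin, literals and mapping fit together, here is a standalone sketch that reads a single escape sequence. It is a model, not gigaparsec's implementation: readEscape is a hypothetical helper, numeric escapes are left out, and when several mapping keys match it assumes the longest one is preferred (so \SOH is not read as \SO followed by H).

```haskell
import Data.List (isPrefixOf, sortOn)
import Data.Map (Map)
import qualified Data.Map as Map
import Data.Set (Set)
import qualified Data.Set as Set

-- | Read one escape sequence at the start of the input, returning the
-- character it denotes and the remaining input.
--   escBegin:     the character that starts an escape, e.g. '\\'
--   literals:     characters that, once escaped, stand for themselves
--   mappingTable: named escapes, e.g. "t" for a tab
readEscape :: Char -> Set Char -> Map String Char -> String -> Maybe (Char, String)
readEscape escBegin literals mappingTable (c : rest)
  | c == escBegin = case rest of
      l : rest' | l `Set.member` literals -> Just (l, rest')
      _ -> longestMatch
  where
    matches = [ (v, drop (length k) rest) | (k, v) <- Map.toList mappingTable, k `isPrefixOf` rest ]
    -- Prefer the longest key, i.e. the match that leaves the least input behind.
    longestMatch = case sortOn (length . snd) matches of
      m : _ -> Just m
      [] -> Nothing
readEscape _ _ _ _ = Nothing
```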