gigaparsec-0.3.0.0: Refreshed parsec-style library for compatibility with Scala parsley
License: BSD-3-Clause
Maintainer: Jamie Willis, Gigaparsec Maintainers
Stability: experimental
Safe Haskell: Safe
Language: Haskell2010

Text.Gigaparsec.Token.Descriptions

Description

This module contains the descriptions of various lexical structures to configure the lexer.

Many languages share common lexical tokens, such as numeric and string literals. Writing lexers that turn these strings into tokens is largely boilerplate. A Description encodes how to lex one of these common tokens. Feeding a LexicalDesc to a Lexer provides many combinators for dealing with these tokens.

Usage

Rather than use the internal constructors, such as NameDesc, one should extend the 'plain' definitions with record field updates. For example,

myLexicalDesc = plain
  { nameDesc = myNameDesc
  , textDesc = myTextDesc
  }

will produce a description that overrides the default name and text descriptions with those given. See plainName, plainSymbol, plainNumeric, plainText and plainSpace for further examples.

Since: 0.2.2.0


Lexical Descriptions

A lexer is configured by extending the default plain template, producing a LexicalDesc.

Name Descriptions

A NameDesc configures the lexing of name-like tokens, such as variable and function names. To create a NameDesc, use plainName, and configure it to your liking with record updates.

Symbol Descriptions

A SymbolDesc configures the lexing of 'symbols' (textual literals), such as keywords and operators. To create a SymbolDesc, use plainSymbol and configure it to your liking with record updates.

Numeric Descriptions

Exponent Descriptions

Break-Characters in Numeric Literals

Some languages allow a single numeric literal to be broken up by a 'break' character, such as the underscore in 1_000_000.

Numeric Literal Prefix Configuration

Text Descriptions

A TextDesc configures the lexing of string and character literals, as well as escaped numeric literals. To create a TextDesc, use plainText and configure it to your liking with record updates. See EscapeDesc, NumericEscape and NumberOfDigits for further configuration of escape sequences and escaped numeric literals.

Escape Character Descriptions

Configuration of escape sequences, such as tabs \t and newlines \n, and escaped numbers, such as hexadecimals 0x... and binary 0b....

Numeric Escape Sequences

Configuration of escaped numeric literals. For example, hexadecimals, 0x....

Whitespace and Comment Descriptions

data ExponentDesc

Describe how scientific exponent notation can be used within real literals.

A common notation would be 1.6e3 for 1.6 × 10³, which the following ExponentDesc describes:

{-# LANGUAGE OverloadedLists #-} -- Lets us write [a] to generate a singleton Set containing a.
usualNotation :: ExponentDesc
usualNotation = ExponentsSupported
  { compulsory = False
  , chars = ['e']  -- The letter 'e' separates the significand from the exponent
  , base  = 10   -- The base of the exponent is 10, so that 2.3e5 means 2.3 × 10⁵
  , expSign = PlusOptional -- A positive exponent does not need a plus sign, but can have one.
  , expLeadingZerosAllowd = True -- We allow leading zeros on exponents; so 1.2e005 is valid.
  }
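To make the fields concrete, here is a minimal sketch, independent of gigaparsec, of how an exponent part might be checked and then applied. The PlusSign type is a local stand-in for PlusSignPresence, and parseExponent and applyExponent are hypothetical helpers, not library functions; the real lexer may combine these fields differently.

```haskell
import Data.Char (isDigit)

-- Local stand-in for PlusSignPresence, so this sketch compiles on its own.
data PlusSign = PlusRequired | PlusOptional | PlusIllegal deriving (Eq, Show)

-- Hypothetical: parse the text after the exponent character (the "3" in 1.6e3),
-- honouring a plus-sign policy and whether leading zeros are allowed.
parseExponent :: PlusSign -> Bool -> String -> Maybe Int
parseExponent sign zerosOk s =
  case s of
    '-' : ds -> negate <$> digits ds
    '+' : ds | sign /= PlusIllegal -> digits ds
    ds | sign /= PlusRequired -> digits ds
    _ -> Nothing
  where
    digits ds
      | null ds || not (all isDigit ds) = Nothing
      | not zerosOk && length ds > 1 && take 1 ds == "0" = Nothing -- e.g. "005"
      | otherwise = Just (read ds)

-- Hypothetical: scale a significand by base^exponent, so 1.6e3 = 1.6 * 10^3.
applyExponent :: Int -> Double -> Int -> Double
applyExponent base significand e = significand * fromIntegral base ^^ e
```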

Constructors

NoExponents

The language does not allow exponent notation.

ExponentsSupported

The language does allow exponent notation, according to the following fields:

Fields

data BreakCharDesc

Prescribes whether or not numeric literals can be broken up by a specific symbol.

For example, can one write 300.2_3?
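A minimal sketch of what supporting a break character could mean: the hypothetical removeBreaks below deletes each break character that sits between two digits and rejects any other placement, such as a doubled or trailing break. This placement rule is an assumption of the sketch, not necessarily the rule gigaparsec enforces.

```haskell
import Data.Char (isDigit)

-- Hypothetical: remove break characters, allowing one only between two digits.
removeBreaks :: Char -> String -> Maybe String
removeBreaks brk = go
  where
    go (d : b : e : rest)
      | isDigit d && b == brk && isDigit e = (d :) <$> go (e : rest)
    go (c : rest)
      | c == brk = Nothing -- a break that is not between two digits
      | otherwise = (c :) <$> go rest
    go [] = Just []
```

With '_' as the break character, "300.2_3" becomes "300.23", which can then be read as an ordinary literal.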

Constructors

NoBreakChar

Literals cannot be broken.

BreakCharSupported

Literals can be broken.

Fields

data SpaceDesc

This type describes how whitespace and comments should be handled lexically.

Constructors

SpaceDesc 

Fields

data NumericEscape

Describes how numeric escape sequences should work for a given base.

Constructors

NumericIllegal

Numeric escape sequences are disallowed for this base.

NumericSupported

Numeric escape sequences are supported for this base.

Fields

  • prefix :: !(Maybe Char)

    the character, if any, that is required to start the literal (like x for hexadecimal escapes in some languages).

  • numDigits :: !NumberOfDigits

    the number of digits required for this literal: this may be unbounded, an exact number, or up to a specific number.

  • maxValue :: !Char

    the largest character value that can be expressed by this numeric escape.

data LexicalDesc

This type aggregates the sub-configurations used to lex a specific language.

See the plain smart constructor to define a LexicalDesc.

Constructors

LexicalDesc 

Fields

data NumericDesc

This type describes how numeric literals (integers, decimals, hexadecimals, etc.) should be lexically processed.

Constructors

NumericDesc 

Fields

data SymbolDesc

This type describes how symbols (textual literals in a BNF) should be processed lexically.

This includes keywords and (hard) operators that are reserved by the language. For example, in Haskell, "data" is a keyword, and "->" is a hard operator.

See the plainSymbol smart constructor for how to implement a custom symbol description.

Constructors

SymbolDesc 

Fields

data NameDesc

This type describes how name-like tokens are lexed.

In particular, this defines which characters will constitute identifiers and operators.

See the plainName smart constructor for how to implement a custom name description.

Constructors

NameDesc 

Fields

type CharPredicate = Maybe (Char -> Bool)

An optional predicate on characters: if pred = Just p and p x = True, then the lexer should accept the character x.

Examples

  • A predicate that only accepts alphanumeric characters (isAlphaNum is from Data.Char):
   isAlphaNumPred = Just isAlphaNum

  • A predicate that only accepts ASCII capital letters (isAsciiUpper is from Data.Char):
   isCapital = Just isAsciiUpper
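The sketch below restates both examples so they type-check on their own, with CharPredicate redefined locally. The accepts helper is hypothetical, and its treatment of Nothing as accepting no characters is an assumption rather than documented behavior.

```haskell
import Data.Char (isAlphaNum, isAsciiUpper)

-- Local copy of the alias, so this sketch compiles without gigaparsec.
type CharPredicate = Maybe (Char -> Bool)

isAlphaNumPred :: CharPredicate
isAlphaNumPred = Just isAlphaNum

isCapital :: CharPredicate
isCapital = Just isAsciiUpper

-- Hypothetical helper: does the predicate accept this character?
-- Nothing is treated as "accept nothing" (an assumption of this sketch).
accepts :: CharPredicate -> Char -> Bool
accepts p c = maybe False ($ c) p
```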

data PlusSignPresence

Whether or not a plus sign (+) can prefix a numeric literal.

Constructors

PlusRequired

(+) must always precede a positive numeric literal

PlusOptional

(+) may precede a positive numeric literal, but is not necessary

PlusIllegal

(+) cannot precede a numeric literal as a prefix (this is separate from allowing an infix binary + operator).

data TextDesc

This type describes how to parse string and character literals.

Constructors

TextDesc 

Fields

data EscapeDesc

Defines the escape characters and their meanings.

This includes character escapes (e.g. tabs, carriage returns) and numeric escapes, such as binary (usually "0b") and hexadecimal (usually "0x").

Constructors

EscapeDesc 

Fields

  • escBegin :: !Char

    the character that begins an escape sequence: this is usually \.

  • literals :: !(Set Char)

    the characters that can be directly escaped, but still represent themselves, for instance '"', or '\'.

  • mapping :: !(Map String Char)

    the escape sequences that map to a character other than themselves, paired with the character each one maps to; for instance, "n" maps to '\n' (0xa).

  • decimalEscape :: !NumericEscape

    if allowed, the description of how numeric escape sequences work for base 10.

  • hexadecimalEscape :: !NumericEscape

    if allowed, the description of how numeric escape sequences work for base 16.

  • octalEscape :: !NumericEscape

    if allowed, the description of how numeric escape sequences work for base 8.

  • binaryEscape :: !NumericEscape

    if allowed, the description of how numeric escape sequences work for base 2.

  • emptyEscape :: !(Maybe Char)

    if one should exist, the character which has no effect on the string but can be used to disambiguate other escape sequences: in Haskell this would be &.

  • gapsSupported :: !Bool

    specifies whether or not string gaps are supported: whitespace placed between two escBegin characters is ignored in the final string, so that "hello \   \world" is "hello world".
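As an illustration of string gaps, the hypothetical removeGaps below drops every gap (an escape character, then whitespace, then a closing escape character) and leaves other escape sequences untouched for a later decoding step. It is a standalone sketch, not how gigaparsec processes gaps internally.

```haskell
import Data.Char (isSpace)

-- Hypothetical: strip string gaps from the body of a string literal.
removeGaps :: Char -> String -> Maybe String
removeGaps esc = go
  where
    go [] = Just []
    go (c : rest)
      | c == esc = case span isSpace rest of
          (_ : _, e : rest') | e == esc -> go rest' -- a complete gap: drop it
          (_ : _, _) -> Nothing                   -- a gap that is never closed
          _ -> case rest of
            n : rest' -> ([c, n] ++) <$> go rest' -- another escape: keep verbatim
            [] -> Nothing                         -- dangling escape character
      | otherwise = (c :) <$> go rest
```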

data NumberOfDigits

Describes how many digits a numeric escape sequence may contain.

Constructors

Unbounded

there is no limit on the number of digits that may appear in this sequence.

Exactly !(NonEmpty Word)

the number of digits in the literal must be one of the given values.

AtMost

there must be at most n digits in the numeric escape literal, up to and including the value given.

Fields

  • !Word

    the maximum (inclusive) number of digits allowed in the literal.
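The sketch below shows how a NumericEscape's prefix, digit count and maxValue might combine when decoding a hexadecimal escape body such as x41 (the text after escBegin). Digits is a local stand-in for NumberOfDigits, decodeHexEscape is hypothetical, and requiring at least one digit in every case is an assumption.

```haskell
import Data.Char (chr, digitToInt, isHexDigit, ord)
import Data.List.NonEmpty (NonEmpty (..))

-- Local stand-in for NumberOfDigits, for illustration only.
data Digits = Unbounded | Exactly (NonEmpty Word) | AtMost Word

-- Is a run of n digits permitted? (Zero digits is always rejected here.)
digitCountOk :: Digits -> Word -> Bool
digitCountOk _ 0 = False
digitCountOk Unbounded _ = True
digitCountOk (Exactly ns) n = n `elem` ns
digitCountOk (AtMost m) n = n <= m

-- Hypothetical: decode a whole hex escape body, checking prefix, count and maxValue.
decodeHexEscape :: Maybe Char -> Digits -> Char -> String -> Maybe Char
decodeHexEscape prefix count maxV s = do
  body <- case (prefix, s) of
    (Just p, c : rest) | c == p -> Just rest
    (Just _, _) -> Nothing
    (Nothing, _) -> Just s
  let value = foldl (\acc d -> acc * 16 + toInteger (digitToInt d)) 0 body
      okDigits = all isHexDigit body && digitCountOk count (fromIntegral (length body))
  if okDigits && value <= toInteger (ord maxV)
    then Just (chr (fromInteger value))
    else Nothing
```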

plain :: LexicalDesc

This lexical description contains the template plain<...> descriptions defined in this module. See plainName, plainSymbol, plainNumeric, plainText and plainSpace for how this description configures the lexer.

plainName :: NameDesc

This is a blank name description template, which should be extended to form a custom name description.

In its default state, plainName makes no characters able to be part of an identifier or operator. To change this, one should use record field updates, for example:

myNameDesc :: NameDesc
myNameDesc = plainName
  { identifierStart = myIdentifierStartPredicate
  , identifierLetter = myIdentifierLetterPredicate
  }

myNameDesc will then lex identifiers according to the given predicates.
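Assuming identifierStart constrains the first character of an identifier and identifierLetter every following one, the check looks roughly like the hypothetical isIdentifier below. It uses plain Char -> Bool functions; in a NameDesc these would be wrapped in Just to become CharPredicates. The two example predicates are illustrative only.

```haskell
import Data.Char (isAlpha, isAlphaNum)

-- Hypothetical: the first character must satisfy `start`, the rest `letter`.
isIdentifier :: (Char -> Bool) -> (Char -> Bool) -> String -> Bool
isIdentifier start letter (c : cs) = start c && all letter cs
isIdentifier _ _ [] = False

-- Illustrative predicates: a letter or '_' first, then letters, digits, '_' or '\''.
myIdentifierStart :: Char -> Bool
myIdentifierStart c = isAlpha c || c == '_'

myIdentifierLetter :: Char -> Bool
myIdentifierLetter c = isAlphaNum c || c `elem` "_'"
```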

plainSymbol :: SymbolDesc

This is a blank symbol description template, which should be extended to form a custom symbol description.

In its default state, plainSymbol has no keywords or reserved/hard operators. To change this, one should use record field updates, for example:

{-# LANGUAGE OverloadedLists #-} -- This lets us write [a,b] to get a Set containing a and b
                                 -- If you don't want to use this, just use fromList [a,b]
mySymbolDesc :: SymbolDesc
mySymbolDesc = plainSymbol
  { hardKeywords = ["data", "where"]
  , hardOperators = ["->"]
  , caseSensitive = True
  }

mySymbolDesc will then treat data and where as keywords, and -> as a reserved operator.
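The effect of caseSensitive on keyword recognition can be sketched as follows. isHardKeyword is a hypothetical helper, and folding both sides to lower case is one plausible reading of case-insensitive matching, not necessarily gigaparsec's exact rule.

```haskell
import Data.Char (toLower)
import qualified Data.Set as Set

-- Hypothetical: is `s` one of the hard keywords?
-- When caseSensitive is False, both sides are compared in lower case.
isHardKeyword :: Bool -> Set.Set String -> String -> Bool
isHardKeyword True kws s = s `Set.member` kws
isHardKeyword False kws s = map toLower s `Set.member` Set.map (map toLower) kws
```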

plainNumeric :: NumericDesc

This is a blank numeric description template, which should be extended to form a custom numeric description.

In its default state, plainNumeric allows hexadecimal, octal and binary numeric literals, with the standard prefixes. To change this, one should use record field updates.
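To illustrate what "the standard prefixes" usually means, here is a hypothetical readPrefixed that accepts 0x, 0o and 0b (in either case) before hexadecimal, octal and binary digits, and plain decimal otherwise. The exact prefix characters plainNumeric uses are an assumption of this sketch.

```haskell
import Data.Char (digitToInt, isDigit, isHexDigit, isOctDigit, toLower)

-- Hypothetical: read an integer literal, honouring 0x/0o/0b prefixes.
readPrefixed :: String -> Maybe Integer
readPrefixed ('0' : p : ds) = case toLower p of
  'x' -> inBase 16 isHexDigit ds
  'o' -> inBase 8 isOctDigit ds
  'b' -> inBase 2 (`elem` "01") ds
  _ -> inBase 10 isDigit ('0' : p : ds)
readPrefixed ds = inBase 10 isDigit ds

-- Accumulate the digits of `ds` in base `b`, rejecting empty or invalid input.
inBase :: Integer -> (Char -> Bool) -> String -> Maybe Integer
inBase b ok ds
  | null ds || not (all ok ds) = Nothing
  | otherwise = Just (foldl (\acc d -> acc * b + toInteger (digitToInt d)) 0 ds)
```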

plainText :: TextDesc

This is a blank text description template, which should be extended to form a custom text description.

In its default state, plainText parses character literals as a single character between ' and ', and string literals as characters between " and ". To change this, one should use record field updates, for example:

{-# LANGUAGE OverloadedLists #-} -- This lets us write [a,b] to get a Set containing a and b
                                 -- If you don't want to use this, just use fromList [a,b]
myPlainText :: Char -> String -> String -> TextDesc
myPlainText a b c = plainText
  { characterLiteralEnd = a  -- a character literal is delimited by a on both sides
  , stringEnds = [(b, c)]    -- a string literal opens with b and closes with c
  }

myPlainText a b c will then parse a character literal as a single character between a and a, and a string literal as characters between b and c.

plainSpace :: SpaceDesc

This is a blank whitespace description template, which should be extended to form the desired whitespace descriptions.

In its default state, plainSpace allows no comments, and the only whitespace characters are those defined by isSpace.
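A rough sketch of the job a SpaceDesc configures: skipping whitespace and, optionally, line comments. skipWhitespace and its comment-start parameter are hypothetical; with an empty comment start it behaves like the plainSpace default described above, skipping only isSpace characters.

```haskell
import Data.Char (isSpace)
import Data.List (isPrefixOf)

-- Hypothetical: drop leading whitespace and line comments beginning with
-- `lineComment`; an empty string disables comments entirely.
skipWhitespace :: String -> String -> String
skipWhitespace lineComment = go
  where
    go s@(c : rest)
      | isSpace c = go rest
      | not (null lineComment) && lineComment `isPrefixOf` s =
          go (dropWhile (/= '\n') s) -- skip to the end of the line
    go s = s
```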

plainEscape :: EscapeDesc

This is a blank escape description template, which should be extended to form a custom escape description.

In its default state, the only escape symbol in plainEscape is a backslash, '\\'. To change this, one should use record field updates, for example:

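{-# LANGUAGE OverloadedLists #-} -- This lets us write [a,b] to get a Set containing a and b,
                                 -- and [(a,b),(c,d)] for a Map which sends a ↦ b and c ↦ d
myPlainEscape :: EscapeDesc
myPlainEscape = plainEscape
  { literals = ['\\', '"', '\'']          -- these characters escape to themselves
  , mapping = [("t", '\t'), ("r", '\r')]  -- \t is a tab and \r a carriage return
  , hexadecimalEscape = NumericSupported  -- illustrative values: \x41 is 'A'
      { prefix = Just 'x'
      , numDigits = AtMost 4
      , maxValue = '\xFFFF'
      }
  }

myPlainEscape will then let \\, \" and \' stand for themselves, read \t and \r as a tab and a carriage return, and accept hexadecimal escapes of at most four digits, such as \x41.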
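The literals and mapping fields decide what a simple escape means. The hypothetical decodeSimpleEscape below sketches one way to use them: it tries literals first, then the mapping keys with the longest match first, which matters for Haskell-style sequences such as SO and SOH. This ordering is an assumption of the sketch, not a description of gigaparsec's internals.

```haskell
import Data.List (stripPrefix)
import qualified Data.Map as Map
import qualified Data.Set as Set

-- Hypothetical: decode the text after the escape character, returning the
-- decoded character and the remaining input.
decodeSimpleEscape :: Set.Set Char -> Map.Map String Char -> String -> Maybe (Char, String)
decodeSimpleEscape lits mp s = case s of
  c : rest | c `Set.member` lits -> Just (c, rest) -- escapes to itself
  _ -> firstMatch (Map.toDescList mp)
  where
    -- Among keys that are prefixes of the input, the longest sorts last,
    -- so descending key order tries it first ("SOH" before "SO").
    firstMatch [] = Nothing
    firstMatch ((k, v) : kvs) = case stripPrefix k s of
      Just rest -> Just (v, rest)
      Nothing -> firstMatch kvs
```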