License | BSD-3-Clause |
---|---|
Maintainer | Jamie Willis, Gigaparsec Maintainers |
Stability | stable |
Safe Haskell | Safe |
Language | Haskell2010 |
This module contains implementations of token extractors that can be used in an ErrorBuilder (Text.Gigaparsec.Errors.ErrorBuilder) to decide how to extract unexpected tokens from the residual input left over from a parse error.
These are common strategies, and something here is likely to be what is needed. They are all careful to handle unprintable characters and whitespace in a sensible way, and account for unicode codepoints that are wider than a single 16-bit character.
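As an orienting sketch (not part of the original documentation), a token extractor is an ordinary function, so the extractors documented below can be applied directly to some residual input to see which Token they would report. The module name Text.Gigaparsec.Errors.TokenExtractors is assumed from the synopsis.

```haskell
import Data.List.NonEmpty (NonEmpty (..))
import Text.Gigaparsec.Errors.TokenExtractors -- assumed module name

-- Hypothetical direct applications: the residual input is "foobar baz",
-- the parser demanded 3 characters of input, and this was not a lexical error.
demo :: (Token, Token)
demo =
  ( singleChar        ('f' :| "oobar baz") 3 False  -- reports only the first character
  , matchParserDemand ('f' :| "oobar baz") 3 False  -- reports as much input as the parser demanded
  )
```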
Since: 0.2.5.0
Synopsis
- data Token
- type TokenExtractor = NonEmpty Char -> Word -> Bool -> Token
- tillNextWhitespace :: Bool -> (Char -> Bool) -> TokenExtractor
- singleChar :: TokenExtractor
- matchParserDemand :: TokenExtractor
- lexToken :: [Parsec String] -> TokenExtractor -> TokenExtractor
- lexTokenWithSelect :: (NonEmpty (String, Word) -> (String, Word)) -> [Parsec String] -> TokenExtractor -> TokenExtractor
Documentation
data Token Source #

This type represents an extracted token returned by unexpectedToken in ErrorBuilder.

There is deliberately no analogue for EndOfInput because we guarantee that non-empty residual input is provided to token extraction.
Since: 0.2.5.0
type TokenExtractor Source #
= NonEmpty Char | the remaining input |
-> Word | the input the parser tried to read when it failed (this is not guaranteed to be smaller than the length of the remaining input) |
-> Bool | was this error generated as part of "lexing", or in a wider parser (see the lexicalError flag of unexpectedToken) |
-> Token | a token extracted from the remaining input |
Type alias for token extractors; it matches the shape of unexpectedToken.
Since: 0.2.5.0
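Because TokenExtractor is a plain type alias, a bespoke extractor is just a function of this shape. The following is a minimal sketch (not from the original documentation), assuming the module exports the Raw and Named constructors referenced elsewhere on this page, carrying the raw text and a name plus caret width respectively.

```haskell
import Data.Char (isSpace)
import Data.List.NonEmpty (NonEmpty (..))
import Text.Gigaparsec.Errors.TokenExtractors (Token (..), TokenExtractor)

-- A hypothetical extractor: name whitespace explicitly, otherwise report
-- just the first character of the residual input verbatim.
firstCharOrSpace :: TokenExtractor
firstCharOrSpace (c :| _) _ _
  | isSpace c = Named "whitespace" 1  -- assumed: Named carries a name and a width
  | otherwise = Raw [c]               -- assumed: Raw carries the raw text
```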
tillNextWhitespace Source #

:: Bool | should the extractor cap the token to the amount of input the parser demanded? |
-> (Char -> Bool) | what counts as a space character |
-> TokenExtractor |
This extractor provides an implementation for unexpectedToken: it will construct a token that extends to the next available whitespace in the remaining input. It can be configured to cap this token at the next whitespace or at however much input the parser demanded, whichever is shorter.

In the case of unprintable characters or whitespace, this extractor will favour reporting a more meaningful name.
Since: 0.2.5.0
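For example, a common configuration (an illustrative sketch, not from the original documentation) caps the token at the parser's demand and treats any character satisfying isSpace as whitespace:

```haskell
import Data.Char (isSpace)
import Text.Gigaparsec.Errors.TokenExtractors (TokenExtractor, tillNextWhitespace)

-- cap the token at the amount of input the parser demanded, and treat any
-- character satisfying isSpace as whitespace
cappedTillSpace :: TokenExtractor
cappedTillSpace = tillNextWhitespace True isSpace
```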
singleChar :: TokenExtractor Source #
This extractor provides an implementation for unexpectedToken: it will unconditionally report the first character in the remaining input as the problematic token.
In the case of unprintable characters or whitespace, this extractor will favour reporting a more meaningful name.
Since: 0.2.5.0
matchParserDemand :: TokenExtractor Source #
This extractor provides an implementation for unexpectedToken: it will make a token as wide as the amount of input the parser tried to consume when it failed.
In the case of unprintable characters or whitespace, this extractor will favour reporting a more meaningful name.
Since: 0.2.5.0
lexToken Source #

:: [Parsec String] | The tokens that should be recognised by this extractor: each parser should return the intended name of the token exactly as it should appear in the Named token. This should include a whitespace parser for "unexpected whitespace". However, with the exception of the whitespace parser, these tokens should not consume trailing (and certainly not leading) whitespace: if using definitions from the Text.Gigaparsec.Token.Lexer functionality, the nonlexeme versions of the tokens should be used. |
-> TokenExtractor | If the parser failed during the parsing of a token, this function extracts the problematic item from the remaining input. |
-> TokenExtractor |
This extractor provides an implementation for unexpectedToken: it will try to parse the residual input to identify a valid lexical token to report.

When parsing a grammar that has a dedicated lexical distinction, it is nice to be able to report problematic tokens relevant to that grammar as opposed to generic input lifted straight from the input stream. The easiest way of doing this would be having a pre-lexing pass and parsing based on tokens, but this is deliberately not how Parsley is designed. Instead, this extractor can parse the remaining input on demand to try to identify a token.
If the lexicalError flag of the unexpectedToken function is not set, which would indicate a problem within a token reported by a classical lexer and not the parser, the extractor will try to parse each of the provided tokens in turn: whichever of these tokens matches the most input will be reported as the problematic one, with ties broken in favour of earlier tokens (lexTokenWithSelect can alter which is chosen). For best effect, these tokens should not consume whitespace (which would otherwise be included at the end of the token!): this means that, if using the Lexer, the functionality in nonlexeme should be used. If none of the given tokens can be parsed, the input until the next valid parsable token (or end of input) is returned as a Raw.
If lexicalError is true, then the given token extractor will be used instead to extract a default token.
Since: 0.2.5.0
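A rough sketch of wiring this up (not from the original documentation): the token parsers below are illustrative and built with string from Text.Gigaparsec.Char; each returns the name that should appear in the Named token. A whitespace parser and, when using a Lexer, its nonlexeme tokens would be added in the same way, and the fallback extractor is what gets used when lexicalError is set.

```haskell
import Data.Char (isSpace)
import Text.Gigaparsec (Parsec)
import Text.Gigaparsec.Char (string)
import Text.Gigaparsec.Errors.TokenExtractors (TokenExtractor, lexToken, tillNextWhitespace)

-- each parser yields the name that should appear in the Named token; these
-- example tokens deliberately consume no trailing whitespace
tokenParsers :: [Parsec String]
tokenParsers =
  [ "keyword if"  <$ string "if"
  , "operator ->" <$ string "->"
  ]

-- fall back to a whitespace-delimited token, and use that fallback outright
-- when the error was raised during lexing (the lexicalError flag is set)
lexingExtractor :: TokenExtractor
lexingExtractor = lexToken tokenParsers (tillNextWhitespace True isSpace)
```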
lexTokenWithSelect Source #

:: (NonEmpty (String, Word) -> (String, Word)) | If the extractor is successful in identifying tokens that can be parsed from the residual input, this function will select one of them to report back. |
-> [Parsec String] | The tokens that should be recognised by this extractor: each parser should return the intended name of the token exactly as it should appear in the Named token. This should include a whitespace parser for "unexpected whitespace". However, with the exception of the whitespace parser, these tokens should not consume trailing (and certainly not leading) whitespace: if using definitions from the Text.Gigaparsec.Token.Lexer functionality, the nonlexeme versions of the tokens should be used. |
-> TokenExtractor | If the parser failed during the parsing of a token, this function extracts the problematic item from the remaining input. |
-> TokenExtractor |
This extractor provides an implementation for unexpectedToken: it will try to parse the residual input to identify a valid lexical token to report.

When parsing a grammar that has a dedicated lexical distinction, it is nice to be able to report problematic tokens relevant to that grammar as opposed to generic input lifted straight from the input stream. The easiest way of doing this would be having a pre-lexing pass and parsing based on tokens, but this is deliberately not how Parsley is designed. Instead, this extractor can parse the remaining input on demand to try to identify a token.
If the lexicalError flag of the unexpectedToken function is not set, which would indicate a problem within a token reported by a classical lexer and not the parser, the extractor will try to parse each of the provided tokens in turn: the given function is used to select which of the matched tokens is returned.

For best effect, these tokens should not consume whitespace (which would otherwise be included at the end of the token!): this means that, if using the Lexer, the functionality in nonlexeme should be used. If none of the given tokens can be parsed, the input until the next valid parsable token (or end of input) is returned as a Raw.
If lexicalError is true, then the given token extractor will be used instead to extract a default token.
Since: 0.2.5.0
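As an illustrative sketch (not from the original documentation), the selection function below simply picks the widest matched token, mirroring the default behaviour described for lexToken; any other policy, such as preferring particular token names, could be substituted.

```haskell
import Data.Char (isSpace)
import Data.List.NonEmpty (NonEmpty)
import qualified Data.List.NonEmpty as NE
import Data.Ord (comparing)
import Text.Gigaparsec (Parsec)
import Text.Gigaparsec.Char (string)
import Text.Gigaparsec.Errors.TokenExtractors (TokenExtractor, lexTokenWithSelect, tillNextWhitespace)

-- choose the widest of the successfully parsed candidate tokens
pickWidest :: NonEmpty (String, Word) -> (String, Word)
pickWidest = NE.last . NE.sortBy (comparing snd)

-- illustrative token parsers, as in the lexToken sketch above
tokenParsers :: [Parsec String]
tokenParsers = ["keyword if" <$ string "if", "operator ->" <$ string "->"]

selectingExtractor :: TokenExtractor
selectingExtractor = lexTokenWithSelect pickWidest tokenParsers (tillNextWhitespace True isSpace)
```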