LL(1) Parser
With this module, you can use parser combinators to create LL(1) parsers
working over a PeekableStringReader
input.
Build your parser with combinators then use one of ll1/reader
,
ll1/string
, ll1/port
, ll1/file
or ll1/file-lines
to consume input.
To use the bindings from this module:
(import :std/parser/ll1)
Beware that the 1 in LL(1) means that these parsers only use at most
one character of look-ahead from the reader, allowing them to work on raw ports.
However, this limitation means that if an alternative in a ll1-or
fails
further than at the start, the next alternative will start
from where the previous one failed, and not from the point at which the ll1-or
started.
This may be a reason to prefer ll1-case
over ll1-or
—
or to prefer a more advanced kind of parser if this is not enough.
Using LL(1) Parsers
ll1/reader
(ll1/reader parser reader [description] [where]) => parse-result
Given a LL(1) parser
and a reader
satisfying the PeekableStringReader
interface,
return the result of parsing the reader with the parser.
In case of error, use the description
and where
specification in the error message.
Examples:
> (ll1/reader ll1-sint (open-buffered-string-reader "-3"))
-3
ll1/string
(ll1/string parser string [description] [where]) => parse-result
Given a LL(1) parser
and a string
,
return the result of parsing the string with the parser.
In case of error, use the description
and where
specification in the error message.
Examples:
> (ll1/string (ll1-separated ll1-uint ll1-skip-space* ll1-eof) "12 3 4 56")
(12 3 4 56)
ll1/port
(ll1/port parser port [description] [where]) => parse-result
Given a LL(1) parser
and a port
,
return the result of parsing the port with the parser.
In case of error, use the description
and where
specification in the error message.
Examples:
> (call-with-input-string "foo\nbar\nbaz\n" (lambda (port) (ll1/port ll1-lines port)))
("foo" "bar" "baz")
ll1/file
(ll1/file parser path [description] [where]) => parse-result
Given a LL(1) parser
and a path
,
return the result of parsing the file at said path with the parser.
In case of error, use the description
and where
specification in the error message.
Examples:
> (ll1/file (ll1-result 42) "/dev/null")
42
ll1/file-lines
(ll1/file-lines parser path [description] [where]) => parse-result
Given a LL(1) parser
and a path
,
return a list containing the result of parsing each line of the file with the parser.
In case of error, use the description
and where
specification in the error message.
Examples:
> (ll1/file-lines ll1-line "/dev/null")
()
Helpers for LL(1) Parsers
peeker
(peeker spec) => procedure
Find a suitable procedure to recognize a character input (from peek-char
),
based on specification spec
:
- If
spec
is a procedure, it designates itself. - If
spec
is a character or the EOF object, it designates the function that recognizes exactly this character. - If
spec
is a string, it designates the function that recognizes the elements of the string. - If
spec
is a hash-table, it designates the function that recognizes the keys of the table, returning the respective value. - If
spec
is a list, each element is a specification as above, and the function designated is one that recognizes any of the characters recognized by one of these elements, tried in order.
eolf?
(eolf? char-or-eof) => bool
Return true if the argument is one of the character #\newline
,
the character #\return
or the eof-object #!eof
.
Examples:
> (andmap eolf? '(#\newline #\return #!eof))
#t
> (ormap eolf? '(#\a #\b 1 "foo"))
#t
peekable-eof?
(peekable-eof? reader) => bool
Return true if the reader has reached the end of its data stream.
Basic LL(1) Parsers
ll1-case
(ll1-case (spec body ...) ... (else e ...)) => parser
(ll1-case (spec body ...) ...) => parser
The ll1-case
macro creates a LL(1) parser that
peeks at one character (of eof-object), then tries
each function specified by spec
as per peeker
on that character in order.
If one spec
succeeds, it executes the body ...
associated with that spec
;
if that body
starts with =>
it is a function that takes the result of the peeker.
The body returns a function that in turn is called with the reader.
If no spec
succeeds, the body of the else
clause is used;
if no else
clause is specified, a parse-error
is raised.
Examples:
> (import :std/text/char-set)
> (ll1/string
(ll1-repeated
(ll1-case (#\r (ll1-string "red"))
("\r\n" ll1-eol)
(char-strict-whitespace? ll1-skip-space*)
(char-ascii-digit ll1-uint)) ll1-eof)
"\nred3 45\r\nred")
("\n" "red" 3 #!void 45 "\r\n" "red")
Note how in the example above, the #\newline
and #\return
characters
are accepted by both the "\r\n"
spec and the char-strict-whitespace?
spec,
but the first one wins and causes ll1-eol
to be called, returning the eol found.
ll1-peek
(ll1-peek spec) => parser
Return a parser that succeeds if the next character (or eof-object) is recognized
by the peeker
for the spec
, and otherwise fails.
The character is not read, but its value is returned.
Examples:
> (ll1/string (ll1-begin0 (ll1-peek "abc") (ll1-char* char?)) "banana")
#\b
ll1-empty
ll1-empty
This function parses the empty string and returns (void)
.
Trivial function to use while combining parsers.
ll1-char
(ll1-char spec) => parser
Return a parser that succeeds if the next character (or eof-object) is recognized
by the peeker
for the spec
, and otherwise fails.
The character is read, and its value returned.
Examples:
> (ll1/string (ll1-char #\x) "x")
#\x
ll1-string
(ll1-string string) => parser
Return a parser that succeeds if the input contains string,
consuming it, and returning it. If some character in the stream
is not the next character in the string, raise a parse-error
.
Examples:
> (ll1/string (ll1-string "banana") "banana")
"banana"
ll1-char?
(ll1-char? spec) => parser
Return a parser that always succeeds.
If the next character (or eof-object) is recognized by the peeker
for the spec
,
then consume and return it. Otherwise, return #f
.
Examples:
> (ll1/string (ll1-list (ll1-char? #\b) (ll1-char? #\o) (ll1-char? #\a)) "ba")
(#\b #f #\a)
ll1-char*
(ll1-char* spec) => parser
Return a parser that always succeeds.
It consumes as many characters recognized by the peeker
for the spec
as there are in the reader, and returns a string of them.
Examples:
> (ll1/string (ll1-char* "ban") "banana")
"banana"
ll1-char+
(ll1-char+ spec) => parser
Return a parser that consumes as many characters recognized
by the peeker
for the spec
as there are in the reader,
and returns a string of them. If the next character is not recognized,
raise a parse-error
.
Examples:
> (ll1/string (ll1-char+ "ban") "banana")
"banana"
ll1-skip-char*
(ll1-skip-char* spec) => parser
Return a parser that always succeeds.
It consumes as many characters recognized by the peeker
for the spec
as there are in the reader, and returns (void)
.
Examples:
> (list (ll1/string (ll1-skip-char* "ban") "banana"))
(#!void)
ll1-skip-space*
ll1-skip-space* => parser
Return a parser that always succeeds.
It consumes as many space characters (as per char-strict-whitespace?
)
as there are in the reader, and returns (void)
.
Examples:
> (list (ll1/string ll1-skip-space* " "))
(#!void)
ll1-skip-space*
(ll1-skip-space* spec) => parser
Return a parser that always succeeds.
It consumes as many space characters (as per char-strict-whitespace?
)
as there are in the reader, and returns (void)
.
Examples:
> (list (ll1/string ll1-skip-space* " "))
(#!void)
ll1-eof
ll1-eof => parser
A parser that succeeds when the reader is at eof.
ll1-eolf?
ll1-eolf? => parser
A parser that succeeds when the reader is at eof or before an eol. Does not consume the eol.
ll1-eol
ll1-eol => parser
A parser that succeeds when the reader is before an eol.
Consumes the eol and returns it: "\n"
, "\r"
or "\r\n"
.
ll1-eolf
ll1-eolf => parser
A parser that succeeds when the reader is before an eol or at eof.
Consumes the eol of eof and returns it: "\n"
, "\r"
, "\r\n"
or #!eof
.
ll1-uint
ll1-uint => parser
(cut ll1-uint <> base) => parser
Parser that parses a unsigned integer in decimal or in the specified base
, return it.
No sign may be specified.
ll1-sint
ll1-sint => parser
(cut ll1-sint <> base) => parser
Parser that parses a signed integer in decimal or in the specified base
, return it.
A sign may be specified, or omitted.
ll1-n-chars
(ll1-n-chars n spec) => parser
Parser that reads n
chars that satisfy the peeker
spec
and returns them as a string.
ll1-n-digits
(ll1-n-digits n [base]) => parser
Parser that reads n
digits in specified base
(defaults to 10),
and returns them as a number.
ll1-line
ll1-line => parser
Parser that reads all characters to end of line or file, and returns them in a string. Consumes the characters in line, but not those in the eol.
LL(1) Parser Combinators
ll1-result
(ll1-result result) => parser
Macro that evaluates to a parser that always returns the result
without consuming any input.
ll1-pure
(ll1-pure result) => parser
Function that returns to a parser that always returns the result
without consuming any input.
ll1-bind
(ll1-bind Ma aMb) => parser
Given a parser Ma
that returns a value of type a
,
and a unary function aMb
that takes a value of type a
and returns a value of type b
,
returns a parser that first runs Ma
, then takes its return value,
calls aMb
with it, runs the parser returned,
and returns the value of that parser.
NB: ll1-bind
and ll1-pure
give LL(1) parsers a monad structure.
ll1-begin
(ll1-begin parsers ... last-parser) => parser
A parser that runs all the parsers, and returns the value of the last one.
ll1-begin0
(ll1-begin0 first-parser parsers ...) => parser
A parser that runs all the parsers, and returns the value of the last one.
ll1-or
(ll1-or parsers ...) => parser
A parser that tries each of the parsers
in sequence. If one fails, tries the next one.
Beware that if any parser fails after the first character,
the next parser will be run from where it failed, not from the start.
For better behavior, use a different kind of parsers than LL(1).
For safer behavior within LL(1), use ll1-case
.
ll1-or/list
(ll1-or list-of-parsers) => parser
A parser that tries each of the parsers in the list-of-parsers
in sequence.
If one fails, tries the next one.
Beware that if any parser fails after the first character,
the next parser will be run from where it failed, not from the start.
For better behavior, use a different kind of parsers than LL(1).
For safer behavior within LL(1), use ll1-case
.
ll1-repeated
(ll1-repeated element terminator [rhead]) => parser
A parser that repeatedly tries to run the element
parser,
unless the terminator
parser succeeds.
In the end, return the list of results from the element parser, in order,
prepended by the reverse of the list rhead
(defaults to the empty list '()
).
ll1-separated
(ll1-separated element separator terminator [rhead]) => parser
A parser that repeatedly tries to run the element
parser
with the separator
parser in between two runs of the element
parser,
until terminator
parser succeeds.
In the end, return the list of results from the element
parser, in order,
prepended by the reverse of the list rhead
(defaults to the empty list '()
).
Empty lists are accepted.
ll1*
(ll1* f . elements) => parser
A parser that runs each of the parsers in elements
in sequence,
collects their results, and calls f
on the results.
ll1-list
(ll1-list f . elements) => parser
A parser that runs each of the parsers in elements
in sequence,
and collects their results into a list.
ll1-n-times
(ll1-n-times n element) => parser
A parser that runs the element
parser exactly n
times
and collects the results into a list.
ll1-lines
ll1-lines => parser
(cut ll1-lines <> line) => parser
A parser that reads many lines each as by the line
parser if provided or ll1-line
by default,
and collects the parsing results in a list.
ll1-to-eof
(ll1-to-eof parser) => parser
A parser that parses results as per parser
then fails if the EOF hasn't been reached.
ll1-skip-space-to-eof
ll1-skip-space-to-eof => parser
A parser that skips over any remaining space until EOF.