m4/examples/null.err
Eric Blake 93726d6dd2 eval: Overhaul implementation for speed and correctness
While a recursive descent parser is easy to write, it involves a LOT
of function calls and boilerplate.  Merely parsing "eval(1)" requires
descending through ALL 11 levels of operator precedence, only for each
layer to discover there is no operator.  Better is the Pratt style of
LR(1) parsing [1], which can handle any grammar where no two
consecutive non-terminals or epsilon appear in the right side of any
rule [2].  Now, parsing is done with just two mutually recursive
functions; "eval(1)" works with just two function calls (primary()
determines the value, and parse_expr() determines no operators are
present), while more complicated expressions still produce the correct
results but with less recursion.

While at it, I noticed that "eval(1||(1/0))" used to produce a cryptic
message:

m4:stdin:1: bad expression in eval (excess input): 1||(1/0)

despite the similar "eval(1||1/0)" suppressing that as part of
short-circuiting.  It turns out that my initial implementation of
short-circuiting in 1.4.8b (back in 2007!) was never fully tested on
more complex situations.

To test that the new implementation is indeed faster, I wrote an m4
solution [3] to an Advent of Code challenge [4] that required
computing 2000 iterations of a 24-bit linear feedback shift register
over 2000 input values (--trace shows nearly 20 million eval calls).
On my machine, runtime with an unoptimized pre-patch m4 was at 78
seconds, post-patch it completes in 66 seconds.

[1] https://journal.stuffwithstuff.com/2011/03/19/pratt-parsers-expression-parsing-made-easy/
[2] https://en.wikipedia.org/wiki/Operator-precedence_parser
[3] https://repo.or.cz/aoc_eblake.git/blob/1b122791d4:/2024/day22.m4
[4] https://adventofcode.com/2024/day/22

* NEWS: Document the bug fix. Also document recent compilation fixes.
* cfg.mk (indent_args): Teach indent not to mangle int casts.
* doc/m4.text (Eval): Add coverage for the bug fix.  Adjust one
error output that is now more precise.
* src/eval.c (logical_or_term, logical_and_term, or_term, xor_term)
(and_term, equality_term, cmp_term, shift_term, add_term, mult_term)
(exp_term, unary_term, simple_term): Delete, replaced by...
(primary, parse_expr): ...new functions.
(evaluate): Adjust caller.

(cherry picked from commit e3c4d07c56c0f85ef998067b59191c17c2188ab8)
2025-04-19 20:15:20 -05:00

3.1 KiB