The parser uses "break" as fake label to compile "break" as "goto
break". To avoid producing this string at each use, it keeps it
available in its state.
Moreover, each function being parsed has its own table.
The code is cleaner when each table is used for one specific purpose:
The scanner uses its table to anchor and unify strings, mapping strings
to themselves; the parser uses it to reuse constants in the code,
mapping constants to their indices in the constant table. A different
table for each task avoids false collisions.
Several definitions that don't need to be "global" (that is, that
concerns only specific parts of the code) moved out of llimits.h,
to more appropriate places.
Instead of an explicit value (field 'b'), true and false use different
tag variants. This avoids reading an extra field and results in more
direct code. (Most code that uses booleans needs to distinguish between
true and false anyway.)
- 'L' added to the 'BuffFS' structure
- '%c' does not handle control characters (it is not its business.
This now is done by the lexer, who is the one in charge of that
kind of errors.)
- avoid the direct use of 'l_sprintf' in the Lua kernel
- In 'readutf8esc' (llex.c), the overflow check must be done before
shifting the accumulator. It was working because tests were using
64-bit longs. Failed with 32-bit longs.
- In OP_FORPREP (lvm.c), avoid negating an unsigned value. Visual
Studio gives a warning for that operation, despite being well
defined in ISO C.
- In 'luaV_execute' (lvm.c), 'cond' can be defined only when needed,
like all other variables.
All UTF-8 encoding functionality (including the escape
sequence '\u') accepts all values from the original UTF-8
specification (with sequences of up to six bytes).
By default, the decoding functions in the UTF-8 library do not
accept invalid Unicode code points, such as surrogates. A new
parameter 'nonstrict' makes them accept all code points up to
(2^31)-1, as in the original UTF-8 specification.
A long bracket with too many equal signs can overflow the 'int' used for
the counting and some arithmetic done on the value. Changing the counter
to 'size_t' avoids that. (Because what is counted goes to a buffer, an
overflow in the counter will first raise a buffer-overflow error.)