A bad actor could fill only a few entries in a table (power of twos in
decreasing order, see tests) and produce a small table with a huge
length. If your program builds a table with external data and iterates
over its length, this behavior could be an issue.
That complicates a little object equality (and therefore table access
for long strings), but the old behavior was somewhat weird. (Short
strings, a concept otherwise absent from the manual, could not be
external.)
Do not optimize only for table updates (key already present).
Creation of new short keys in new tables can be quite common in
programs that create lots of small tables, for instance with
constructors like {x=e1,y=e2}.
Moreover, each function being parsed has its own table.
The code is cleaner when each table is used for one specific purpose:
The scanner uses its table to anchor and unify strings, mapping strings
to themselves; the parser uses it to reuse constants in the code,
mapping constants to their indices in the constant table. A different
table for each task avoids false collisions.
When reinserting elements into a table during a rehash, the code does
not need to invoke all the complexity of a full 'luaH_set':
- The table has space for all keys.
- The key cannot exist in the new hash.
- The keys are valid (not NaN nor nil).
- The keys are normalized (1.0 -> 1).
- The values cannot be nil.
- No barrier needed (the table already pointed to the key and value).
Instead of using 'alimit' for keeping the size of the array and at
the same time being a hint for '#t', a table now keeps these two
values separate. The Table structure has a field 'asize' with the
size of the array, while the length hint is kept in the array itself.
That way, tables with no array part waste no space with that field.
Moreover, the space for the hint may have zero cost for small arrays,
if the array of tags plus the hint still fits in a single word.
Without this extra space, sequences of insertions/deletions (and
some other uses) can have unpexpected low performances. See the
added tests for an example, and *Mathematical Models to Analyze Lua
Hybrid Tables and Why They Need a Fix* (Martínez, Nicaud, Rotondo;
arXiv:2208.13602v2) for detais.
Array part needs 1/3 of its elements filled, instead of 1/2.
Array entries use ~1/3 the memory of hash entries, so this new rule
still ensures that array parts do not use more memory than keeping
the values in the hash, while allowing more uses of the array part,
which is more efficient than the hash.
If there are no integer keys outside the array part, there is no
reason to resize it, saving the time to count its elements. Moreover,
assignments to non-integer keys will not collapse a table created with
'table.create'.
Undoing previous commit. Returning TValue increases code size without
any visible gains. Returning the tag is a little simpler than returning
a special code (HOK/HNOTFOUND) and the tag is useful by itself in
some cases.
Instead of receiving a parameter telling them where to put the result
of the query, these functions return the TValue directly. (That is,
they return a structure.)
Avoid silent conversions from int to unsigned int when calling
'luaH_resize'; avoid silent conversions from lua_Integer to int in
'table.create'; MAXASIZE corrected for the new implementation of arrays;
'luaH_resize' checks explicitly whether new size respects MAXASIZE.
(Even constructors were bypassing that check.)
It is quite common to write to empty but existing cells in the array
part of a table, so 'luaH_psetint' checks for the common case that
the table doesn't have a newindex metamethod to complete the write.
'lua_rawgeti' now uses "fast track"; 'lua_rawseti' still calls
'luaH_setint', but the latter was recoded to avoid extra overhead
when writing to the array part after 'alimit'.
The array part of a table wastes too much space, due to padding.
To avoid that, we need to store values in the array as something
different from a TValue. Therefore, the API for table access
should not assume that any value in a table lives in a *TValue.
This commit is the first step to remove that assumption: functions
luaH_get*, instead of returning a *TValue where the value lives,
receive a *TValue where to put the value being accessed.
(We still have to change the luaH_set* functions.)