Otherwise, it's possible that later steps add new macros that `embed.pl`
would have wanted to add to its list of things to `#undef`, requiring a
second run of `make regen`.
By running `embed.pl` last, we ensure this happens in the right order the
first time.
Fixes #24083. Fixes #24084.
In d957e95daa0143d60933d96d6cbfb69eee6d6269 I changed the definitions of
IV_DIG and UV_DIG to depend on IV_BITS and UV_BITS respectively,
creating the latter in perl.h. These were visible only to the perl
core. But I forgot that the _DIG forms were visible everywhere, so the
_BITS forms needed to be as well.
This commit merely documents all of them as public API (which should
have been the case all along anyway), which automatically makes them
visible everywhere.
This avoids repeating code snippets. It also changes things so adjacent
underscores are all absorbed at once (and warned about). That means we
no longer have to keep track of whether the previous character was an
underscore, so the variable that did that is removed.
Only two checks need be done for running off either end of the buffer.
The buffer is NUL-terminated, so if we see an underscore in the current
position, the next position exists (there is a NUL there if nothing
else); and the macro that looks behind one position is called in only
one place where we haven't always parsed beyond the first character.
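The two observations above (a NUL terminator makes a one-character
look-ahead safe, and a whole run of underscores can be absorbed in one
step) can be sketched as follows. This is not the code in toke.c:
parse_digits(), its 'misplaced' counter, and the rules for what counts
as misplaced are hypothetical and chosen only for illustration.

```c
#include <stddef.h>

#define IS_DIGIT(c) ((c) >= '0' && (c) <= '9')

/* Parse decimal digits from the NUL-terminated string s, skipping
 * underscores.  *misplaced counts underscore runs that are anything
 * other than a single underscore between two digits. */
static unsigned long parse_digits(const char *s, int *misplaced)
{
    const char *start = s;
    unsigned long value = 0;

    *misplaced = 0;
    for (;;) {
        if (*s == '_') {
            const char *run = s;

            /* Absorb the whole run at once.  Reading the next position
             * is safe: at worst it holds the terminating NUL. */
            while (*s == '_')
                s++;

            /* No "previous character was an underscore" flag is needed;
             * the run length and its neighbours say everything. */
            if (s - run > 1 || run == start || !IS_DIGIT(*s))
                (*misplaced)++;
            continue;
        }
        if (!IS_DIGIT(*s))
            break;
        value = value * 10 + (unsigned long)(*s - '0');
        s++;
    }
    return value;
}
```

With this sketch, "1_000" parses cleanly, while "1__000", "_5", and
"5_" each report one misplaced run.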
The previous commit has made this function, long in numeric.c,
available to the rest of core. The code removed here duplicated what it
does. Two variables are now unused, and are removed.
By using isDIGIT(), the cases for individual digits 1-9 collapse into a
single one, leaving just three possibilities, which are more clearly
handled by an if and two 'else if's.
This uses the macro added in the previous commit to create this new
macro, and changes code in toke.c to use it.
toke.c is not hot code, but this demonstrates that the new scheme works,
and makes the code in toke.c a bit cleaner.
l1_char_class_tab.h categorizes characters in the Latin1 range into
various classes, mostly into the POSIX classes like [:word:]. Each
character has a bit set corresponding to every class it is a member of.
These values are placed in a 256-element array, and the ordinal value of
a character is used as an index into it to quickly determine whether the
character is a member of a given class.
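A minimal sketch of the bit-per-class table idea follows. The names
(CC_ALPHA, class_tab, IN_CLASS, and so on) are hypothetical and are not
the identifiers used in l1_char_class_tab.h; the sketch also fills the
table at run time and covers only ASCII, purely to keep it short.

```c
#include <stdint.h>

/* Hypothetical class numbers: each names one bit position in an entry. */
enum { CC_ALPHA = 0, CC_DIGIT = 1, CC_WORDCHAR = 2 };

/* One bit mask per code point in the 0..255 range. */
static uint32_t class_tab[256];

/* Set the bit for class `cls` on every code point in [lo, hi]. */
static void add_range(int lo, int hi, int cls)
{
    for (int c = lo; c <= hi; c++)
        class_tab[c] |= UINT32_C(1) << cls;
}

/* Populate the ASCII part of the table (assumes an ASCII platform). */
static void init_class_tab(void)
{
    add_range('a', 'z', CC_ALPHA);
    add_range('A', 'Z', CC_ALPHA);
    add_range('0', '9', CC_DIGIT);
    add_range('a', 'z', CC_WORDCHAR);
    add_range('A', 'Z', CC_WORDCHAR);
    add_range('0', '9', CC_WORDCHAR);
    add_range('_', '_', CC_WORDCHAR);
}

/* Membership is one array index, one mask, and one test. */
#define IN_CLASS(c, cls) \
    ((class_tab[(uint8_t)(c)] & (UINT32_C(1) << (cls))) != 0)
```

After init_class_tab() has run, IN_CLASS('_', CC_WORDCHAR) is true and
IN_CLASS('_', CC_ALPHA) is false.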
Besides the POSIX classes, there are some classes that make it more
convenient and/or faster for our code. For example, there is a class
that allows us to quickly know if a given character is one that needs to
be preceded by a backslash by quotemeta().
This commit adds a class for the single character underscore '_', and a
macro that allows for seeing if a character is either an underscore or a
member of any other class, using a single conditional.
This means code that checks whether character X is either an underscore
or a member of class Y can change to eliminate one conditional.
Thus the reason to do this is efficiency. Currently, the only places
that do this explicitly are in non-hot code. But I have work in
progress with hot code that could benefit from this.
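To illustrate how a dedicated underscore bit removes a conditional,
here is a small sketch. All names in it (CC_UNDERSCORE, tab, BIT,
IS_UNDERSCORE_OR, and so on) are hypothetical stand-ins, not the macros
this commit adds, and the table covers only ASCII.

```c
#include <stdint.h>

/* Hypothetical class numbers, including one just for '_'. */
enum { CC_ALPHA = 0, CC_DIGIT = 1, CC_UNDERSCORE = 2 };

#define BIT(cls) (UINT32_C(1) << (cls))

static uint32_t tab[256];

/* Fill in a few ASCII entries (assumes an ASCII platform). */
static void init_tab(void)
{
    for (int c = 'a'; c <= 'z'; c++) tab[c] |= BIT(CC_ALPHA);
    for (int c = 'A'; c <= 'Z'; c++) tab[c] |= BIT(CC_ALPHA);
    for (int c = '0'; c <= '9'; c++) tab[c] |= BIT(CC_DIGIT);
    tab['_'] |= BIT(CC_UNDERSCORE);
}

/* Two conditionals: an explicit compare plus a table test. */
#define IS_UNDERSCORE_OR_SLOW(c, cls) \
    ((c) == '_' || (tab[(uint8_t)(c)] & BIT(cls)) != 0)

/* One conditional: OR the two class bits into a single mask, which the
 * compiler can fold to a constant when cls is a constant. */
#define IS_UNDERSCORE_OR(c, cls) \
    ((tab[(uint8_t)(c)] & (BIT(CC_UNDERSCORE) | BIT(cls))) != 0)
```

Both macros give the same answer for every character; the second does
so with a single table lookup and a single test.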
The only downside of doing this is that it uses up one bit of the 32
available (without shenanigans) for such classes, leaving 4 spare. But
before this release, the last time any new bit had been used up was
5.32, so the rate of using up the spare bits is quite low.
This bit could be reclaimed because the IDFIRST class in the Latin1
range is identical to ALPHA plus the underscore, so it could be
rewritten as that combination and its bit freed up. However, this would
require adding some macros that take two class parameters instead of
one. I briefly thought about doing that now, but since we have spare
bits and the rate of using them up is low, I didn't think it was worth
it at this time.
\w in this range is ALPHANUMERIC plus underscore. But its use is more
embedded than IDFIRST is, so an attempt to reclaim its bit would require
more effort.
This internal function looks problematic with regard to handling empty
strings, but it isn't ever called with one so far. Change to catch such
calls that might get added in the future.
This internal function can handle empty strings, but it isn't ever
called with one so far, and it is better practice to not call it with an
empty string.
These functions take a string argument with beginning and ending
positions. They handle the case of an empty string properly, and the
documentation says they handle empty strings.
This function takes a string with a beginning and ending pointer. It
doesn't dereference if the string is empty, and returns the correct
value when empty, and does get called with empty strings.
These two functions examine their input string without checking if it is
zero length. So, the assertion needs to change. They aren't ever
called with an empty string.
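The distinction these commits draw can be sketched with two
hypothetical helpers (count_spaces() and first_char() are invented for
illustration and are not Perl core functions). A function that never
dereferences an empty range can assert 's <= e'; one that reads the
first byte unconditionally needs the stricter 's < e'.

```c
#include <assert.h>
#include <stddef.h>

/* Handles an empty range correctly: the loop body never runs, so
 * nothing is dereferenced.  Asserting s <= e is enough. */
static size_t count_spaces(const char *s, const char *e)
{
    size_t n = 0;

    assert(s <= e);
    while (s < e) {
        if (*s++ == ' ')
            n++;
    }
    return n;
}

/* Reads *s without checking the length, so an empty range would read
 * past the end.  The assertion must therefore require s < e. */
static char first_char(const char *s, const char *e)
{
    assert(s < e);
    return *s;
}
```

Calling count_spaces(p, p) is fine and returns 0, whereas
first_char(p, p) trips its assertion in a debugging build.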
92dcf59a90bfbb545599098d7043c96abb783ee5 changed newly-created macros to
be affected by the apidoc visibility flags. This commit updates the
comments in embed.fnc to reflect that.
This file was originally written to handle functions and the short name
macros that call them.
But its contents have gradually been expanded over the years, without
fully updating the comments to reflect that.
This commit rewords things so that text that refers to more than just
functions is generalized.
Due to the differences in various systems' implementations, I think it
is a good idea to more fully document the vagaries I have discovered,
and how perl resolves them.
I ran some experiments, and found that tzset works on Windows, and is
required after changing the TZ environment variable from within perl.
But it did not work on MinGW. Maybe there is something else needed in
the POSIX module that would get it to work; I didn't investigate.
The only way I could figure out how to distinguish in Perl space between
MSVC and MinGW was looking at the make command. Maybe there is a better
way.
tl;dr:
Fixes GH #23878
I botched this in Perl 5.42. These conditional compilation statements
were just plain wrong, causing code to be skipped that should have been
compiled. It only affected the few hours of the year when daylight
savings time ends, so that the hour value is repeated. We didn't
have a good test for that.
gory details:
libc uses 'struct tm' to hold information about a given instant in
time, containing fields for things like the year, month, hour, etc. The
libc function mktime() is used to normalize the structure, adjusting,
say, an input Nov 31 to be Dec 01.
One of the fields in the structure, 'is_dst' (spelled tm_isdst in the C
standard), indicates whether daylight savings is in effect, or whether
that fact is unknown. If unknown,
mktime() is supposed to calculate the answer and to change 'is_dst'
accordingly. Some implementations appear to always do this calculation
even when the input value says the result is known. Others appear to
honor it.
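The normalization and the "unknown" setting described above can be
tried with plain libc. The helper normalize_date() is hypothetical; it
only wraps the standard mktime() call. A negative tm_isdst asks
mktime() to work out for itself whether daylight savings applies.

```c
#include <string.h>
#include <time.h>

/* Normalize a possibly out-of-range calendar date using libc mktime().
 * Returns 0 on success and -1 if mktime() cannot represent the time. */
static int normalize_date(int year, int month, int mday, struct tm *out)
{
    struct tm tm;

    memset(&tm, 0, sizeof tm);
    tm.tm_year  = year - 1900;  /* years since 1900 */
    tm.tm_mon   = month - 1;    /* 0-based month */
    tm.tm_mday  = mday;         /* may exceed the month's length */
    tm.tm_hour  = 12;           /* noon, well away from any transition */
    tm.tm_isdst = -1;           /* unknown: let mktime() decide */

    if (mktime(&tm) == (time_t)-1)
        return -1;

    *out = tm;
    return 0;
}
```

Passing 2025, 11, 31 (the nonexistent Nov 31) yields a structure whose
tm_mon is 11 and tm_mday is 1, that is, Dec 01.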
Some libc implementations have extra fields in 'struct tm'.
Perl has a stripped down version of mktime(), called mini_mktime(),
written by Larry Wall a long time ago. I don't know why. This crippled
version ignores locale and daylight time. It also doesn't know about
the extra fields in 'struct tm' that some implementations have. Nor can
it be extended to know about those fields, as they are dependent on
timezone and daylight time, which it deliberately doesn't consider.
The botched #ifdef's were supposed to compensate for both the extra
fields in the struct and that some libc implementations always
recalculate 'is_dst'.
On systems with these fields, the botched #if's caused only
mini_mktime() to be called. This meant that these extra fields didn't
get populated, and daylight time is never considered to be in effect.
And 'is_dst' does not get changed from the input.
On systems without these fields, the regular libc mktime() would be
called appropriately.
The bottom line is that things worked properly for the portion of the
year when daylight savings is not in effect. The two extra
fields would not be populated, so if some code were to read them, it
would only get the proper values by chance. We got no reports of this.
I attribute that to the fact that the use of these is not portable, so
code wouldn't tend to use them. There are portable ways to access the
information they contain.
Tests were failing for the portions of the year when daylight savings is
in effect; see GH #22351. The code looked correct just reading it (not
seeing the flaw in the #ifdef's), so I assumed that it was an issue in
the libc implementations and instituted a workaround. (I can't now
think of a platform whose libc hasn't had some problem regarding
locales, so that was a reasonable assumption.)
Among other things (fixed in the next commit), that workaround overrode
the 'is_dst' field after the call to mini_mktime(), so that the value
actually passed to libc strftime() indicated that daylight savings is in
effect.
What happens next depends on the libc strftime() implementation. It
could conceivably itself call mktime() which might choose to override
is_dst to be the correct value, and everything would always work. The
more likely possibility is that it just takes the values in the struct
as-is. Remember that those values on systems with the extra fields were
calculated as if daylight savings wasn't in effect, but now we're
telling strftime() to use those values as if it were in effect. This
is a discrepancy. I'd have to trace through some libc implementations
to understand why this discrepancy seems to not matter except at the
transition time.
But the bottom line is this commit removes that discrepancy, and causes
mktime() to be called appropriately on systems where it wasn't, so
strftime() should now function properly.