83688 Commits

Author SHA1 Message Date
Karl Williamson
7e57931a2b perlio.c: Fix grammar in comment 2026-01-19 10:09:02 -07:00
Karl Williamson
06c43a5106 regen.pl: Add comment 2026-01-18 10:26:29 -07:00
Paul "LeoNerd" Evans
5789229d78 Rebuild embed.h last, in case other regen steps wanted to add more things to it
Otherwise, it's possible that later steps add new macros that `embed.pl`
would have wanted to add to its list of things to `#undef`, requiring a
second run of `make regen`.

By running embed.pl last, we ensure this happens in the right order the
first time.
2026-01-16 16:32:14 +00:00
Karl Williamson
fdbbcb462a regexec.c: Change return 0 to NULL
It is better practice to return a pointer when that's what the function
is declared to return.
2026-01-15 18:28:15 -07:00
Karl Williamson
7e9ba10fa8 Document (and export) [IU]V_(BITS|DIG)
Fixes #24083. Fixes #24084.

In d957e95daa0143d60933d96d6cbfb69eee6d6269 I changed the definitions of
IV_DIG and UV_DIG to depend on IV_BITS and UV_BITS respectively,
creating the latter in perl.h.  These had only visibility to the perl
core.  But I forgot that the _DIG forms were visible everywhere, so the
_BITS forms needed to be as well.

This commit merely documents all of them as public API (which should
have been the case all along anyway), which automatically causes their
visibility to be made everywhere.
2026-01-15 18:15:54 -07:00
Karl Williamson
3b349d42ed scan_num: Macroize common code
This avoids repeating code snippets.  It also changes things so adjacent
underscores are all absorbed at once (and warned about).  That means we
no longer have to keep track of whether the previous character was an
underscore, so the variable that did that is removed.

Only two checks need be done for running off either end of the buffer.
The buffer is NUL-terminated, so if we see an underscore in the current
position, the next position exists (there is a NUL there if nothing
else); and the macro that looks behind one position is called in only
one place where we haven't always parsed beyond the first character.
2026-01-15 09:49:49 -07:00
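The underscore handling described in 3b349d42ed can be illustrated with a minimal sketch. This is not the code from toke.c: the function name `strip_digit_underscores` and its parameters are hypothetical, and it only counts runs of adjacent underscores rather than reproducing every warning perl issues. It shows the two ideas from the message: a whole run of underscores is absorbed at once, and peeking one position ahead is safe because the buffer is NUL-terminated.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not perl's scan_num().  Copies the decimal digits
 * of the NUL-terminated string s into out, dropping underscores.
 * out must have room for the digits plus a terminating NUL.
 * *misplaced is set to the number of runs of two or more underscores.
 * Returns the number of digits copied. */
static size_t
strip_digit_underscores(const char *s, char *out, int *misplaced)
{
    size_t n = 0;

    *misplaced = 0;
    while (*s != '\0') {
        if (*s == '_') {
            /* s[1] exists: at worst it is the terminating NUL. */
            if (s[1] == '_')
                (*misplaced)++;

            /* Absorb the whole run, so no "previous character was an
             * underscore" state is needed. */
            while (*s == '_')
                s++;
            continue;
        }

        if (!isdigit((unsigned char) *s))
            break;              /* stop at the first non-digit */

        out[n++] = *s++;
    }

    out[n] = '\0';
    return n;
}
```

Called on "1__000_000", this sketch would copy the seven digits of 1000000 and report one run of adjacent underscores.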
Karl Williamson
62f47072c9 scan_num: Replace code by equivalent function call
The previous commit has made this function, long in numeric.c,
available to the rest of core.  The code removed here duplicated what it
does.  Two variables are now unused, and are removed.
2026-01-15 09:49:49 -07:00
Karl Williamson
c1becd7fcc numeric.c: Make S_output_nonportable() callable from core
It is no longer internal to this file; its name changes to
Perl_output_nonportable.
2026-01-15 09:49:49 -07:00
Karl Williamson
8de9351fc4 toke.c: Convert do/while to modern STMT_START/END 2026-01-15 09:49:49 -07:00
Karl Williamson
6295a5898f scan_num: indentation, reorder-comments only
Some blocks have been removed, so can outdent; others will be added in
future commits, so indent.
2026-01-15 09:49:49 -07:00
Karl Williamson
d31aa9e5b2 scan_num: Convert switch() to elsif series
By using isDIGIT(), the cases for individual digits 1-9 collapse into a
single one, leaving just three possibilities, which are more clearly
handled by an if and two 'else if's
2026-01-15 09:49:49 -07:00
Karl Williamson
8c1024baa9 scan_num: Reorder case: statements in switch()
Move the default to the end, and the shortest to the beginning.  This is
in preparation for future commits.
2026-01-15 09:49:49 -07:00
Karl Williamson
2bdcfe0cb2 Add isDIGIT_or_UNDERSCORE() and use it
This uses the macro added in the previous commit to create this new
macro, and changes code in toke.c to use it.

toke.c is not hot code, but this demonstrates that the new scheme works,
and makes the code in toke.c a bit cleaner.
2026-01-15 08:47:33 -07:00
Karl Williamson
31c9996116 Add class for underscore character to l1_char_class_tab.h
l1_char_class_tab.h categorizes characters in the Latin1 range into
various classes, mostly into the POSIX classes like [:word:].  Each
character has a bit set corresponding to every class it is a member of.
These values are placed in a 256-element array and the ordinal value of
a character is used as an index into it to quickly determine whether a
character is a member of a given class.

Besides the POSIX classes, there are some classes that make it more
convenient and/or faster for our code.  For example, there is a class
that allows us to quickly know if a given character is one that needs to
be preceded by a backslash by quotemeta().

This commit adds a class for the single character underscore '_', and a
macro that allows for seeing if a character is either an underscore or a
member of any other class, using a single conditional.

This means code that checks whether character X is either an underscore
or a member of class Y can change to eliminate one conditional.

Thus the reason to do this is efficiency.  Currently, the only places
that do this explicitly are in non-hot code.  But I have work in progress
that has hot code that could benefit from this.

The only downside of doing this is that it uses up one bit of the 32
available (without shenanigans) for such classes, leaving 4 spare.  But
before this release, the last time any new bit had been used up was
5.32, so the rate of using up these spare bits is quite low.

This bit could be reclaimed because the IDFIRST class in the Latin1
range is identical to ALPHA plus the underscore, so it could be
rewritten as that combination and its bit freed up.  However, this would
require adding some macros that take two class parameters instead of
one.  I briefly thought about doing that now, but since we have spare
bits and the rate of using them up is low, I didn't think it was worth
it at this time.

\w in this range is ALPHANUMERIC plus underscore.  But its use is more
embedded than IDFIRST is, so an attempt to reclaim its bit would require
more effort.
2026-01-15 08:47:33 -07:00
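The table lookup described in 31c9996116 can be sketched as follows. The names here (`CC_DIGIT`, `class_tab`, `is_class_or_underscore`, and so on) are hypothetical and do not match the real identifiers or bit positions in l1_char_class_tab.h; the table is also filled at run time and assumes an ASCII platform, whereas the real header is generated. The point is only that "underscore or member of class Y" becomes one mask-and-test.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical class bits; the real header uses its own names/values. */
enum {
    CC_DIGIT      = 1u << 0,
    CC_ALPHA      = 1u << 1,
    CC_UNDERSCORE = 1u << 2   /* the single-character class */
};

/* One entry per code point 0..255; each entry is a set of class bits. */
static uint32_t class_tab[256];

/* Fill the table.  Assumes ASCII, where the letter ranges are contiguous. */
static void
init_class_tab(void)
{
    int c;

    for (c = '0'; c <= '9'; c++) class_tab[c] |= CC_DIGIT;
    for (c = 'a'; c <= 'z'; c++) class_tab[c] |= CC_ALPHA;
    for (c = 'A'; c <= 'Z'; c++) class_tab[c] |= CC_ALPHA;
    class_tab['_'] |= CC_UNDERSCORE;
}

/* Membership in one class: index by ordinal, test one bit. */
static int
is_class(unsigned char c, uint32_t class_bit)
{
    return (class_tab[c] & class_bit) != 0;
}

/* Underscore OR member of the class, still with a single conditional:
 * both bits are OR-ed into one mask before the test. */
static int
is_class_or_underscore(unsigned char c, uint32_t class_bit)
{
    return (class_tab[c] & (class_bit | CC_UNDERSCORE)) != 0;
}
```

After `init_class_tab()`, a call such as `is_class_or_underscore('_', CC_DIGIT)` is true, without a separate `c == '_'` comparison.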
Karl Williamson
b116971bb0 regexec.c: Fix typo in comment 2026-01-12 12:57:42 -07:00
Karl Williamson
fc3b1a70c0 embed.fnc: Add string assertions for dump_exec_pos
This internal function takes a string argument with beginning and
ending positions.  It is called all the time with an empty string,
2026-01-12 12:49:49 -07:00
Karl Williamson
0a0ad01f4f embed.fnc: Add string assertions for debug_start_match
This internal function takes a string argument with beginning and
ending positions.  It handles the case of an empty string properly.
2026-01-12 12:49:49 -07:00
Karl Williamson
dfd074287b embed.fnc: Change EPTR get_quantifier_value assert to gt
This internal function looks problematic with regard to handling empty
strings, but it isn't ever called with one so far.  Change to catch such
calls that might get added in the future.
2026-01-12 12:49:49 -07:00
Karl Williamson
061aef76b4 embed.fnc: Change EPTR assert for regcurly to gt
This internal function can handle empty strings, but it isn't ever
called with one so far, and it is better practice to not call it with an
empty string.
2026-01-12 12:49:49 -07:00
Karl Williamson
e68167c560 pp_pack.c: Add missing 'S_' to function names
Calls in this file to these functions bypassed the macros, with no harm
currently done.  But it isn't good practice.
2026-01-12 12:49:49 -07:00
Karl Williamson
2d5fdadd63 embed.fnc: Add string assertions for utf8_hop_forward...
These functions take a string argument with beginning and ending
positions.  They handle the case of an empty string properly, and the
documentation says they handle empty strings.
2026-01-12 12:49:49 -07:00
Karl Williamson
8ca6f48d1a embed.fnc: Add string assertions for grok_numeric_radix
This function takes a string with a beginning and ending pointer.  It
doesn't dereference if the string is empty, and returns the correct
value when empty, and does get called with empty strings.
2026-01-12 12:49:49 -07:00
Karl Williamson
280d4b1d30 embed.fnc: grok_bslash_[ox]: Use EPTR_gt
These two functions examine their input string without checking if it is
zero length.  So, the assertion needs to change.  They aren't ever
called with an empty string.
2026-01-12 12:49:49 -07:00
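The distinction these embed.fnc commits draw between the "ge" and "gt" forms of the end-pointer assertion can be sketched in plain C. The macro and function names below are hypothetical; the real assertions are generated from embed.fnc and may look different. The sketch only shows why a function that reads its first byte unconditionally needs the strict form, while one that loops on `s < e` can accept an empty string.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the generated assertions. */
#define ASSERT_EPTR_GE(s, e) assert((s) != NULL && (e) >= (s)) /* empty ok */
#define ASSERT_EPTR_GT(s, e) assert((s) != NULL && (e) >  (s)) /* non-empty */

/* Handles an empty string: the loop body never runs when s == e. */
static size_t
count_underscores(const char *s, const char *e)
{
    size_t n = 0;

    ASSERT_EPTR_GE(s, e);
    for (; s < e; s++) {
        if (*s == '_')
            n++;
    }
    return n;
}

/* Reads *s without checking the length, so an empty string would be an
 * out-of-bounds read; the strict assertion catches such a call. */
static char
first_char(const char *s, const char *e)
{
    ASSERT_EPTR_GT(s, e);
    return *s;
}
```

With `const char *str = "a_b_";`, calling `count_underscores(str, str)` is valid and returns 0, whereas `first_char(str, str)` would trip the assertion in a debugging build.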
Karl Williamson
7941a78b40 embed.fnc: Add string asserts for first_symbol,need_utf8:
These can handle empty strings, and are called with them.
2026-01-12 12:49:49 -07:00
Karl Williamson
009870150b embed.fnc: Add string asserts for EPTR for variant_under_utf8_count
It can handle an empty string and is called with empty strings
2026-01-12 12:49:49 -07:00
Karl Williamson
27c12d7b8f embed.fnc: Add EPTR for pos_b2u_midway
This handles an empty string, and is currently called with one a lot.
2026-01-12 12:49:49 -07:00
Karl Williamson
57a9a3d8ff embed.fnc: Add string assertions for isSCRIPTRUN
This function is documented to handle empty strings, so EPTRge is
appropriate.
2026-01-12 12:49:49 -07:00
Karl Williamson
46376544f2 embed.fnc: Update comments for macro visibility changes
92dcf59a90bfbb545599098d7043c96abb783ee5 changed newly-created macros to
be affected by the apidoc visibility flags.  This commit updates the
comments in embed.fnc to reflect that.
2026-01-12 10:13:54 -07:00
Karl Williamson
62d42586b1 embed.fnc: Generalize comments to apply beyond functions
This file was originally written to handle functions and the short name
macros that call them.

But its contents have gradually been expanded over the years, without
fully updating the comments to reflect that.

This commit rewords things so that text that refers to more than just
functions is generalized.
2026-01-12 10:13:54 -07:00
Karl Williamson
f6327fd950 embed.fnc: Clarify a couple comments 2026-01-12 10:13:54 -07:00
Karl Williamson
6bd056401c embed.fnc: Make important comment prominent
by moving it to the top of the file and making it SHOUT
2026-01-12 10:13:54 -07:00
Karl Williamson
6639e505e9 embed.fnc: Remove irrelevant comment
This warned about a 5.31 change.  That's long enough ago that people
have gotten used to the new scheme.
2026-01-12 10:13:54 -07:00
Karl Williamson
1001bfec0b embed.fnc: Remove obsolete comment
This hasn't been true since the automatic formatting/sorting of
embed.fnc was instituted in 5.38
2026-01-12 10:13:54 -07:00
Karl Williamson
8a46b02e76 embed.fnc: Remove obsolete comments
These haven't been true since autodoc.pl was changed in 5.38
2026-01-12 10:13:54 -07:00
Karl Williamson
8ae436df76 embed.fnc: Fix misspelling in comment 2026-01-12 10:13:54 -07:00
Karl Williamson
174ed799d4 perldelta for GH #23878 2026-01-12 10:13:09 -07:00
Karl Williamson
b5a8852627 perlapi: Add extensive strftime documentation
Due to the differences in various systems' implementations, I think it
is a good idea to more fully document the vagaries I have discovered,
and how perl resolves them.
2026-01-12 10:13:09 -07:00
Karl Williamson
ecba154a51 POSIX/t/time.t: Add tests for DST changes 2026-01-12 10:13:09 -07:00
Karl Williamson
0ff7fef53e POSIX/t/time.t: tzset() works on Windows; not MinGW
I ran some experiments, and found that tzset works on Windows, and is
required after changing the TZ environment variable from within perl.

But it did not work on MinGW.  Maybe there is something else needed in
the POSIX module that would get it to work; I didn't investigate.

The only way I could figure out how to distinguish in Perl space between
MSVC and MinGW was looking at the make command.  Maybe there is a better
way.
2026-01-12 10:13:09 -07:00
Karl Williamson
93d62c0706 POSIX/t/time.t: Extract common code to single place 2026-01-12 10:13:09 -07:00
Karl Williamson
f0096bdd93 POSIX.xs: Properly take daylight savings into account
Because of the bug fixed two commits ago, this function was changed in
5.42 to have a workaround, which is no longer needed.
2026-01-12 10:13:09 -07:00
Karl Williamson
92a20ace82 my_strftime(): Properly take daylight savings into account
Because of the bug fixed in the previous commit, this function was
changed in 5.42 to have a workaround, which is no longer needed.
2026-01-12 10:13:09 -07:00
Karl Williamson
cf36e33dd6 locale.c: ints_to_tm: Fix #if's
tl;dr:

Fixes GH #23878

I botched this in Perl 5.42.  These conditional compilation statements
were just plain wrong, causing code to be skipped that should have been
compiled.  It only affected the few hours of the year when daylight
savings time is removed, so that the hour value is repeated.  We didn't
have a good test for that.

gory details:

libc uses 'struct tm' to hold information about a given instant in
time, containing fields for things like the year, month, hour, etc.  The
libc function mktime() is used to normalize the structure, adjusting,
say, an input Nov 31 to be Dec 01.

One of the fields in the structure, 'is_dst', indicates if daylight
savings is in effect, or whether that fact is unknown.  If unknown,
mktime() is supposed to calculate the answer and to change 'is_dst'
accordingly.  Some implementations appear to always do this calculation
even when the input value says the result is known.  Others appear to
honor it.

Some libc implementations have extra fields in 'struct tm'.

Perl has a stripped down version of mktime(), called mini_mktime(),
written by Larry Wall a long time ago.  I don't know why.  This crippled
version ignores locale and daylight time.  It also doesn't know about
the extra fields in 'struct tm' that some implementations have.  Nor can
it be extended to know about those fields, as they are dependent on
timezone and daylight time, which it deliberately doesn't consider.

The botched #ifdef's were supposed to compensate for both the extra
fields in the struct and that some libc implementations always
recalculate 'is_dst'.

On systems with these fields, the botched #if's caused only
mini_mktime() to be called.  This meant that these extra fields didn't
get populated, and daylight time is never considered to be in effect.
And 'is_dst' does not get changed from the input.

On systems without these fields, the regular libc mktime() would be
called appropriately.

The bottom line is that things worked properly for the portion of the
year when daylight savings is not in effect.  The two extra
fields would not be populated, so if some code were to read them, it
would only get the proper values by chance.  We got no reports of this.
I attribute that to the fact that the use of these is not portable, so
code wouldn't tend to use them.  There are portable ways to access the
information they contain.

Tests were failing for the portions of the year when daylight savings is
in effect; see GH #22351.  The code looked correct just reading it (not
seeing the flaw in the #ifdef's), so I assumed that it was an issue in
the libc implementations and instituted a workaround.  (I can't now
think of a platform whose libc hasn't had some problem regarding
locales, so that was a reasonable assumption.)

Among other things (fixed in the next commit), that workaround overrode
the 'is_dst' field after the call to mini_mktime(), so that the value
actually passed to libc strftime() indicated that daylight is in effect.

What happens next depends on the libc strftime() implementation.  It
could conceivably itself call mktime() which might choose to override
is_dst to be the correct value, and everything would always work.  The
more likely possibility is that it just takes the values in the struct
as-is.  Remember that those values on systems with the extra fields were
calculated as if daylight savings wasn't in effect, but now we're
telling strftime() to use those values as if it were in effect.  This
is a discrepancy.  I'd have to trace through some libc implementations
to understand why this discrepancy seems to not matter except at the
transition time.

But the bottom line is this commit removes that discrepancy, and causes
mktime() to be called appropriately on systems where it wasn't, so
strftime() should now function properly.
2026-01-12 10:13:09 -07:00
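The normalization that cf36e33dd6 describes for libc mktime() can be seen in a small standalone sketch. This is not the code in locale.c; the helper name `normalize_date` is hypothetical. In standard C the field the message calls 'is_dst' is named `tm_isdst`, and a negative value tells mktime() that the daylight savings status is unknown and should be worked out.

```c
#include <assert.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: build a struct tm for noon local time on the given
 * calendar date (which may be out of range, e.g. Nov 31) and let libc
 * mktime() normalize it. */
static struct tm
normalize_date(int year, int month, int mday)
{
    struct tm tm;

    memset(&tm, 0, sizeof tm);
    tm.tm_year  = year - 1900;  /* years since 1900 */
    tm.tm_mon   = month - 1;    /* months are 0-based */
    tm.tm_mday  = mday;
    tm.tm_hour  = 12;           /* noon, away from any DST transition */
    tm.tm_isdst = -1;           /* unknown: ask mktime() to determine it */

    (void) mktime(&tm);         /* adjusts the fields of tm in place */
    return tm;
}
```

For example, `normalize_date(2025, 11, 31)` should come back with `tm_mon == 11` and `tm_mday == 1`, that is, Dec 01, matching the example in the commit message.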
Karl Williamson
7f9d156010 locale.c: Slight comment clarification 2026-01-12 10:13:09 -07:00
Karl Williamson
a003be9e73 locale.c: Silence unused variable compiler warning
On some systems this was unused.  Now that we have C99, we can move the
declaration and some #ifdef's and not declare it unless it is going to
be used.
2026-01-12 10:13:09 -07:00
Karl Williamson
4122b31e04 posix.t: Properly populate optional fields to strftime
The next commits that fix some bugs showed these were not properly
getting initialized.
2026-01-12 10:13:09 -07:00
Karl Williamson
6dc6bb9e40 perlapi: Move apidoc lines to proper place
Their previous location made autodoc.pl think they were from a new entry.
2026-01-12 09:43:31 -07:00
Karl Williamson
6b6a828b0a Inline Perl_grok_(bin|oct|hex) 2026-01-12 09:42:39 -07:00
Karl Williamson
d957e95daa Add definitions for [IU]V_BITS to perl.h
replacing the pp.c definition of IV_BITS
2026-01-12 09:18:06 -07:00
Tony Cook
d469fd11be perldelta for B::COP::label, B::PVOP::pv utf8 fixes 2026-01-12 10:30:18 +11:00