2272 Commits

Author SHA1 Message Date
Karl Williamson
d4247bb256 sbox32_hash.h: Add #undef's
These case statements need not be visible outside this header.  Putting
these here avoids cluttering up embed.h, where the same #undef lines
would otherwise be generated.
2026-01-22 09:55:58 -07:00
Karl Williamson
4b67bbf7f6 embed.pl: Also consider #undef's
This code looks to see what conditions must apply before a #define
happens.  This commit extends that to also look for #undef commands.

The end result is that for symbols that are visible to XS code, but
aren't supposed to be, embed.h contains an #undef so it isn't visible.
But if it already has been #undef'ed, there is no need to do this.

But a symbol can be defined and undefined many times, and the conditions
for doing an #undef may be different than what the symbol was #defined
under.

The consequences of not realizing that a symbol gets undefined are
simply that we generate an unnecessary #undef.  The consequence of
failing to generate one when the symbol is defined is that it is
visible when not intended to be so.

So, there are various restrictions to try to make sure that we don't
err in the latter direction.
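The asymmetry described above can be illustrated with a small sketch.  The
names HELPER, FEATURE_A and FEATURE_B are hypothetical, and this is only an
illustration of the preprocessor behavior, not what embed.pl generates.  The
symbol is defined under one condition and undefined under a different one, so
it can survive the conditional #undef; an extra unconditional #undef is
harmless either way.

```c
#include <assert.h>

/* HELPER and the two feature names are hypothetical. */
#define FEATURE_A 1

#ifdef FEATURE_A
#  define HELPER(x) ((x) + 1)      /* defined under one condition ... */
#endif

/* Code that legitimately uses the macro while it is defined. */
static int helper_result(void)
{
    return HELPER(1);
}

#ifdef FEATURE_B
#  undef HELPER                    /* ... undefined under a different one */
#endif

/* FEATURE_B is not set in this build, so the conditional #undef above did
 * not fire and HELPER is still defined at this point. */
#ifdef HELPER
enum { HELPER_SURVIVED = 1 };
#else
enum { HELPER_SURVIVED = 0 };
#endif

/* An extra #undef is harmless even if the symbol were already gone. */
#undef HELPER

void example_undef(void)
{
    assert(helper_result() == 2);
    assert(HELPER_SURVIVED == 1);
}
```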
2026-01-22 09:55:58 -07:00
Karl Williamson
46bf0cf5e8 embed.pl: Consider symbols visible only to extensions
Two commits ago, the code was extended to compare the C preprocessor
visibility of a symbol with what the desired visibility of a symbol is.
It assumed that everything not constrained to core was visible
everywhere.  This commit extends that to look for being visible only to
extensions.  As a result, the large number of symbols added to the
override list in that commit are now removed.
2026-01-22 09:55:58 -07:00
Karl Williamson
6731bcd053 embed.pl: Compare cpp visibility with desired
This commit creates a function that calculates what C preprocessor
constraints there are on the visibility of a #define'd symbol.

It then compares that with the desired visibility, as determined by
prior commits, and reconciles any discrepancies.  It
warns if the symbol is supposed to be visible, but cpp makes it not so.
It adds it to the list of symbols to undefine if it is visible, but is
not supposed to be so.

In order to make this commit somewhat smaller with respect to code
changes, it assumes anything that is visible to extensions is visible
everywhere.  This entailed adding a large number of symbols to the list
of symbols to not #undef, in order to not change embed.h.  The commit
after the next one will fix this, and those symbols will be removed from
the list in that commit.
2026-01-22 09:55:58 -07:00
Karl Williamson
1fd7f0165b embed.pl: Handle case of multiple flags for an element
Some symbols are #defined in multiple places in the input, typically
under different preprocessor conditionals.  We want to use the
definition which has the widest visibility.

This change showed that USE_STDIO had wrongly been undefined for the
past few commits in blead.  It is no longer actually ever defined by
perl.
2026-01-22 09:55:58 -07:00
Karl Williamson
cea46623dd op.c: Don't hand-roll is_dup_mode() attributes
This commit adds an entry in embed.fnc for S_is_dup_mode, removing the
assert and __attribute lines in op.c.

The proximal cause for this commit is that I tried compiling with
Perl_assert() enabled, which resulted in a lot of compiler warnings
because of the __attribute__non_null__ line.

But I also think it is better to not hand-roll things unless absolutely
necessary.  Changes someone makes to the general scheme are not likely
to be propagated to the hand-rolled items.
2026-01-22 06:58:57 -07:00
Tony Cook
e11f0edb1a sv_numeq etc: don't do numify overloading with SV_SKIP_OVERLOAD 2026-01-22 13:09:13 +11:00
Tony Cook
747670eba8 add sv_numle(), sv_numlt(), sv_numge(), sv_numgt() APIs
These are all needed because overloading may make them inconsistent
with <=> overloading.
2026-01-22 13:07:35 +11:00
Tony Cook
01e16d6bee add sv_numcmp() to the API 2026-01-22 13:07:34 +11:00
Tony Cook
22223d7d51 sv.c: extract the common parts of sv_numeq_flags and sv_numne_flags 2026-01-22 13:07:34 +11:00
Tony Cook
53bf030389 add sv_numne() to the API
Some refactoring comes next, since sv_numeq_flags and sv_numne_flags are
similar.

Used a separate test file since putting every sv_num*() variant in the
one file would be ugly.

Addresses GH #23918 but isn't a direct fix
2026-01-22 13:07:34 +11:00
Paul "LeoNerd" Evans
b5e0af402b Unify mg_free_struct() between mg.c and sv.c
Rather than copy-pasted code in two places, define a helper function to
call from both. This also accounts for minor (but currently
inconsequential) differences in behaviour between the two copy-pasted
locations, that seem to have drifted apart over time.
2026-01-21 23:49:41 +00:00
Karl Williamson
7e9ba10fa8 Document (and export) [IU]_(BITS|DIG)
Fixes #24083. Fixes #24084.

In d957e95daa0143d60933d96d6cbfb69eee6d6269 I changed the definitions of
IV_DIG and UV_DIG to depend on IV_BITS and UV_BITS respectively,
creating the latter in perl.h.  These were visible only to the perl
core.  But I forgot that the _DIG forms were visible everywhere, so the
_BITS forms needed to be as well.

This commit merely documents all of them as public API (which should
have been the case all along anyway), which automatically causes their
visibility to be made everywhere.
2026-01-15 18:15:54 -07:00
Karl Williamson
c1becd7fcc numeric.c: Make S_output_nonportable() callable from core
It was internal to this file; this commit changes its name to
Perl_output_nonportable.
2026-01-15 09:49:49 -07:00
Karl Williamson
31c9996116 Add class for underscore character to l1_char_class_tab.h
l1_char_class_tab.h categorizes characters in the Latin1 range into
various classes, mostly into the POSIX classes like [:word:].  Each
character has a bit set corresponding to every class it is a member of.
These values are placed in a 256-element array and the ordinal value of
a character is used as an index into it, to quickly determine whether a
character is a member of a given class.

Besides the POSIX classes, there are some classes that make it more
convenient and/or faster for our code.  For example, there is a class
that allows us to quickly know if a given character is one that needs to
be preceded by a backslash by quotemeta().

This commit adds a class for the single character underscore '_', and a
macro that allows for seeing if a character is either an underscore or a
member of any other class, using a single conditional.

This means code that checks whether character X is either an underscore
or a member of class Y can change to eliminate one conditional.
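A minimal sketch of the idea, not the actual contents of
l1_char_class_tab.h.  The bit names, the table-building helper, and both
function names are hypothetical, and only the ASCII half of the table is
filled in.  It shows how a dedicated underscore bit lets "underscore or
member of class Y" be tested with one mask operation.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>

/* Hypothetical class bits; the real header has its own names and numbering. */
enum {
    CC_ALPHA      = 1u << 0,
    CC_DIGIT      = 1u << 1,
    CC_UNDERSCORE = 1u << 2   /* class whose only member is '_' */
};

static uint32_t class_tab[256];

/* Fill the table for the ASCII range only, to keep the sketch short. */
void init_class_tab(void)
{
    for (int c = 0; c < 128; c++) {
        uint32_t bits = 0;
        if (isalpha(c)) bits |= CC_ALPHA;
        if (isdigit(c)) bits |= CC_DIGIT;
        if (c == '_')   bits |= CC_UNDERSCORE;
        class_tab[c] = bits;
    }
}

/* Two conditionals: a class lookup plus a separate comparison. */
int in_class_or_underscore_slow(unsigned char c, uint32_t class_bit)
{
    return (class_tab[c] & class_bit) || c == '_';
}

/* One conditional: OR the underscore bit into the mask, then test once. */
int in_class_or_underscore(unsigned char c, uint32_t class_bit)
{
    return (class_tab[c] & (class_bit | CC_UNDERSCORE)) != 0;
}

void example_class_tab(void)
{
    init_class_tab();
    assert(in_class_or_underscore('_', CC_ALPHA));
    assert(in_class_or_underscore('a', CC_ALPHA));
    assert(!in_class_or_underscore('1', CC_ALPHA));
}
```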

Thus the reason to do this is efficiency.  Currently, the only places
that do this explicitly are in non-hot code.  But I have wip that has
hot code that could benefit from this.

The only downside of doing this is that it uses up one bit of the 32
available (without shenanigans) for such classes, leaving 4 spare.  But
before this release, the last time any new bit had been used up was
5.32, so the rate of using up these spare bits is quite low.

This bit could be reclaimed because the IDFIRST class in the Latin1
range is identical to ALPHA plus the underscore, so it could be
rewritten as that combination and its bit freed up.  However, this would
require adding some macros that take two class parameters instead of
one.  I briefly thought about doing that now, but since we have spare
bits and the rate of using them up is low, I didn't think it was worth
it at this time.

\w in this range is ALPHANUMERIC plus underscore.  But its use is more
embedded than IDFIRST is, so an attempt to reclaim its bit would require
more effort.
2026-01-15 08:47:33 -07:00
Karl Williamson
6b6a828b0a Inline Perl_grok_(bin|oct|hex) 2026-01-12 09:42:39 -07:00
Karl Williamson
d957e95daa Add definitions for [IU]V_BITS to perl.h
replacing the pp.c definition of IV_BITS
2026-01-12 09:18:06 -07:00
Karl Williamson
52a45f6eea perlapio: USE_STDIO is no longer an option
Since 5.16.0.
2026-01-09 12:50:24 -07:00
Karl Williamson
86cc3ae1e0 regen/embed.pl: Update copyright to 2026 2026-01-05 13:30:18 -07:00
Karl Williamson
24c7fb4c21 Convert Perl utf16 to utf8 functions to macros
These functions are hereby removed in favor of calling the plain macros
that already exist.
2025-12-27 21:24:47 -07:00
Karl Williamson
7e1ae0c850 Remove SBOX case statements from external visibility
I'm pretty sure there is no use case for these, and they are very
unlikely to have any actual uses.
2025-12-10 08:50:19 -07:00
Karl Williamson
ebbe6ac0f7 Remove a few more macros from being visible to XS code
These are a few macros dealing with inversion lists that were never
intended to be visible to general XS code, and they actually can't be in
use in cpan because the mechanisms to create inversion lists are private
to perl.
2025-12-10 08:50:19 -07:00
Karl Williamson
92dcf59a90 Gain control of macro namespace visibility
This commit adds the capability to undefine macros that are visible to
XS code but shouldn't be.  This can be used to stop macro namespace
pollution by perl.

It works by changing embed.h to have two modes, controlled by a #ifdef
that is set by perl.h.  perl.h now #includes embed.h twice.  The first
time works as it always has.  The second sets the #ifdef, and causes
embed.h to #undef the macros that shouldn't be visible.  This call is
just before perl.h returns to its includer, so that these macros have
come and gone before the file that #included perl.h is affected by them.
It comes after the inline headers get included, so they have access to
all the symbols that are defined.
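The two-pass scheme can be sketched in a single translation unit.  This is
an illustration under stated assumptions, not the real embed.h: the macro
names are hypothetical, and the two "passes" are simply consecutive regions
of one file rather than two inclusions of a header.

```c
#include <assert.h>

/* Pass 1: stand-in for the first inclusion of the header.
 * Both macro names are hypothetical. */
#define PUBLIC_LIMIT    10   /* documented as API, stays visible   */
#define INTERNAL_SCALE  3    /* internal-only, should not leak out */

/* Stand-in for an inline header: it is compiled while INTERNAL_SCALE
 * is still defined, so it may use it freely. */
static inline int scaled_limit(void)
{
    return PUBLIC_LIMIT * INTERNAL_SCALE;
}

/* Pass 2: stand-in for the second inclusion, done just before control
 * returns to the file that included the main header. */
#undef INTERNAL_SCALE

/* From here on we play the role of the including (XS) code. */
#ifdef INTERNAL_SCALE
#  error "INTERNAL_SCALE leaked to the includer"
#endif

/* The includer can now use the name itself without a clash. */
static const int INTERNAL_SCALE = 7;

void example_visibility(void)
{
    assert(scaled_limit() == 30);   /* expanded before the #undef */
    assert(INTERNAL_SCALE == 7);    /* the includer's own variable */
    assert(PUBLIC_LIMIT == 10);     /* public macro is still there */
}
```

The inline function keeps working after the #undef because its body was
already macro-expanded when it was compiled.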

The list of macros is determined by the visibility given by the apidoc
lines documenting them, plus several exception lists that allow a symbol
to be visible even though it is not documented as such.

In this commit, the main exception list contains everything that is
currently visible outside the Perl core, so this should not break any
code.  But it means that the visibility control is established for
future changes to our code base.  New macros will not be visible except
when documented as needing to be such.  We can no longer inadvertently
add new names to pollute the user's namespace.

I expect that over time, the exception list will become smaller, as we
go through it and remove the items that really shouldn't be visible.  We
can then see via smoking if someone is actually using them, and either
decide that these should be visible, or work with the module author for
another way to accomplish their needs.  (I would hope this would lead to
proper documentation of the ones that need to be visible.)

There are currently four lists of symbols.

One list is for symbols that are used by libc functions, and that Perl
may redefine (usually so that code doesn't have to know if it is running
on a platform that is lacking the given feature.)  The algorithm added
here catches most of these and keeps them visible, but there are a few
items that currently must be manually listed.

A second list is of symbols that the re extension to Perl requires, but
no one else needs to.  This list is currently empty, as everything
initially is in the main exception list.

A third list is for items that other Perl extensions require, but no one
else needs to.  This list is currently empty, as everything initially is
in the main exception list.

The final list is for items that currently are visible to the whole
world.  It contains thousands of items.  This list should be examined
for:

    1) Names that shouldn't be so visible; and
    2) Names that need to remain visible but should be changed so they
       are less likely to clash with anything the user might come up
       with.

I have wanted this ability to happen for a long time; and now things
have come together to enable it.

This allows us to have a clear-cut boundary with CPAN.

It means you can add macros that have internal-only use without having
to worry about making them likely not to clash with user names.

It shows precisely in one place what our names are that are visible to
CPAN.
2025-12-10 08:50:19 -07:00
Karl Williamson
838b774823 Move hv_stores() declaration from embed.fnc to hv.h
This is required for the next few commits that start automatically
creating long Perl_name functions for the elements in embed.fnc that are
macros and don't already have them in the source.

Only macros can take a parameter that has to be a literal string, so
don't fit with the next few commits.  This is the only case in embed.fnc
like that, so I'm deferring dealing with it for now.
2025-12-10 08:50:19 -07:00
Karl Williamson
32aaa22eec embed.fnc: Drop Perl_ on do_aexec my_stat my_lstat
These macros are not for external use, so don't need a Perl_ prefix.
2025-12-10 08:50:19 -07:00
Karl Williamson
4092daf53e Remove some special EBCDIC code
The 'variant_byte_number' function was written to find the byte number
in a word of the first byte whose meaning varies depending on if the
string it is part of is encoded in UTF-8 or not.  On ASCII machines,
that is simply when the upper bit is set.  On EBCDIC machines, there is
no similar pattern, so this function hasn't been compiled on those.

A long time ago, I realized that this function could also handle binary
data by coercing that binary data into having the form of having that
bit set or not depending on the pattern being looked for, and then
calling that function.

But I actually hadn't realized until now that it was binary data not
tied to a character set that was being worked on.  This commit rectifies
that.  A new alias is added for that function that emphasizes that it
works on binary data, the function is now compiled for EBCDIC, and the
EBCDIC-only code that avoided using it is now removed.
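For the ASCII case described above, a word-at-a-time scan might look like
the following sketch.  The function name is hypothetical and this is not
Perl's implementation: it skips whole words that have no upper bit set and
then finishes byte by byte, rather than computing the byte position inside
the word directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return the index of the first byte in s[0..len) whose upper bit is
 * set, or len if there is none.  Hypothetical name. */
size_t first_high_bit_byte(const unsigned char *s, size_t len)
{
    const uint64_t high_bits = UINT64_C(0x8080808080808080);
    size_t i = 0;

    /* Skip whole words that contain no byte with the upper bit set. */
    while (len - i >= sizeof(uint64_t)) {
        uint64_t word;
        memcpy(&word, s + i, sizeof word);   /* avoids alignment issues */
        if (word & high_bits)
            break;
        i += sizeof word;
    }

    /* Locate the exact byte (or reach the end) one byte at a time. */
    while (i < len && !(s[i] & 0x80))
        i++;

    return i;
}

void example_first_high_bit(void)
{
    const unsigned char text[] = "abcdefghij\xC3\xA9";
    assert(first_high_bit_byte(text, 12) == 10);
}
```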
2025-11-01 21:02:37 -06:00
Paul "LeoNerd" Evans
f1a8d7d883 Implement named parameters in signatures (PPC0024)
This adds a major new ability to subroutine signatures, allowing callers
to pass parameters by name/value pairs rather than by position.

  sub f ($x, $y, :$alpha, :$beta = undef) { ... }

  f( 123, 456, alpha => 789 );

Originally specified in

  https://github.com/Perl/PPCs/blob/main/ppcs/ppc0024-signature-named-parameters.md

This feature is currently considered experimental.
2025-10-31 11:31:29 +00:00
Branislav Zahradník
147d5f1b9e [parser] new_block_statement - deduplicate "a block is a loop that happens once" 2025-10-22 17:23:56 +01:00
Branislav Zahradník
e6a443b294 [parser] package - deduplicate coupled call sequence
The function combines the calls of the original `package` and
`package_version` when a new namespace statement is detected.

Instead of the three statements previously required, usage now consists
of a single function call.
2025-10-22 17:23:56 +01:00
Karl Williamson
935cdb76e8 embed.fnc: mv definition of more_sv
This was inside an #ifdef that restricts it to sv.c, which is where it
lives, but since it is public, it needs to be moved out of that.  This
removes the need for a copy of its prototype to be in sv_inline.h.
2025-10-21 18:58:48 -06:00
Karl Williamson
2e142e0d27 regen/embed.pl: Avoid use of hard-coded list
The list consists of exactly the functions that have the O flag set in
embed.fnc.  No need to keep this data twice.  The entries are trivially
generatable from existing entries as we go along.

And those generated entries have the added advantage of not using the
short name, so potentially less name space pollution.
2025-10-21 18:58:48 -06:00
Karl Williamson
bd4c0d1fc2 Add S_parse_ident_no_copy()
This new function is for callers that are merely checking if the string
being parsed is a legal identifier or not, and aren't interested in the
normalized version of the identifier that parse_ident() generates.

This new function allows callers to not have to think about this buffer;
it just wraps plain parse_ident() using a throw-away buffer to hold the
returned normalized text.  This avoids introducing a bunch of
conditionals inside parse_ident.
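The wrapper pattern described here can be sketched as follows.  Both
functions are hypothetical stand-ins, not the toke.c code: the "parser"
merely copies ASCII word characters, and the scratch buffer size is an
arbitrary choice for the sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Stand-in parser: copies identifier characters from [s, end) into dest
 * (NUL-terminated) and returns how many were consumed. */
size_t parse_ident_sketch(const char *s, const char *end,
                          char *dest, size_t dest_size)
{
    size_t n = 0;
    while (s < end && n + 1 < dest_size
           && (isalnum((unsigned char)*s) || *s == '_'))
    {
        dest[n++] = *s++;
    }
    dest[n] = '\0';
    return n;
}

/* Wrapper for callers that only want to know how much of the input is
 * an identifier: the normalized text goes to a throw-away buffer. */
size_t parse_ident_no_copy_sketch(const char *s, const char *end)
{
    char scratch[256];
    return parse_ident_sketch(s, end, scratch, sizeof scratch);
}

void example_no_copy(void)
{
    const char src[] = "foo_bar + 1";
    assert(parse_ident_no_copy_sketch(src, src + sizeof src - 1) == 7);
}
```

Keeping the buffer inside the wrapper means the underlying parser needs no
"dest may be NULL" branches.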
2025-10-17 12:26:04 -06:00
Karl Williamson
735e7cc211 toke.c: Change parse_ident to take any string
Prior to this commit, the string passed to this function had to be
pointing to somewhere in PL_bufptr.  But this is only because it assumed
that the initial position is less than PL_bufend.  By passing the upper
bound in, that assumption is automatically removed.
2025-10-17 12:26:00 -06:00
Karl Williamson
e4be402477 toke.c: Use flags parameter to S_parse_ident
This makes it clearer at each call point what is happening, and prepares
for future commits where more flags will be passed to this function.
2025-10-17 12:26:00 -06:00
Karl Williamson
bfbd5f7e35 toke.c: Use flags parameter for S_force_word
This makes it clear at each call point what is happening, instead of
having to jump to the S_force_word definition to know what 'false, true'
vs 'true, false' actually means.

And this prepares for future commits.
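The readability gain can be seen in a small sketch.  The flag names and both
functions below are hypothetical illustrations, not the actual toke.c
definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits replacing two positional booleans. */
#define FW_CHECK_KEYWORD  (1u << 0)
#define FW_ALLOW_PACK     (1u << 1)

/* Before: nothing at the call site says which bool is which. */
unsigned force_word_bools(bool check_keyword, bool allow_pack)
{
    return (check_keyword ? FW_CHECK_KEYWORD : 0u)
         | (allow_pack    ? FW_ALLOW_PACK    : 0u);
}

/* After: the call site names exactly what it is asking for, and new
 * options can be added without changing the signature. */
unsigned force_word_flags(unsigned flags)
{
    return flags & (FW_CHECK_KEYWORD | FW_ALLOW_PACK);
}

void example_flags(void)
{
    /* 'false, true' versus a self-describing flag: same request. */
    assert(force_word_bools(false, true) == force_word_flags(FW_ALLOW_PACK));
    assert(force_word_bools(true, false) == force_word_flags(FW_CHECK_KEYWORD));
}
```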
2025-10-17 12:25:59 -06:00
Karl Williamson
3450d19250 intuit_more: 'use strict' allows much better handling
Most code these days runs under 'use strict'.  That allows us to resolve
ambiguity without resorting to heuristics in far more cases than before.

This commit adds a parameter to intuit_more() that gives the context it
is being called from.  And when that call is to resolve what $foo[...]
is supposed to mean, we can look up foo to see if it is an array or a
scalar.  If the former, the "..." must be a subscript; if a scalar, it
must be a charclass.

Only if there is both a $foo and an @foo is there ambiguity.  If so, we
drop down to using the heuristics.
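The decision described above reduces to a small table, sketched here.  The
enum and function names are hypothetical, and treating "neither variable is
known" as a case for the heuristics is an assumption of this sketch, not
something the commit message states.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    BRACKET_IS_SUBSCRIPT,    /* $foo[...] indexes @foo             */
    BRACKET_IS_CHARCLASS,    /* $foo followed by a [...] charclass */
    BRACKET_USE_HEURISTICS   /* ambiguous: fall back to guessing   */
} bracket_meaning;

/* Decide what "$foo[...]" means from which variables are known. */
bracket_meaning classify_bracket(bool array_known, bool scalar_known)
{
    if (array_known && !scalar_known)
        return BRACKET_IS_SUBSCRIPT;
    if (scalar_known && !array_known)
        return BRACKET_IS_CHARCLASS;
    /* Both known is the ambiguous case named in the text; neither
     * known is lumped in here as an assumption of this sketch. */
    return BRACKET_USE_HEURISTICS;
}

void example_classify(void)
{
    assert(classify_bracket(true, false)  == BRACKET_IS_SUBSCRIPT);
    assert(classify_bracket(false, true)  == BRACKET_IS_CHARCLASS);
    assert(classify_bracket(true, true)   == BRACKET_USE_HEURISTICS);
}
```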
2025-10-17 12:09:03 -06:00
Karl Williamson
aa93969e9c toke.c: Create function to see if an identifier is known
This checks first if there is a lexical variable in scope with the given
name, and if not, if there is a global
2025-10-17 12:09:03 -06:00
Karl Williamson
9fc9ec2818 Change invlist function names to be legal
This continues the process started in #23592 to change names with
leading underscores to be legal C.  See that p.r. or
4bb3572f7a1c1f3944b7f58b22b6e7a9ef5faba6 for extensive discussion.

This commit simply moves the leading underscore to be trailing.
2025-10-12 16:56:21 -06:00
Karl Williamson
59bca40fd0 S_scan_ident: Convert parameter to bool
All calls to it set it to TRUE or FALSE
2025-10-07 11:48:47 -06:00
Paul "LeoNerd" Evans
215e36f380 Add cop_*_warning() API
This adds three new API functions: a pair to modify a COP by enabling or
disabling a single warning bit within it, and a query function to ask if
a given warning is already enabled.

This API is provided for CPAN modules to use to modify the set of
warnings present in a COP during compile-time. Currently modules need to
use the `new_warnings_bitfield()` function, which was recently hidden by
09a0707. That change broke the `Syntax::Keyword::Try` module, as
reported in https://github.com/Perl/perl5/issues/23609.
2025-09-23 13:43:47 +01:00
Karl Williamson
c14d142701 Make die() always expand to Perl_die_nocontext()
See 03f24b8a082948e5b437394fa33d0af08d7b80b6 for the motivation.

This commit changes plain die() to not use a thread context parameter.
It and die_nocontext() now behave identically.
2025-09-21 06:55:45 -06:00
Karl Williamson
2cb0034ef5 Unroll valid_utf8_to_uv loop
This gives a bit of performance boost in this function that can be
called during pattern matching.

Here are some cachegrind comparisons with blead:

Key:
    Ir   Instruction read
    Dr   Data read
    Dw   Data write
    COND conditional branches
    IND  indirect branches

The numbers represent relative counts per loop iteration, compared to
blead at 100.0%.
Higher is better: for example, using half as many instructions gives 200%,
while using twice as many gives 50%.

                  GCC                       CLANG

valid_utf8_to_uv(0x007f), length is 1

         blead  hacked            blead  hacked
        ------  ------           ------  ------
    Ir  100.00  100.69       Ir  100.00   99.11
    Dr  100.00  101.47       Dr  100.00   99.74
    Dw  100.00  100.00       Dw  100.00   99.57
  COND  100.00  101.20     COND  100.00  100.00
   IND  100.00  100.00      IND  100.00   94.12

valid_utf8_to_uv(0x07ff), length is 2

         blead  hacked            blead  hacked
        ------  ------           ------  ------
    Ir  100.00  100.68       Ir  100.00   99.04
    Dr  100.00  101.47       Dr  100.00   99.74
    Dw  100.00  100.00       Dw  100.00   99.57
  COND  100.00  102.40     COND  100.00  101.23
   IND  100.00  100.00      IND  100.00   94.12

valid_utf8_to_uv(0xfffd), length is 3

         blead  hacked            blead  hacked
        ------  ------           ------  ------
    Ir  100.00  100.83       Ir  100.00   99.04
    Dr  100.00  101.47       Dr  100.00   99.75
    Dw  100.00  100.00       Dw  100.00   99.57
  COND  100.00  102.99     COND  100.00  101.84
   IND  100.00  100.00      IND  100.00   94.12

valid_utf8_to_uv(0xffffd), length is 4

         blead  hacked            blead  hacked
        ------  ------           ------  ------
    Ir  100.00  100.91       Ir  100.00   99.13
    Dr  100.00  101.46       Dr  100.00   99.75
    Dw  100.00  100.00       Dw  100.00   99.57
  COND  100.00  103.59     COND  100.00  102.45
   IND  100.00  100.00      IND  100.00   94.12

valid_utf8_to_uv(0x3ffffff), length is 5

         blead  hacked            blead  hacked
        ------  ------           ------  ------
    Ir  100.00  101.28       Ir  100.00   99.29
    Dr  100.00  101.46       Dr  100.00   99.75
    Dw  100.00  100.00       Dw  100.00   99.57
  COND  100.00  104.19     COND  100.00  103.07
   IND  100.00  100.00      IND  100.00   94.12

valid_utf8_to_uv(0x7fffffff), length is 6

         blead  hacked            blead  hacked
        ------  ------           ------  ------
    Ir  100.00   89.83       Ir  100.00   88.83
    Dr  100.00   95.22       Dr  100.00   92.94
    Dw  100.00   92.44       Dw  100.00   91.63
  COND  100.00   86.21     COND  100.00   87.11
   IND  100.00  100.00      IND  100.00   88.89

Clang gives slightly worse results than gcc.  But there is an
improvement in both cases for conditionals for two-byte and longer
characters.

This shows that the performance is significantly worse for code points
that take 6 bytes (or more, which I didn't include) to represent.  These
are all well outside the Unicode range; hence are very rarely
encountered.  Performance is improved a bit for the typical cases.

The algorithm used could handle 6 and 7 byte characters, but that
increases memory usage, and can lead to the compiler choosing to not
inline this function.  In blead, experiments with clang gave these
results
    Max bytes inlined   Instances in the code where not inlined
        3                 14
        4                 19
        5                 19
        6                 19
        7                 57

We really need to accommodate any Unicode code point, which is 4 bytes (5
on EBCDIC).  But the others we don't care about.  Even though 6 bytes
doesn't show as being worse than 4, I chose to not include it, because
we don't care about performance for these rare non-Unicode code points,
and it just might cause non-inlining for different compilers or clang
versions.
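To illustrate what unrolling means here, the following sketch decodes
well-formed UTF-8 of up to 4 bytes using switch fall-through instead of a
loop.  It is not the code from this commit: the function name is
hypothetical, the caller is assumed to supply the sequence length, and no
validation is done because the input is assumed valid.

```c
#include <assert.h>
#include <stdint.h>

/* Decode one code point from well-formed UTF-8.  'len' is the length of
 * the sequence in bytes, 1 through 4.  Hypothetical name. */
uint32_t decode_valid_utf8(const unsigned char *s, unsigned len)
{
    if (len == 1)
        return s[0];

    /* Start byte: keep only the payload bits below the length marker
     * (0x1F for 2 bytes, 0x0F for 3, 0x07 for 4). */
    uint32_t cp = s[0] & (0x7Fu >> len);
    const unsigned char *p = s + 1;

    /* One statement per continuation byte; no loop counter or branch
     * back.  Each case deliberately falls into the next. */
    switch (len) {
        case 4: cp = (cp << 6) | (*p++ & 0x3Fu); /* FALLTHROUGH */
        case 3: cp = (cp << 6) | (*p++ & 0x3Fu); /* FALLTHROUGH */
        case 2: cp = (cp << 6) | (*p   & 0x3Fu);
    }
    return cp;
}

void example_decode(void)
{
    const unsigned char e_acute[] = { 0xC3, 0xA9 };        /* U+00E9 */
    const unsigned char repl[]    = { 0xEF, 0xBF, 0xBD };  /* U+FFFD */
    assert(decode_valid_utf8(e_acute, 2) == 0x00E9);
    assert(decode_valid_utf8(repl, 3)    == 0xFFFD);
}
```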
2025-09-20 10:21:33 -06:00
Karl Williamson
03f24b8a08 Make croak() always expand to Perl_croak_nocontext()
Perl almost always opts for saving time over saving space.  Hence, we
have croak() that saves time at the expense of space, but needs thread
context available; and croak_nocontext() that doesn't need that, but
takes extra time.

But, when we are about to die, time isn't that important.  Even if we
are doing eval after eval in a tight loop, the potential time savings of
passing the thread context to Perl_croak is insignificant compared to
the tear-down that follows.  My claim then is that croak() never needed
a thread context parameter to save a bit of time just before death.  It
is an optimization that isn't worth it.  And having it do so required
the invention of croak_nocontext(), and the extra cognitive load
associated with two methods for the same task.

This commit changes plain croak() to not use a thread context parameter.
It and croak_nocontext() now behave identically.  That means that going
forward, people will likely choose croak() which requires less typing
and occupies fewer columns on the screen, and they won't have to
remember which form to use when.
2025-09-12 14:47:53 -06:00
Karl Williamson
8444d54d4b Move prototype definition of SvPV_helper to embed.fnc
It's usually a bad idea to try to work around a limitation in common
code by copy-pasting and then modifying to taste.  Fixes/improvements
to the common code rarely get propagated to the outlier.

I wrote code in 1ef9039bccb that did just this for the prototype
definition of SvPV_helper, because the place where it really belongs,
embed.fnc, couldn't (and still doesn't) handle function pointers as
arguments (patches welcome).

I should have at least added a comment to the common code noting the
existence of this outlier.

It turns out that that limitation can be worked around by declaring a
typedef of the pointer, and then using that in embed.fnc.

That's what this commit does.
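The typedef workaround can be sketched in plain C.  All names below are
hypothetical, and the sketch does not reproduce the real SvPV_helper
prototype.  The point is only that, once the pointer type has a name, the
parameter reads as an ordinary "type name" pair, which is easier for a
declaration table to handle.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Raw form: the parameter name sits inside the declarator, which a
 * simple "type name" parser has trouble with. */
size_t apply_raw(const char *s, size_t (*measure)(const char *))
{
    return measure(s);
}

/* Workaround: give the pointer type a name first ... */
typedef size_t (*measure_fn)(const char *);

/* ... so the parameter is now just "measure_fn measure". */
size_t apply_typed(const char *s, measure_fn measure)
{
    return measure(s);
}

void example_typedef(void)
{
    /* Both forms accept the same function and behave identically. */
    assert(apply_raw("abc", strlen) == 3);
    assert(apply_typed("abc", strlen) == 3);
}
```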

This commit removes the final instance of duplicating the work of
embed.fnc in the core, except for some in the regex system whose
comments say the reason is to avoid making a typedef public.  I haven't
investigated these further.
2025-09-01 10:50:08 -06:00
Karl Williamson
d8012228a9 Convert _is_utf8_FOO to legal name 2025-09-01 08:12:24 -06:00
Karl Williamson
8de60a95d1 Convert _is_uni_FOO to legal name 2025-09-01 08:12:23 -06:00
Karl Williamson
8b91a7e5f4 Convert _is_utf8_perl_idcont to legal name 2025-09-01 08:12:23 -06:00
Karl Williamson
ffc38ee761 Convert _is_uni_perl_idcont to legal name 2025-09-01 08:12:22 -06:00
Karl Williamson
9f11f6a038 Convert _is_utf8_perl_idstart to legal name 2025-09-01 08:12:21 -06:00
Karl Williamson
eb3ee9300b Convert _is_uni_perl_idstart to legal name 2025-09-01 08:12:21 -06:00