These case statements need not be visible outside this header. Putting
these here avoids cluttering up embed.h, where the same #undef lines
would otherwise be generated.
This code looks to see what conditions must apply before a #define
happens. This commit extends that to also look for #undef commands.
The end result is that for symbols that are visible to XS code, but
aren't supposed to be, embed.h contains an #undef so it isn't visible.
But if it already has been #undef'ed, there is no need to do this.
But a symbol can be defined and undefined many times, and the conditions
for doing an #undef may be different than what the symbol was #defined
under.
The consequence of not realizing that a symbol gets undefined is
simply that we generate an unnecessary #undef. The consequence of
failing to generate one when the symbol is defined is that it is
visible when not intended to be so.
So, there are various restrictions to try to make sure that we don't
err in the latter direction.
Two commits ago, the code was extended to compare the C preprocessor
visibility of a symbol with what the desired visibility of a symbol is.
It assumed that everything not constrained to core was visible
everywhere. This commit extends that to look for being visible only to
extensions. As a result, the large number of symbols added to the
override list in that commit are now removed.
This commit creates a function that calculates what C preprocessor
constraints there are on the visibility of a #define'd symbol.
It then compares that with the desired visibility, as determined by
prior commits, and reconciles any discrepancies. It
warns if the symbol is supposed to be visible, but cpp makes it not so.
It adds it to the list of symbols to undefine if it is visible, but is
not supposed to be so.
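
The comparison described above can be modeled roughly as follows. This
is only an illustrative sketch: the enum names, the ordering of the
levels, and reconcile() itself are hypothetical, and the real logic
lives in the code that generates embed.h.

```c
/* Hypothetical visibility levels, ordered from narrowest to widest. */
typedef enum { VIS_CORE, VIS_EXT, VIS_PUBLIC } visibility;

/* Hypothetical outcomes of comparing the two visibilities. */
typedef enum { ACT_NOTHING, ACT_WARN, ACT_UNDEF } action;

/* 'cpp_allows' is how widely the preprocessor conditionals let the
 * symbol be seen; 'desired' is how widely it is supposed to be seen. */
static action
reconcile(visibility cpp_allows, visibility desired)
{
    if (cpp_allows < desired)
        return ACT_WARN;    /* supposed to be visible, but cpp hides it */
    if (cpp_allows > desired)
        return ACT_UNDEF;   /* visible, but not supposed to be */
    return ACT_NOTHING;
}
```

In this model, reconcile(VIS_PUBLIC, VIS_CORE) yields ACT_UNDEF, which
corresponds to adding the symbol to the list of symbols to undefine.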
In order to make this commit somewhat smaller with respect to code
changes, it assumes anything that is visible to extensions is visible
everywhere. This entailed adding a large number of symbols to the list
of symbols to not #undef, in order to not change embed.h. The commit
after the next one will fix this, and those symbols will be removed from
the list in that commit.
Some symbols are #defined in multiple places in the input, typically
under different preprocessor conditionals. We want to use the
definition which has the widest visibility.
This change showed that USE_STDIO had wrongly been undefined for the
past few commits in blead. It is no longer actually ever defined by
perl.
This commit adds an entry in embed.fnc for S_is_dup_mode, removing the
assert and __attribute lines in op.c.
The proximal cause for this commit is that I tried compiling with
Perl_assert() enabled, which resulted in a lot of compiler warnings
because of the __attribute__non_null__ line.
But I also think it is better to not hand-roll things unless absolutely
necessary. Changes someone makes to the general scheme are not likely
to be propagated to the hand-rolled items.
some refactoring next, since sv_numeq_flags and sv_numne_flags are
similar.
Used a separate test file since putting every sv_num*() variant in the
one file would be ugly.
Addresses GH #23918 but isn't a direct fix.
Rather than copy-pasted code in two places, define a helper function to
call from both. This also accounts for minor (but currently
inconsequential) differences in behaviour between the two copy-pasted
locations, that seem to have drifted apart over time.
Fixes #24083. Fixes #24084.
In d957e95daa0143d60933d96d6cbfb69eee6d6269 I changed the definitions of
IV_DIG and UV_DIG to depend on IV_BITS and UV_BITS respectively,
creating the latter in perl.h. These were visible only to the perl
core. But I forgot that the _DIG forms were visible everywhere, so the
_BITS forms needed to be as well.
This commit merely documents all of them as public API (which should
have been the case all along anyway), which automatically causes their
visibility to be made everywhere.
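
The reason the _BITS forms must be visible wherever the _DIG forms are
is that a macro body is expanded where it is used, not where it is
defined. A minimal sketch, using hypothetical MY_-prefixed names and an
assumed integer approximation of log10(2), not the real perl.h
definitions:

```c
/* Hypothetical stand-ins; not the real perl.h definitions. */
#define MY_BIT_DIGITS(n)  (((n) * 146) / 485 + 1)   /* 146/485 =~ log10(2) */
#define MY_UV_BITS        64
#define MY_UV_DIG         MY_BIT_DIGITS(MY_UV_BITS)

/* MY_UV_DIG is only expanded here, so MY_UV_BITS must still be defined
 * at this point.  Had it been #undef'ed after the #define of MY_UV_DIG,
 * this function would fail to compile. */
static int
max_uv_digits(void)
{
    return MY_UV_DIG;
}
```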
l1_char_class_tab.h categorizes characters in the Latin1 range into
various classes, mostly into the POSIX classes like [:word:]. Each
character has a bit set corresponding to every class it is a member of.
These values are placed in a 256-element array and the ordinal value of
a character is used as an index into it to quickly determine whether a
character is a member of a given class.
Besides the POSIX classes, there are some classes that make it more
convenient and/or faster for our code. For example, there is a class
that allows us to quickly know if a given character is one that needs to
be preceded by a backslash by quotemeta().
This commit adds a class for the single character underscore '_', and a
macro that allows for seeing if a character is either an underscore or a
member of any other class, using a single conditional.
This means code that checks whether character X is either an underscore
or a member of class Y can change to eliminate one conditional.
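
A minimal sketch of the idea, assuming an ASCII platform. The class
numbers, table name, and macro names below are hypothetical and are not
the ones in l1_char_class_tab.h; only a handful of characters are
filled in.

```c
#include <stdint.h>

/* Hypothetical class numbers; each one names a bit position. */
enum { CC_ALPHA, CC_DIGIT, CC_UNDERSCORE };

#define CC_BIT(cls)  (UINT32_C(1) << (cls))

/* One 32-bit word of class bits per character in the 0..255 range. */
static uint32_t class_tab[256];

/* Fill in the table for the ASCII letters, digits, and underscore. */
static void
init_class_tab(void)
{
    int c;

    for (c = 'a'; c <= 'z'; c++) class_tab[c] |= CC_BIT(CC_ALPHA);
    for (c = 'A'; c <= 'Z'; c++) class_tab[c] |= CC_BIT(CC_ALPHA);
    for (c = '0'; c <= '9'; c++) class_tab[c] |= CC_BIT(CC_DIGIT);
    class_tab['_'] |= CC_BIT(CC_UNDERSCORE);
}

/* Two conditionals: one for the class, one for the underscore. */
#define IS_CLASS_OR_UNDERSCORE_SLOW(c, cls)                         \
    ((class_tab[(uint8_t) (c)] & CC_BIT(cls)) || (c) == '_')

/* One conditional: both bits are tested with a single mask. */
#define IS_CLASS_OR_UNDERSCORE(c, cls)                              \
    ((class_tab[(uint8_t) (c)]                                      \
      & (CC_BIT(cls) | CC_BIT(CC_UNDERSCORE))) != 0)
```

After init_class_tab(), IS_CLASS_OR_UNDERSCORE('_', CC_DIGIT) and
IS_CLASS_OR_UNDERSCORE('7', CC_DIGIT) are both true, each with a single
table lookup and a single test.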
Thus the reason to do this is efficiency. Currently, the only places
that do this explicitly are in non-hot code. But I have work in
progress with hot code that could benefit from this.
The only downside of doing this is that it uses up one bit of the 32
available (without shenanigans) for such classes, leaving 4 spare. But
before this release, the last time any new bit had been used up was
5.32, so the rate of using up these spares is quite low.
This bit could be reclaimed because the IDFIRST class in the Latin1
range is identical to ALPHA plus the underscore, so it could be
rewritten as that combination and its bit freed up. However, this would
require adding some macros that take two class parameters instead of
one. I briefly thought about doing that now, but since we have spare
bits and the rate of using them up is low, I didn't think it was worth
it at this time.
\w in this range is ALPHANUMERIC plus underscore. But its use is more
embedded than IDFIRST is, so an attempt to reclaim its bit would require
more effort.
These are a few macros dealing with inversion lists that were never
intended to be visible to general XS code, and they actually can't be in
use in cpan because the mechanisms to create inversion lists are private
to perl.
This commit adds the capability to undefine macros that are visible to
XS code but shouldn't be. This can be used to stop macro namespace
pollution by perl.
It works by changing embed.h to have two modes, controlled by an #ifdef
that is set by perl.h. perl.h now #includes embed.h twice. The first
time works as it always has. The second sets the #ifdef, and causes
embed.h to #undef the macros that shouldn't be visible. This call is
just before perl.h returns to its includer, so that these macros have
come and gone before the file that #included perl.h is affected by them.
It comes after the inline headers get included, so they have access to
all the symbols that are defined.
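
The ordering can be illustrated in a single file. This is only a sketch
of the define / use-in-inline-code / #undef sequence; the macro and
function names are hypothetical, and the real mechanism is spread across
perl.h, embed.h, and the inline headers.

```c
/* First pass: the macro is defined, as it always has been. */
#define INTERNAL_ADD_ONE(x)  ((x) + 1)    /* hypothetical internal macro */

/* "Inline header": included before the second pass, so it can still
 * use every macro.  The macro is expanded right here, so the later
 * #undef does not affect this function. */
static inline int
double_of_next(int x)
{
    return INTERNAL_ADD_ONE(x) * 2;
}

/* Second pass, just before returning to the includer. */
#undef INTERNAL_ADD_ONE

/* The includer's code: the macro has come and gone. */
#ifdef INTERNAL_ADD_ONE
#  define INTERNAL_MACRO_LEAKED 1
#else
#  define INTERNAL_MACRO_LEAKED 0
#endif
```

Here double_of_next() keeps working for the includer, while
INTERNAL_ADD_ONE itself no longer occupies the includer's namespace.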
The list of macros is determined by the visibility given by the apidoc
lines documenting them, plus several exception lists that allow a symbol
to be visible even though it is not documented as such.
In this commit, the main exception list contains everything that is
currently visible outside the Perl core, so this should not break any
code. But it means that the visibility control is established for
future changes to our code base. New macros will not be visible except
when documented as needing to be such. We can no longer inadvertently
add new names that pollute the user's namespace.
I expect that over time, the exception list will become smaller, as we
go through it and remove the items that really shouldn't be visible. We
can then see via smoking if someone is actually using them, and either
decide that these should be visible, or work with the module author for
another way to accomplish their needs. (I would hope this would lead to
proper documentation of the ones that need to be visible.)
There are currently four lists of symbols.
One list is for symbols that are used by libc functions, and that Perl
may redefine (usually so that code doesn't have to know if it is running
on a platform that is lacking the given feature.) The algorithm added
here catches most of these and keeps them visible, but there are a few
items that currently must be manually listed.
A second list is of symbols that the re extension to Perl requires, but
no one else needs to. This list is currently empty, as everything
initially is in the main exception list.
A third list is for items that other Perl extensions require, but no one
else needs to. This list is currently empty, as everything initially is
in the main exception list.
The final list is for items that currently are visible to the whole
world. It contains thousands of items. This list should be examined
for:
1) Names that shouldn't be so visible; and
2) Names that need to remain visible but should be changed so they
are less likely to clash with anything the user might come up
with.
I have wanted this ability to happen for a long time; and now things
have come together to enable it.
This allows us to have a clear-cut boundary with CPAN.
It means you can add macros that have internal-only use without having
to worry about making them likely not to clash with user names.
It shows precisely in one place what our names are that are visible to
CPAN.
This is required for the next few commits that start automatically
creating long Perl_name functions for the elements in embed.fnc that are
macros and don't already have them in the source.
Only a macro can take a parameter that has to be a literal string, so
this one doesn't fit with the next few commits. This is the only case
in embed.fnc like that, so I'm deferring dealing with it for now.
The 'variant_byte_number' function was written to find the byte number
in a word of the first byte whose meaning varies depending on whether
the string it is part of is encoded in UTF-8 or not. On ASCII machines,
that is simply when the upper bit is set. On EBCDIC machines, there is
no similar pattern, so this function hasn't been compiled on those.
A long time ago, I realized that this function could also handle binary
data: coerce each byte of that data into having that bit set or not,
depending on the pattern being looked for, and then call the
function.
But I actually hadn't realized until now that it was binary data not
tied to a character set that was being worked on. This commit rectifies
that. A new alias is added for that function that emphasizes that it
works on binary data, the function is now compiled for EBCDIC, and the
EBCDIC-only code that avoided using it is now removed.
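
To illustrate the two uses described above, here is a minimal sketch on
8-byte words. It is not the actual implementation: the function names
and constants are hypothetical, and the byte lookup is a plain loop.
The first function finds the first byte with its upper bit set; the
second coerces arbitrary binary data into that form (upper bit set
exactly where a byte matches a target value) and then calls the first.

```c
#include <stdint.h>
#include <string.h>

#define HIGH_BITS   UINT64_C(0x8080808080808080)
#define LOW_7_BITS  UINT64_C(0x7F7F7F7F7F7F7F7F)
#define ONES        UINT64_C(0x0101010101010101)

/* Index (in memory order) of the first byte of 'word' whose upper bit
 * is set, or 8 if there is none. */
static unsigned
first_high_bit_byte(uint64_t word)
{
    unsigned char bytes[8];
    unsigned i;

    word &= HIGH_BITS;
    if (word == 0)              /* one test covers all 8 bytes */
        return 8;

    memcpy(bytes, &word, sizeof bytes);
    for (i = 0; i < 8; i++)
        if (bytes[i])
            return i;
    return 8;                   /* not reached */
}

/* Index of the first byte in p[0..7] equal to 'target', or 8 if none. */
static unsigned
first_matching_byte(const unsigned char *p, unsigned char target)
{
    uint64_t word, diff, marks;

    memcpy(&word, p, sizeof word);
    diff = word ^ (ONES * target);      /* matching bytes become 0x00 */

    /* Set the upper bit of exactly those bytes of 'diff' that are
     * zero.  No carry can cross a byte boundary, since each byte-wise
     * sum is at most 0x7F + 0x7F. */
    marks = ~(((diff & LOW_7_BITS) + LOW_7_BITS) | diff | LOW_7_BITS);

    return first_high_bit_byte(marks);
}
```

Nothing in first_matching_byte() depends on the character set, which is
the point: once the data has been coerced, the search is purely a
binary one.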
This adds a major new ability to subroutine signatures, allowing callers
to pass parameters by name/value pairs rather than by position.
    sub f ($x, $y, :$alpha, :$beta = undef) { ... }
    f( 123, 456, alpha => 789 );
Originally specified in
https://github.com/Perl/PPCs/blob/main/ppcs/ppc0024-signature-named-parameters.md
This feature is currently considered experimental.
The function combines the calls to the original `package` and
`package_version` when a new namespace statement is detected.
Instead of the three statements previously required, usage now consists
of a single function call.
This was inside an #ifdef for being in sv.c, which it is, but since it
is public, it needs to be moved out of this. This removes the need for
a copy of its prototype to be in sv_inline.h.
The list consists of exactly the functions that have the O flag set in
embed.fnc. No need to keep this data twice. The entries are trivially
generatable from existing entries as we go along.
And those generated entries have the added advantage of not using the
short name, so potentially less name space pollution.
This new function is for callers that are merely checking if the string
being parsed is a legal identifier or not, and aren't interested in the
normalized version of the identifier that parse_ident() generates.
This new function allows callers to not have to think about this buffer;
it just wraps plain parse_ident() using a throw-away buffer to hold the
returned normalized text. This avoids introducing a bunch of
conditionals inside parse_ident.
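
The shape of such a wrapper can be sketched as follows. Both functions
here are hypothetical stand-ins: copy_ident() is a much simplified,
ASCII-only substitute for parse_ident(), with an assumed signature, and
skip_ident() plays the role of the new function.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for parse_ident(): copies an ASCII identifier
 * from *s (bounded by 'end') into 'dest' and advances *s past it.
 * Returns false if there is no identifier at *s, or if it doesn't fit
 * in 'dest'. */
static bool
copy_ident(const char **s, const char *end, char *dest, size_t dest_size)
{
    const char *p = *s;
    size_t len = 0;

    if (p >= end || !(isalpha((unsigned char) *p) || *p == '_'))
        return false;

    while (p < end && (isalnum((unsigned char) *p) || *p == '_')) {
        if (len + 1 >= dest_size)   /* keep room for the trailing NUL */
            return false;
        dest[len++] = *p++;
    }
    dest[len] = '\0';
    *s = p;
    return true;
}

/* The wrapper: same parsing, but the caller never sees a buffer. */
static bool
skip_ident(const char **s, const char *end)
{
    char scratch[256];              /* thrown away on return */

    return copy_ident(s, end, scratch, sizeof scratch);
}
```

The buffer handling stays in one place, and the underlying parser needs
no "is there a destination?" conditionals.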
Prior to this commit, the string passed to this function had to be
pointing to somewhere in PL_bufptr. But this is only because it assumed
that the initial position is less than PL_bufend. By passing the upper
bound in, that assumption is automatically removed.
This makes it clear at each call point what is happening, instead of
having to jump to the S_force_word definition to know what 'false, true'
vs 'true, false' actually means.
And this prepares for future commits.
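
One common way to get self-describing call sites is to replace bare
booleans with named flags. The sketch below shows that general
technique only; the function names, parameter meanings, and flag names
are all hypothetical, and may not match what this commit does to
S_force_word.

```c
#include <stdbool.h>

/* Before (hypothetical signature): callers pass bare booleans, so a
 * call reads as scan_word_bools(false, true). */
static int
scan_word_bools(bool check_keyword, bool allow_pack)
{
    return (check_keyword ? 1 : 0) | (allow_pack ? 2 : 0);
}

/* After: each option has a name that is visible at the call site. */
#define FW_CHECK_KEYWORD  0x1       /* hypothetical flag names */
#define FW_ALLOW_PACK     0x2

static int
scan_word_flags(unsigned flags)
{
    return scan_word_bools((flags & FW_CHECK_KEYWORD) != 0,
                           (flags & FW_ALLOW_PACK) != 0);
}
```

A call such as scan_word_flags(FW_ALLOW_PACK) can be understood without
looking up the definition.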
Most code these days runs under 'use strict'. That allows us to resolve
ambiguity without resorting to heuristics in far more cases than before.
This commit adds a parameter to intuit_more() that gives the context it
is being called from. And when that call is to resolve what $foo[...]
is supposed to mean, we can look up foo to see if it is an array or a
scalar. If the former, the "..." must be a subscript; if a scalar, it
must be a charclass.
Only if there is both a $foo and an @foo is there ambiguity. If so, we
drop down to using the heuristics.
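
The decision described above amounts to the following. This is a sketch
only: the enum and function names are hypothetical, and treating the
"neither exists" case as a fall-through to the heuristics is an
assumption, since the text above does not cover it.

```c
#include <stdbool.h>

/* Hypothetical result type for resolving what $foo[...] means. */
typedef enum {
    BRACKET_IS_SUBSCRIPT,
    BRACKET_IS_CHARCLASS,
    BRACKET_NEEDS_HEURISTICS
} bracket_meaning;

static bracket_meaning
resolve_bracket(bool array_exists, bool scalar_exists)
{
    if (array_exists && !scalar_exists)
        return BRACKET_IS_SUBSCRIPT;    /* only @foo: must be $foo[N] */
    if (scalar_exists && !array_exists)
        return BRACKET_IS_CHARCLASS;    /* only $foo: [...] is a class */

    /* Both exist (ambiguous), or neither (assumed here). */
    return BRACKET_NEEDS_HEURISTICS;
}
```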
This continues the process started in #23592 to change names with
leading underscores to be legal C. See that PR or
4bb3572f7a1c1f3944b7f58b22b6e7a9ef5faba6 for extensive discussion.
This commit simply moves the leading underscore to be trailing.
This adds three new API functions: a pair to modify a COP by enabling or
disabling a single warning bit within it, and a query function to ask if
a given warning is already enabled.
This API is provided for CPAN modules to use to modify the set of
warnings present in a COP during compile-time. Currently modules need to
use the `new_warnings_bitfield()` function, which was recently hidden by
09a0707. That change broke the `Syntax::Keyword::Try` module, as
reported in https://github.com/Perl/perl5/issues/23609.
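
The shape of such a trio (enable, disable, query) can be sketched on a
plain bit array. Everything here is hypothetical: the type, the function
names, the number of warnings, and the one-bit-per-warning layout are
illustrative assumptions, not the new API or the layout of the warnings
stored in a COP.

```c
#include <limits.h>
#include <stdbool.h>

#define N_WARNINGS 80               /* hypothetical count */

/* Hypothetical container: one bit per warning category. */
typedef struct {
    unsigned char bits[(N_WARNINGS + CHAR_BIT - 1) / CHAR_BIT];
} warn_set;

static void
warn_enable(warn_set *w, unsigned n)
{
    w->bits[n / CHAR_BIT] |= (unsigned char) (1u << (n % CHAR_BIT));
}

static void
warn_disable(warn_set *w, unsigned n)
{
    w->bits[n / CHAR_BIT] &= (unsigned char) ~(1u << (n % CHAR_BIT));
}

static bool
warn_is_enabled(const warn_set *w, unsigned n)
{
    return (w->bits[n / CHAR_BIT] & (1u << (n % CHAR_BIT))) != 0;
}
```

With functions like these, a caller changes one warning at a time and
never has to build a whole replacement bitfield itself.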
See 03f24b8a082948e5b437394fa33d0af08d7b80b6 for the motivation.
This commit changes plain die() to not use a thread context parameter.
It and die_nocontext() now behave identically.
This gives a bit of performance boost in this function that can be
called during pattern matching.
Here are some cachegrind comparisons with blead:
Key:
    Ir   Instruction read
    Dr   Data read
    Dw   Data write
    COND conditional branches
    IND  indirect branches
IND indirect branches
The numbers represent relative counts per loop iteration, compared to
blead at 100.0%.
Higher is better: for example, using half as many instructions gives 200%,
while using twice as many gives 50%.
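
Spelled out, the two examples just given imply this calculation. The
function name is hypothetical, and the formula is inferred from those
examples, not taken from the benchmarking tool.

```c
/* Relative score of a changed build against blead, in percent.
 * Fewer events than blead gives a score above 100. */
static double
relative_percent(double blead_count, double hacked_count)
{
    return 100.0 * blead_count / hacked_count;
}
```

So relative_percent(1000, 500) is 200, and relative_percent(1000, 2000)
is 50.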
                GCC                         CLANG

valid_utf8_to_uv(0x007f), length is 1

         blead   hacked             blead   hacked
        ------   ------            ------   ------
    Ir  100.00   100.69        Ir  100.00    99.11
    Dr  100.00   101.47        Dr  100.00    99.74
    Dw  100.00   100.00        Dw  100.00    99.57
  COND  100.00   101.20      COND  100.00   100.00
   IND  100.00   100.00       IND  100.00    94.12

valid_utf8_to_uv(0x07ff), length is 2

         blead   hacked             blead   hacked
        ------   ------            ------   ------
    Ir  100.00   100.68        Ir  100.00    99.04
    Dr  100.00   101.47        Dr  100.00    99.74
    Dw  100.00   100.00        Dw  100.00    99.57
  COND  100.00   102.40      COND  100.00   101.23
   IND  100.00   100.00       IND  100.00    94.12

valid_utf8_to_uv(0xfffd), length is 3

         blead   hacked             blead   hacked
        ------   ------            ------   ------
    Ir  100.00   100.83        Ir  100.00    99.04
    Dr  100.00   101.47        Dr  100.00    99.75
    Dw  100.00   100.00        Dw  100.00    99.57
  COND  100.00   102.99      COND  100.00   101.84
   IND  100.00   100.00       IND  100.00    94.12

valid_utf8_to_uv(0xffffd), length is 4

         blead   hacked             blead   hacked
        ------   ------            ------   ------
    Ir  100.00   100.91        Ir  100.00    99.13
    Dr  100.00   101.46        Dr  100.00    99.75
    Dw  100.00   100.00        Dw  100.00    99.57
  COND  100.00   103.59      COND  100.00   102.45
   IND  100.00   100.00       IND  100.00    94.12

valid_utf8_to_uv(0x3ffffff), length is 5

         blead   hacked             blead   hacked
        ------   ------            ------   ------
    Ir  100.00   101.28        Ir  100.00    99.29
    Dr  100.00   101.46        Dr  100.00    99.75
    Dw  100.00   100.00        Dw  100.00    99.57
  COND  100.00   104.19      COND  100.00   103.07
   IND  100.00   100.00       IND  100.00    94.12

valid_utf8_to_uv(0x7fffffff), length is 6

         blead   hacked             blead   hacked
        ------   ------            ------   ------
    Ir  100.00    89.83        Ir  100.00    88.83
    Dr  100.00    95.22        Dr  100.00    92.94
    Dw  100.00    92.44        Dw  100.00    91.63
  COND  100.00    86.21      COND  100.00    87.11
   IND  100.00   100.00       IND  100.00    88.89

Clang gives slightly worse results than gcc. But there is an
improvement in both cases for conditionals for two-byte and longer
characters.
This shows that the performance is significantly worse for code points
that take 6 bytes (or more, which I didn't include) to represent. These
are all well outside the Unicode range; hence are very rarely
encountered. Performance is improved a bit for the typical cases.
The algorithm used could handle 6 and 7 byte characters, but that
increases memory usage, and can lead to the compiler choosing to not
inline this function. In blead, experiments with clang gave these
results

    Max bytes inlined    Instances in the code where not inlined
            3                            14
            4                            19
            5                            19
            6                            19
            7                            57

We really need to accommodate any Unicode code point, which is 4 bytes (5
on EBCDIC). But the others we don't care about. Even though 6 bytes
doesn't show as being worse than 4, I chose to not include it, because
we don't care about performance for these rare non-Unicode code points,
and it just might cause non-inlining for different compilers or clang
versions.
Perl almost always opts for saving time over saving space. Hence, we
have croak() that saves time at the expense of space, but needs thread
context available; and croak_nocontext() that doesn't need that, but
takes extra time.
But, when we are about to die, time isn't that important. Even if we
are doing eval after eval in a tight loop, the potential time savings of
passing the thread context to Perl_croak is insignificant compared to
the tear-down that follows. My claim then is that croak() never needed
a thread context parameter to save a bit of time just before death. It
is an optimization that isn't worth it. And having it do so required
the invention of croak_nocontext(), and the extra cognitive load
associated with two methods for the same task.
This commit changes plain croak() to not use a thread context parameter.
It and croak_nocontext() now behave identically. That means that going
forward, people will likely choose croak() which requires less typing
and occupies fewer columns on the screen, and they won't have to
remember which form to use when.
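
The trade-off between the two calling styles can be sketched like this.
All names are hypothetical, and a plain static pointer stands in for
however the current thread's context would really be looked up; this is
not the actual croak() implementation.

```c
/* Hypothetical per-interpreter state. */
typedef struct interp {
    int errors_reported;
} interp;

/* Stand-in for the lookup of the current thread's interpreter. */
static interp *current_interp;

static interp *
fetch_context(void)
{
    return current_interp;
}

/* Context-passing style: every caller must have the context at hand. */
static void
report_with_context(interp *my_interp, const char *msg)
{
    (void) msg;                     /* a real version would format it */
    my_interp->errors_reported++;
}

/* No-context style: pays for one lookup, callable from anywhere. */
static void
report(const char *msg)
{
    report_with_context(fetch_context(), msg);
}
```

The only cost of report() over report_with_context() is the one
fetch_context() call, which is negligible next to the tear-down that
follows a real croak.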
It's usually a bad idea to try to work around a limitation in common
code by copy-pasting and then modifying to taste. Fixes/improvements
to the common code rarely get propagated to the outlier.
I wrote code in 1ef9039bccb that did just this for the prototype
definition of SvPV_helper, because the place where it really belongs,
embed.fnc, couldn't (and still doesn't) handle function pointers as
arguments (patches welcome).
I should have at least added a comment to the common code noting the
existence of this outlier.
It turns out that that limitation can be worked around by declaring a
typedef of the pointer, and then using that in embed.fnc.
That's what this commit does.
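
In plain C, the workaround looks like this. The names are hypothetical,
and the embed.fnc entry itself is not shown; the point is only that
once the pointer type has a one-word name, a prototype using it looks
like any other.

```c
/* Give the function-pointer type a single-word name... */
typedef int (*int_transform_fn)(int);

/* ...so that it can appear in a parameter list like any other type. */
static int
apply_twice(int_transform_fn fn, int value)
{
    return fn(fn(value));
}

/* A sample function matching the typedef'd signature. */
static int
add_three(int value)
{
    return value + 3;
}
```

For example, apply_twice(add_three, 1) returns 7.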
This commit removes the final instance of duplicating the work of
embed.fnc in the core, except for some in the regex system whose
comments say the reason is to avoid making a typedef public. I haven't
investigated these further.