83769 Commits

Author SHA1 Message Date
Karl Williamson
682cd222bd embed.pl: Remove no longer used sub 2026-01-22 09:55:58 -07:00
Karl Williamson
4b67bbf7f6 embed.pl: Also consider #undef's
This code looks to see what conditions must apply before a #define
happens.  This commit extends that to also look for #undef commands.

The end result is that for symbols that are visible to XS code, but
aren't supposed to be, embed.h contains an #undef so it isn't visible.
But if it already has been #undef'ed, there is no need to do this.

But a symbol can be defined and undefined many times, and the conditions
for doing an #undef may be different than what the symbol was #defined
under.

The consequence of not realizing that a symbol gets undefined is
simply that we generate an unnecessary #undef.  The consequence of
failing to generate one when the symbol is defined is that it is
visible when not intended to be so.

So, there are various restrictions to try to make sure that we don't
err in the latter direction.
2026-01-22 09:55:58 -07:00
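As an illustration of the #define/#undef situation described in 4b67bbf7f6, here is a minimal C sketch.  The names DEMO_HELPER, demo_core_use and demo_helper_visible are hypothetical and do not come from the Perl source.  It only shows that once a macro has been #undef'ed, code that follows can no longer see it, so a second #undef would be redundant but harmless, whereas leaving out a needed one keeps the macro visible.

```c
#include <assert.h>

/* Hypothetical macro standing in for a symbol meant for internal use only. */
#define DEMO_HELPER(x) ((x) + 1)

/* Code compiled while the macro is still defined can use it. */
static int demo_core_use(int x)
{
    return DEMO_HELPER(x);
}

/* The kind of line a generated header could emit to hide the symbol. */
#undef DEMO_HELPER

/* Returns 1 if DEMO_HELPER is still defined at this point, else 0. */
static int demo_helper_visible(void)
{
#ifdef DEMO_HELPER
    return 1;
#else
    return 0;
#endif
}
```

Calling demo_core_use() still works because the macro was expanded before the #undef, while demo_helper_visible() reports that later code cannot see it.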
Karl Williamson
46bf0cf5e8 embed.pl: Consider symbols visible only to extensions
Two commits ago, the code was extended to compare the C preprocessor
visibility of a symbol with what the desired visibility of a symbol is.
It assumed that everything not constrained to core was visible
everywhere.  This commit extends that to look for being visible only to
extensions.  As a result, the large number of symbols added to the
override list in that commit are now removed.
2026-01-22 09:55:58 -07:00
Karl Williamson
5ed1b2fcbe embed.pl: Add cpp constraints for .h files
Many of the header files in our source have guards that keep them from
being recursively included, with a convention as to how their name is
derived from the file name.  This commit changes to now consider these
when computing what a cpp conditional evaluates to.  It follows the
convention, except in those few places where it is violated, and sets up
the infrastructure so that this mechanism could be applied for other
cases.

Since this commit was originally written, all but one header file has
been changed to follow the convention, so after rebasing, only one line
is now being added.
2026-01-22 09:55:58 -07:00
Karl Williamson
6731bcd053 embed.pl: Compare cpp visibility with desired
This commit creates a function that calculates what C preprocessor
constraints there are on the visibility of a #define'd symbol.

It then compares that with what the desired visibility is, as
prior commits have determined, and reconciles any discrepancies.  It
warns if the symbol is supposed to be visible, but cpp makes it not so.
It adds it to the list of symbols to undefine if it is visible, but is
not supposed to be so.

In order to make this commit somewhat smaller with respect to code
changes, it assumes anything that is visible to extensions is visible
everywhere.  This entailed adding a large number of symbols to the list
of symbols to not #undef, in order to not change embed.h.  The commit
after the next one will fix this, and those symbols will be removed from
the list in that commit.
2026-01-22 09:55:58 -07:00
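The reconciliation described in 6731bcd053 can be sketched as a small decision function.  This is a C illustration only; embed.pl itself is Perl, and the names demo_action and demo_reconcile are hypothetical.  It assumes both the preprocessor visibility and the desired visibility have already been reduced to a yes/no answer.

```c
#include <assert.h>
#include <stdbool.h>

/* Possible outcomes when comparing actual and desired visibility. */
typedef enum {
    DEMO_ACTION_NONE,   /* the two agree; nothing to do */
    DEMO_ACTION_WARN,   /* should be visible, but cpp hides it */
    DEMO_ACTION_UNDEF   /* is visible, but should not be */
} demo_action;

/* cpp_visible:  what the preprocessor conditionals allow XS code to see.
 * want_visible: what the symbol's desired visibility says. */
static demo_action demo_reconcile(bool cpp_visible, bool want_visible)
{
    if (want_visible && !cpp_visible)
        return DEMO_ACTION_WARN;
    if (cpp_visible && !want_visible)
        return DEMO_ACTION_UNDEF;
    return DEMO_ACTION_NONE;
}
```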
Karl Williamson
c0955b6422 regen/HeaderParser: _reduce_conds: Return more than a bool
This changes this function to stringify the result into a preprocessor
conditional expression, instead of just a bool 0 or 1.  This gives the
caller more information.

This doesn't change the outcome of callers who are expecting a boolean,
as any string now returned evaluates to true.
2026-01-22 09:55:58 -07:00
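The compatibility argument in c0955b6422 (callers that only test truth keep working when the result becomes a string) has a rough C analogue, sketched below.  The functions demo_reduce_cond and demo_is_reachable are hypothetical, and using NULL for "false" is an assumption of this sketch; the real function is Perl and returned 0 or 1 before the change.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reducer: returns NULL when the condition is known to be
 * false, otherwise the remaining condition as text. */
static const char *demo_reduce_cond(const char *expr, int known_false)
{
    assert(expr != NULL);
    return known_false ? NULL : expr;
}

/* Older-style caller that only cares whether the result is true. */
static int demo_is_reachable(const char *expr, int known_false)
{
    return demo_reduce_cond(expr, known_false) ? 1 : 0;
}
```

A newer caller can read the returned text, while demo_is_reachable() behaves exactly as it would with a plain boolean.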
Karl Williamson
d3cd2a8764 embed.pl: Assume Perl reserved symbols are visible
Unless there is an indication otherwise.
2026-01-22 09:55:58 -07:00
Karl Williamson
38df3f3835 embed.pl: Formalize reserved Perl symbols
This creates a regular expression pattern of names that we feel free to
expose to XS code's namespace.  Hence they are names reserved for our use,
and should any conflicts arise, the module needs to change, not us.

Naturally, the pattern is pretty restrictive.  It is:

    Any symbol beginning with "PL_"
    Any symbol containing /perl/i, with both sides delimited
    Any symbol containing "PERL"

Any other spelling that we expose could be considered to pollute the XS
code space.  We feel free to do that all the time.  Any new function's
short name will do that.

And we generally feel free to create macros with arbitrary names which
could conflict with an existing XS name.

Some important potential conflicts are:

New keywords:  We create an exposed KEY_foo macro.  Some existing
modules use some of these.  My grep of CPAN shows maybe a dozen of these
get used; mostly KEY_END.

config.h is full of symbols like HAS_foo, I_bar, and others that are all
exposed.  I don't imagine we can claim to reserve any symbol beginning
with either of those.

Informally, others and I have used a trailing underscore to
indicate a private symbol.  There are a few distributions that use some
of these anyway.  And there has been pushback when new short symbols
that use this convention have been added.

I would like to get a formal rule about use of this convention.  There
are 200+ of these currently.  We could reserve any names with trailing
underscores, or if that is too much, any ending in, say, 'pl_' or 'PL_'.

We have 3000+ undocumented macro names that don't end in underscores and
which are currently visible to XS code.  This number includes the
KEY_foo ones, but not the ones in config.h.

To deal with namespace pollution, we have had the -DNO_SHORT_NAMES
Configure option for use just with embedded perls.  This hasn't worked
at least since we added inline functions, and it always applied to only
functions.  I have a WIP to get this to work again, and to extend it to
work with documented macros.  It just occurred to me how to make this be
customizable, so that downstream someone could add a list of symbols
that should only exist as 'Perl_foo', and then recompile.
2026-01-22 09:55:58 -07:00
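The three reserved-name rules listed in 38df3f3835 can be sketched in C as follows.  The real pattern is a Perl regular expression inside embed.pl and may differ in detail.  In particular, this sketch assumes "delimited" means bounded by the start or end of the name or by an underscore; that reading, and all demo_ function names, are assumptions.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Assumption: a delimiter is the end of the name or an underscore. */
static bool demo_is_delim(char c)
{
    return c == '\0' || c == '_';
}

/* True if name contains "perl" in any case, delimited on both sides. */
static bool demo_has_delimited_perl(const char *name)
{
    size_t len = strlen(name);

    for (size_t i = 0; i + 4 <= len; i++) {
        if (tolower((unsigned char)name[i])     == 'p' &&
            tolower((unsigned char)name[i + 1]) == 'e' &&
            tolower((unsigned char)name[i + 2]) == 'r' &&
            tolower((unsigned char)name[i + 3]) == 'l')
        {
            bool left_ok  = (i == 0) || demo_is_delim(name[i - 1]);
            bool right_ok = demo_is_delim(name[i + 4]);
            if (left_ok && right_ok)
                return true;
        }
    }
    return false;
}

/* Applies the three rules: "PL_" prefix, delimited /perl/i, or "PERL". */
static bool demo_is_reserved(const char *name)
{
    assert(name != NULL);
    return strncmp(name, "PL_", 3) == 0
        || demo_has_delimited_perl(name)
        || strstr(name, "PERL") != NULL;
}
```

Under these assumptions, names such as KEY_END and HAS_foo fall outside the reserved set, which matches the conflicts discussed in the commit message.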
Karl Williamson
2166122145 embed.pl: Add some comments 2026-01-22 09:55:58 -07:00
Karl Williamson
8178bdb9ed embed.pl: Use 'next' to remove an else
Then we outdent the contents of that else, and reflow, which makes this
long-ish section of code a bit shorter.
2026-01-22 09:55:58 -07:00
Karl Williamson
e4cec3d716 embed.pl: Swap order of conditionals
It is easier to understand when the nearly trivial case is gotten out of
the way first.
2026-01-22 09:55:58 -07:00
Karl Williamson
46c37430a0 embed.pl: Add constraints
I did an inspection of the source, and found these few symbols that will
not be defined for any XS code.
2026-01-22 09:55:58 -07:00
Karl Williamson
1fd7f0165b embed.pl: Handle case of multiple flags for an element
Some symbols are #defined in multiple places in the input; based
typically on different preprocessor conditionals.  We want to use the
definition which has the widest visibility.

This change showed that USE_STDIO had wrongly been undefined for the
past few commits in blead.  It is no longer actually ever defined by
perl.
2026-01-22 09:55:58 -07:00
Karl Williamson
b0655cabd0 embed.pl: Add sub-hash
This is in preparation for having a different sub-hash at the same level.
2026-01-22 09:55:58 -07:00
Karl Williamson
6355afcfa2 embed.pl: Extract common code into a function
These are just a few lines now, but future commits will make this
function bigger
2026-01-22 09:55:58 -07:00
Karl Williamson
ac8278ef3f embed.pl: Pass file name to subroutine
This will be used in future commits for warnings and errors
2026-01-22 09:55:58 -07:00
Karl Williamson
404a19b06b embed.pl: Remove some commented out code
I no longer think this might ever be useful
2026-01-22 09:55:58 -07:00
Karl Williamson
9ef0d33ac6 embed.pl: Use a separate loop for a hash
This is in preparation for it to do things differently than the other
hashes in the loop it previously was in.
2026-01-22 09:55:58 -07:00
Karl Williamson
ada54ac219 embed.pl: Move declaration
This is an internal value.  The declarations at the top of the program
are for data that someone might want to change.
2026-01-22 09:55:58 -07:00
Karl Williamson
13b09f53da embed.pl: Save hash elem into $var to make more readable 2026-01-22 09:55:58 -07:00
Karl Williamson
f3e59cad18 embed.pl: Rename variable
The new name reflects the source of the data being examined.
2026-01-22 09:55:58 -07:00
Karl Williamson
c33265be37 embed.pl: Move pattern definition to only use
The pattern doesn't get recompiled each time through the loop; it's
easier to understand if the definition and use are near each other
2026-01-22 09:55:58 -07:00
Karl Williamson
943b938012 embed.pl: Improve detection of system symbols
The heuristic previously used had many false positives, so it thought
symbols were for the system that really weren't.  This tightens it up,
and to avoid breaking any existing code that might be relying on those
miscategorized symbols, adds them to the list of unresolved visibility
ones, so that they remain visible.
2026-01-22 09:55:58 -07:00
Karl Williamson
ada434ed89 embed.pl: Split an 'if' into two
And reorder them.

This paves the way for the next commit which will change them into
having different actions
2026-01-22 09:55:58 -07:00
Karl Williamson
2784d222e2 embed.pl: Recategorize libc symbols
These few symbols had been marked as unresolved as to their visibility.
But in fact they are symbols in libc that do need to always be visible,
and there is already a hash for this type.  Move them to the proper
place.  The net effect is no external changes.
2026-01-22 09:55:58 -07:00
Karl Williamson
6399e4abe5 embed.pl: Add comments 2026-01-22 09:55:58 -07:00
Karl Williamson
37f79c58e9 embed.pl: Move more code to earlier in the file
Future commits will need the results of this code earlier than it is now
calculated.
2026-01-22 09:55:58 -07:00
Karl Williamson
f210eba1a5 embed.pl: Rename variable
The new name adds the detail as to what it constrains
2026-01-22 09:55:58 -07:00
Karl Williamson
aa3eb59ab9 embed.pl: Move code to earlier in file
This now creates lists much earlier so that future commits that will
need them earlier in the process can do so.
2026-01-22 09:55:58 -07:00
Karl Williamson
b00a943889 embed.pl: Add some comments 2026-01-22 09:55:58 -07:00
Karl Williamson
086ad9905f embed.pl: Move comments
A block of comments got pushed down in the file by previous commits to
where it no longer made sense.  Move it back to the top.
2026-01-22 09:55:58 -07:00
Karl Williamson
eb5764fa6c embed.pl: Remove config.h from file skip list
This file doesn't get parsed anyway because it isn't in the MANIFEST,
nor would it work out to parse it in spite of that, if only because it
isn't under source control, and the outputs of this are.
2026-01-22 09:55:58 -07:00
Karl Williamson
ff31f47a54 embed.fnc: Add string assertions for S_intuit_more
This makes sure the terminating character is a NUL.  This internal
function isn't documented as having that requirement, but that's always
the case in our test suite.  And functions it calls assume there is at
least one character in the input, so the assertion shouldn't be EPTRge,
and the test suite fails if it is EPTRgt.
2026-01-22 07:24:00 -07:00
Karl Williamson
31d13f8ab7 embed.fnc: Add string arg assertions for S_packlist
This function takes a string argument with beginning and ending
positions.  It appears to me that those positions are overwritten
without being examined.  The function does get called with an
apparently empty string, but that string actually contains a NUL.
2026-01-22 07:24:00 -07:00
Karl Williamson
c6c2ced1e9 embed.fnc: Add string arg assertions for S_incline
This function is expecting a NUL-terminated C string
2026-01-22 07:24:00 -07:00
Karl Williamson
6a39094910 embed.fnc: Add string args assertions for S_wildcard
The end pointer for this function should always point to the terminating
NUL character of the string.
2026-01-22 07:24:00 -07:00
Karl Williamson
ea15ca3786 embed.fnc: Add EPTRtermNUL
Some functions take arguments that point to the terminating NUL
character of a string.  This commit adds a way to declare in embed.fnc
that a given argument is of that kind.
2026-01-22 07:24:00 -07:00
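The kind of check that ea15ca3786 makes declarable can be sketched in C as below.  This is not the code that embed.pl generates for EPTRtermNUL; demo_span_len and demo_len_of are hypothetical functions that only show what asserting "the end pointer points at the terminating NUL" looks like for a start/end pointer pair.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical function taking a start pointer and an end pointer that
 * must point at the string's terminating NUL. */
static size_t demo_span_len(const char *s, const char *e)
{
    assert(s != NULL);
    assert(e != NULL);
    assert(e >= s);        /* the end does not precede the start */
    assert(*e == '\0');    /* the end points at the terminating NUL */
    return (size_t)(e - s);
}

/* Convenience wrapper that builds a valid (start, end) pair. */
static size_t demo_len_of(const char *s)
{
    return demo_span_len(s, s + strlen(s));
}
```

An empty string is still a valid input here, because its end pointer equals its start pointer and both point at the NUL.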
Karl Williamson
e3f7e781f3 embed.pl: Save some hash references in variables
This shortens later references
2026-01-22 07:24:00 -07:00
Karl Williamson
4e751d51ff toke.c: Add comment to S_parse_ident 2026-01-22 07:24:00 -07:00
Karl Williamson
6773f0fb0a embed.fnc: Add comment 2026-01-22 07:24:00 -07:00
Karl Williamson
bbc7f9cd1e Standardize .h recursive #include guard names
Some header files in the Perl core have guards to keep a
recursive #include from compiling them again.  This is standard practice
in C coding, and I was surprised at how many headers don't have
it.  These seem to rely on only being included from perl.h, which does
have its own guard.

Most of the guards use a common naming convention.  If the file is named
foo.h, the guard is named 'PERL_FOO_H'.  Often, though, a trailing
underscore is added, 'PERL_FOO_H_', making the convention slightly
fuzzy.  The 'PERL_' is not added if the file 'foo' already includes
'perl' in its name.

Those rules are enough to describe all the guards in the core, except
for the outliers in this commit, plus perl.h.

There are occasions in various Perl scripts that examine our source when
we want to create a pattern that matches the header guard name for a
particular file.  In spite of the slight fuzziness, that's easy using
the above rules, except for the ones in this commit, and perl.h.
It would be better for that code to not have to worry about the
outliers, and since these are arbitrary names, we can just change them
to follow the rules that already apply to most.

This commit changes the names of the outliers so that the only file the
rules don't apply to is perl.h.  Its guard is named H_PERL.  That
spelling is used in Encode, so it's not so easy to change it seamlessly.
I'm willing to live with it continuing to be an outlier for the code I
write.
2026-01-22 07:21:21 -07:00
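The guard naming rules described in bbc7f9cd1e can be sketched in C as follows.  The scripts that actually do this are Perl, and the function names here are hypothetical.  The sketch assumes every non-alphanumeric character in the file name maps to an underscore, and it deliberately does not special-case perl.h, whose guard H_PERL is the stated outlier.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Writes the base guard name for 'file' into 'out'.  Returns false if
 * the buffer is too small. */
static bool demo_guard_name(const char *file, char *out, size_t out_size)
{
    assert(file != NULL && out != NULL);

    /* 'PERL_' is omitted when the file name already contains "perl". */
    const char *prefix = strstr(file, "perl") ? "" : "PERL_";
    size_t n = 0;

    if (strlen(prefix) + strlen(file) + 1 > out_size)
        return false;

    for (const char *p = prefix; *p; p++)
        out[n++] = *p;
    for (const char *p = file; *p; p++) {
        unsigned char c = (unsigned char)*p;
        out[n++] = isalnum(c) ? (char)toupper(c) : '_';
    }
    out[n] = '\0';
    return true;
}

/* True if 'guard' is the base name, optionally with one trailing '_'. */
static bool demo_guard_matches(const char *file, const char *guard)
{
    char base[128];
    size_t len;

    if (!demo_guard_name(file, base, sizeof base))
        return false;
    len = strlen(base);
    if (strncmp(guard, base, len) != 0)
        return false;
    return guard[len] == '\0'
        || (guard[len] == '_' && guard[len + 1] == '\0');
}
```

With these rules, foo.h accepts both PERL_FOO_H and PERL_FOO_H_, while perl.h paired with H_PERL is rejected, which is why that file remains an exception.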
Karl Williamson
ed5117fa8b grok_infnan: Handle empty input
This public function dereferences its pointer parameter before checking
its validity.
2026-01-22 06:59:32 -07:00
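The class of bug fixed in ed5117fa8b can be illustrated with a small C sketch.  This is not the grok_infnan code; demo_skip_sign and demo_sign_consumed are hypothetical, and the sketch assumes the input is given as a start pointer plus an end pointer.  The point is that the range must be checked for emptiness before the first character is read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical parser step: skips a leading '+' or '-' if present.
 * The input runs from *sp up to, but not including, send. */
static bool demo_skip_sign(const char **sp, const char *send)
{
    assert(sp != NULL && *sp != NULL && send != NULL);
    const char *s = *sp;

    if (s >= send)          /* empty input: nothing may be read */
        return false;
    if (*s == '+' || *s == '-') {
        *sp = s + 1;
        return true;
    }
    return false;
}

/* Returns how many characters of buf[0..len) the sign step consumed. */
static size_t demo_sign_consumed(const char *buf, size_t len)
{
    const char *s = buf;

    (void)demo_skip_sign(&s, buf + len);
    return (size_t)(s - buf);
}
```

Passing a length of 0 consumes nothing even when the buffer happens to hold a sign character, because the bounds check comes before the dereference.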
Karl Williamson
cea46623dd op.c: Don't hand-roll is_dup_mode() attributes
This commit adds an entry in embed.fnc for S_is_dup_mode, removing the
assert and __attribute lines in op.c.

The proximal cause for this commit is that I tried compiling with
Perl_assert() enabled, which resulted in a lot of compiler warnings
because of the __attribute__non_null__ line

But I also think it is better to not hand-roll things unless absolutely
necessary.  Changes someone makes to the general scheme are not likely
to be propagated to the hand-rolled items.
2026-01-22 06:58:57 -07:00
Karl Williamson
fb03d6749a regcomp.c: Remove extraneous conditional
This is checking that the argument is an ASCII \w that isn't an
underscore.  But in this range, that is identical to an alphanumeric,
which there is already a macro for, so use that single macro instead of
the two.
2026-01-22 06:57:22 -07:00
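The equivalence that fb03d6749a relies on can be checked exhaustively over the ASCII range.  The sketch below uses hand-written predicates with hypothetical names rather than the macros in the Perl source, and it assumes an ASCII platform where the letter ranges are contiguous.

```c
#include <assert.h>
#include <stdbool.h>

/* ASCII alphanumeric: digits and letters only. */
static bool demo_is_ascii_alnum(int c)
{
    return (c >= '0' && c <= '9')
        || (c >= 'A' && c <= 'Z')
        || (c >= 'a' && c <= 'z');
}

/* ASCII \w: alphanumerics plus underscore. */
static bool demo_is_ascii_word(int c)
{
    return demo_is_ascii_alnum(c) || c == '_';
}

/* Counts the ASCII code points where "word char and not underscore"
 * disagrees with "alphanumeric". */
static int demo_count_mismatches(void)
{
    int mismatches = 0;

    for (int c = 0; c < 128; c++) {
        bool two_tests = demo_is_ascii_word(c) && c != '_';
        bool one_test  = demo_is_ascii_alnum(c);

        if (two_tests != one_test)
            mismatches++;
    }
    return mismatches;
}
```

Since the two forms never disagree in this range, the single alphanumeric test can replace the pair.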
Tony Cook
b933ecdd4a add (or improve) SV numeric comparison APIs
A partial fix for #23918
2026-01-22 13:38:54 +11:00
Tony Cook
7c5ddde8c4 add perldelta for sv_numeq fix and other sv_num* additions
modified the sv.c documentation since the perldelta sv_numeq link had
multiple targets.
2026-01-22 13:09:21 +11:00
Tony Cook
164f562463 Add some documentation for sv_numcmp_common() 2026-01-22 13:09:21 +11:00
Tony Cook
b5f41adca0 check the void context fix for sv_numeq too 2026-01-22 13:09:20 +11:00
Tony Cook
1840104512 add SV_FORCE_OVERLOAD to the sv_numcmp() APIs
and add AMGf_force_overload to amagic_call() which does the actual work.
2026-01-22 13:09:20 +11:00
Tony Cook
e11f0edb1a sv_numeq etc: don't do numify overloading with SV_SKIP_OVERLOAD 2026-01-22 13:09:13 +11:00