The asserting fuzzed test case was:
eval q!s,,$0[sub{m[]]],;s,,$0[sub{m[]]],}}!
The assertion triggered was:
pad.c:614: Perl_pad_add_anon: Assertion `!CvWEAKOUTSIDE((const CV *)sv)' failed.
This behaviour was long standing, present in v5.8.8 if not earlier, then
was addressed by:
```
commit eb54d46
Author: Yves Orton <demerphq@gmail.com>
Date: Fri Aug 26 18:26:14 2022 +0200
Stop parsing on first syntax error.
We try to keep parsing after many types of errors, up to a (current)
maximum of 10 errors. Continuing after a semantic error (like
undeclared variables) can be helpful, for instance showing a set of
common errors, but continuing after a syntax error isn't helpful
most of the time as the internal state of the parser can get confused
and is not reliably restored in between attempts. This can produce
sometimes completely bizarre errors which just obscure the true error,
and has resulted in security tickets being filed in the past.
This patch makes the parser stop after the first syntax error, while
preserving the current behavior for other errors. An error is considered
a syntax error if the error message from our internals is the literal
text "syntax error". This may not be a complete list of true syntax
errors, we can iterate on that in the future.
This fixes the segfaults reported in Issue #17397, and #16944 and
likely fixes other "segfault due to compiler continuation after syntax
error" bugs that we have on record, which has been a recurring issue
over the years.
```
GH #18576 was concerned with the value returned from `if/elsif` statements
that both have a false conditional, such as:
my $y=do { if (0) { 5 } elsif(0) { 6 } };
where `$y` should contain an IV with value 0, the value of the last
expression to be evaluated, but it did not.
This problem was fixed as a side effect of the following commit:
commit 4176abf7a8e425113debe55679c99b59bb9d299a
Author: David Mitchell <davem@iabyn.com>
Date: Wed Sep 18 12:28:18 2019 +0100
set VOID on OP_ENTER
The OP_ENTER planted at the start of a program (and possibly elsewhere)
gets left as UNKNOWN context rather than VOID context, due to op_scope()
not honouring the current context.
Fixing this makes things infinitesimally faster.
This commit adds the `if/elsif` example mentioned above as a specific test
for GH #18576, to add assurance that a future regression would result in
a test failure.
The asserting fuzzed case was:
eval"${sub{sub{//]]]"}}
The assertion triggered was:
perl: op.c:7346: Perl_newSVOP: Assertion `sv' failed.
The bug appeared following:
```
commit: 9ffcdca1f504cb09088413c074b35af4b7f247e3
Author: Father Chrysostomos <sprout@cpan.org>
Date: Mon Nov 12 23:04:16 2012 -0800
Don’t leak subs containing syntax errors
I fixed this for BEGIN blocks earlier, but missed the fact that
all subs are affected.
When called without an o argument (from newANONATTRSUB), newATTRSUB
is expected to return a CV with an unowned reference count of which
the caller will take ownership. We cannot have newATTRSUB returning
a freed CV, so we have it return null instead. But that means
ck_anoncode and pm_runtime have to account for that.
```
The bug disappeared following:
```
commit eb54d46f7264ff7af62c409d8a6ab984a5a34f57
Author: Yves Orton <demerphq@gmail.com>
Date: Fri Aug 26 18:26:14 2022 +0200
Stop parsing on first syntax error.
We try to keep parsing after many types of errors, up to a (current)
maximum of 10 errors. Continuing after a semantic error (like
undeclared variables) can be helpful, for instance showing a set of
common errors, but continuing after a syntax error isn't helpful
most of the time as the internal state of the parser can get confused
and is not reliably restored in between attempts. This can produce
sometimes completely bizarre errors which just obscure the true error,
and has resulted in security tickets being filed in the past.
This patch makes the parser stop after the first syntax error, while
preserving the current behavior for other errors. An error is considered
a syntax error if the error message from our internals is the literal
text "syntax error". This may not be a complete list of true syntax
errors, we can iterate on that in the future.
This fixes the segfaults reported in Issue #17397, and #16944 and
likely fixes other "segfault due to compiler continuation after syntax
error" bugs that we have on record, which has been a recurring issue
over the years.
```
These internal functions can handle empty strings, but they aren't called
with those so far, and it is better practice not to call them with an
empty string, so guard against it now.
On libc (*nix) systems we call `getentropy()` to get the seed needed to
start the PRNG. If that call fails, we fall back to reading the
filesystem via `/dev/urandom`. If that fails we fall back to hashing
some state variables instead.
This should be faster, less risky, and generally better than trying to
read from `/dev/urandom`.
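
The fallback order described above can be sketched roughly as follows. This is
an illustration only, not the code in this commit: `fill_seed`,
`seed_source_fn` and the individual source functions are hypothetical names.
The `getentropy()` call is represented by a stand-in that always fails, so the
fallback path can be exercised. Hashing the time and a stack address with
FNV-1a is an assumed placeholder for "hashing some state variables".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* A seed source fills buf with len bytes; returns 0 on success, -1 on failure. */
typedef int (*seed_source_fn)(unsigned char *buf, size_t len);

/* Stand-in for a getentropy() call that is unavailable or fails. */
int seed_unavailable(unsigned char *buf, size_t len)
{
    (void)buf;
    (void)len;
    return -1;
}

/* Second choice: read the bytes from /dev/urandom. */
int seed_from_urandom(unsigned char *buf, size_t len)
{
    FILE *fh = fopen("/dev/urandom", "rb");
    size_t got;

    if (!fh)
        return -1;
    got = fread(buf, 1, len, fh);
    fclose(fh);
    return got == len ? 0 : -1;
}

/* Last resort (assumed scheme): FNV-1a hash of some process state. */
int seed_from_state(unsigned char *buf, size_t len)
{
    uint64_t h = 0xcbf29ce484222325ULL;          /* FNV-1a offset basis */
    uint64_t state[2];
    size_t i;

    state[0] = (uint64_t)time(NULL);
    state[1] = (uint64_t)(uintptr_t)&state;
    for (i = 0; i < sizeof state; i++) {
        h ^= ((const unsigned char *)state)[i];
        h *= 0x100000001b3ULL;                   /* FNV-1a prime */
    }
    for (i = 0; i < len; i++) {
        buf[i] = (unsigned char)(h >> 56);
        h = h * 6364136223846793005ULL + 1442695040888963407ULL;
    }
    return 0;
}

/* Try each source in order; return the index of the one that worked, or -1. */
int fill_seed(const seed_source_fn *sources, size_t nsources,
              unsigned char *buf, size_t len)
{
    size_t i;

    assert(sources != NULL && buf != NULL);
    for (i = 0; i < nsources; i++)
        if (sources[i](buf, len) == 0)
            return (int)i;
    return -1;
}
```

A caller would list the sources from most to least preferred, with the real
`getentropy()` wrapper in the first slot.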
These case statements need not be visible outside this header. Putting
these here avoids cluttering up embed.h, where the same #undef lines
would otherwise be generated.
This code looks to see what conditions must apply before a #define
happens. This commit extends that to also look for #undef commands.
The end result is that for symbols that are visible to XS code, but
aren't supposed to be, embed.h contains an #undef so it isn't visible.
But if it already has been #undef'ed, there is no need to do this.
But a symbol can be defined and undefined many times, and the conditions
for doing an #undef may be different than what the symbol was #defined
under.
The consequence of not realizing that a symbol gets undefined is
simply that we generate an unnecessary #undef. The consequence of
failing to generate one when the symbol is defined is that it is
visible when not intended to be so.
So, there are various restrictions to try to make sure that we don't
err in the latter direction.
Two commits ago, the code was extended to compare the C preprocessor
visibility of a symbol with what the desired visibility of a symbol is.
It assumed that everything not constrained to core was visible
everywhere. This commit extends that to look for being visible only to
extensions. As a result, the large number of symbols added to the
override list in that commit are now removed.
Many of the header files in our source have guards that keep them from
being recursively included, with a convention as to how the guard name is
derived from the file name. This commit now considers these guards
when computing what a cpp conditional evaluates to. It follows the
convention, except in those few places where it is violated, and sets up
the infrastructure so that this mechanism could be applied for other
cases.
Since this commit was originally written, all but one header file has
been changed to follow the convention, so after rebasing, only one line
is now being added.
This commit creates a function that calculates what C preprocessor
constraints there are on the visibility of a #define'd symbol.
It then compares that with what the desired visibility is, as
prior commits have determined, and reconciles any discrepancies. It
warns if the symbol is supposed to be visible, but cpp makes it not so.
It adds it to the list of symbols to undefine if it is visible, but is
not supposed to be so.
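
The reconciliation rule above can be summarized in a small sketch. The names
`reconcile_visibility` and `reconcile_action` are hypothetical. The actual
logic lives in a Perl script and handles preprocessor conditions rather than
plain booleans, so this only shows the decision table.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    ACTION_NONE,   /* desired and actual visibility already agree */
    ACTION_WARN,   /* should be visible, but cpp hides it */
    ACTION_UNDEF   /* is visible, but should not be: add to the #undef list */
} reconcile_action;

/* Decide what to do for one symbol, given desired vs. cpp-derived visibility. */
reconcile_action reconcile_visibility(bool desired_visible, bool cpp_visible)
{
    if (desired_visible && !cpp_visible)
        return ACTION_WARN;
    if (!desired_visible && cpp_visible)
        return ACTION_UNDEF;
    return ACTION_NONE;
}
```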
In order to make this commit somewhat smaller with respect to code
changes, it assumes anything that is visible to extensions is visible
everywhere. This entailed adding a large number of symbols to the list
of symbols to not #undef, in order to not change embed.h. The commit
after the next one will fix this, and those symbols will be removed from
the list in that commit.
This changes this function to stringify the result into a preprocessor
conditional expression, instead of just a bool 0 or 1. This gives the
caller more information.
This doesn't change the outcome of callers who are expecting a boolean,
as any string now returned evaluates to true.
This creates a regular expression pattern of names that we feel free to
expose to XS code's namespace. Hence they are names reserved for our use,
and should any conflicts arise, the module needs to change, not us.
Naturally, the pattern is pretty restrictive. It is:
Any symbol beginning with "PL_"
Any symbol containing /perl/i, with both sides delimited
Any symbol containing "PERL"
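
As an illustration, the three rules could be checked like this. The function
name `is_reserved_name` is hypothetical, and the sketch assumes that
"delimited" means each side of the match is either the end of the name or a
non-alphanumeric character such as an underscore. The real pattern is a Perl
regular expression and may differ in detail.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Return true if name falls under one of the three reserved-name rules. */
bool is_reserved_name(const char *name)
{
    size_t len, i;

    assert(name != NULL);
    len = strlen(name);

    /* Rule 1: begins with "PL_" */
    if (strncmp(name, "PL_", 3) == 0)
        return true;

    /* Rule 3: contains upper-case "PERL" anywhere */
    if (strstr(name, "PERL") != NULL)
        return true;

    /* Rule 2: contains "perl" in any case, delimited on both sides
     * (assumed meaning: string boundary or a non-alphanumeric neighbour) */
    for (i = 0; i + 4 <= len; i++) {
        bool match = tolower((unsigned char)name[i])     == 'p'
                  && tolower((unsigned char)name[i + 1]) == 'e'
                  && tolower((unsigned char)name[i + 2]) == 'r'
                  && tolower((unsigned char)name[i + 3]) == 'l';
        bool left_ok  = (i == 0) || !isalnum((unsigned char)name[i - 1]);
        bool right_ok = (i + 4 == len) || !isalnum((unsigned char)name[i + 4]);

        if (match && left_ok && right_ok)
            return true;
    }
    return false;
}
```

Under that assumption, names such as `Perl_newSVOP` and `my_perl` are
reserved, while a name that merely embeds the letters, like `superlative`,
is not.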
Any other spelling that we expose could be considered to pollute the XS
code space. We feel free to do that all the time. Any new function's
short name will do that.
And we generally feel free to create macros with arbitrary names which
could conflict with an existing XS name.
Some important potential conflicts are:
New keywords: We create an exposed KEY_foo macro. Some existing
modules use some of these. My grep of CPAN shows maybe a dozen of these
get used; mostly KEY_END.
config.h is full of symbols like HAS_foo, I_bar, and others that are all
exposed. I don't imagine we can claim to reserve any symbol beginning
with either of those.
Informally, others and I have used a trailing underscore to
indicate a private symbol. There are a few distributions that use some
of these anyway. And there has been pushback when new short symbols
that use this convention have been added.
I would like to get a formal rule about use of this convention. There
are 200+ of these currently. We could reserve any names with trailing
underscores, or if that is too much, any ending in, say, 'pl_' or 'PL_'.
We have 3000+ undocumented macro names that don't end in underscores and
which are currently visible to XS code. This number includes the
KEY_foo ones, but not the ones in config.h.
To deal with namespace pollution, we have had the -DNO_SHORT_NAMES
Configure option for use just with embedded perls. This hasn't worked
at least since we added inline functions, and it always applied to only
functions. I have a WIP to get this to work again, and to extend it to
work with documented macros. It just occurred to me how to make this be
customizable, so that downstream someone could add a list of symbols
that should only exist as 'Perl_foo', and then recompile.
Some symbols are #defined in multiple places in the input, typically
based on different preprocessor conditionals. We want to use the
definition which has the widest visibility.
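
A minimal sketch of "use the widest" follows. It assumes three visibility
levels ordered core-only, then extensions, then everywhere. The enum and
function names are hypothetical, and the real code works on parsed
preprocessor conditions rather than an enum.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed ordering: a larger value means the symbol is visible to more code. */
typedef enum { VIS_CORE = 0, VIS_EXT = 1, VIS_ALL = 2 } visibility;

/* Given the visibility of each definition of a symbol, pick the widest one. */
visibility widest_visibility(const visibility *defs, size_t n)
{
    visibility best;
    size_t i;

    assert(defs != NULL && n > 0);
    best = defs[0];
    for (i = 1; i < n; i++)
        if (defs[i] > best)
            best = defs[i];
    return best;
}
```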
This change showed that USE_STDIO had wrongly been undefined for the
past few commits in blead. It is no longer actually ever defined by
perl.
The heuristic previously used had many false positives, so it thought
symbols were for the system that really weren't. This tightens it up,
and to avoid breaking any existing code that might be relying on those
miscategorized symbols, adds them to the list of unresolved visibility
ones, so that they remain visible.
These few symbols had been marked as unresolved as to their visibility.
But in fact they are symbols in libc that do need to always be visible,
and there is already a hash for this type. Move them to the proper
place. The net effect is no external changes.
This file doesn't get parsed anyway because it isn't in the MANIFEST,
nor would it work out to parse it in spite of that, if only because it
isn't under source control, and the outputs of this are.
This makes sure the terminating character is a NUL. This internal
function isn't documented as having that requirement, but that's always
the case in our test suite. And functions it calls assume there is at
least one character in the input, so the assertion shouldn't be EPTRge,
and the test suite fails if it is EPTRgt.
This function takes a string argument with beginning and ending
positions. It appears to me that those positions are overwritten
without being examined. Still, the function does get called with an
apparently empty string that actually contains a NUL.
Some functions take arguments that point to the terminating NUL
character of a string. This commit adds a way to declare in embed.fnc
that a given argument is of that kind.
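
The kind of check such a declaration implies might look like the sketch
below. `span_length` is a hypothetical function, and the assertions are
written by hand here. How embed.fnc spells the declaration and what it
generates are not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Length of the string starting at s, where e must point at its terminating NUL. */
size_t span_length(const char *s, const char *e)
{
    assert(s != NULL && e != NULL);
    assert(e >= s);        /* the end may not precede the start */
    assert(*e == '\0');    /* e points at the NUL itself, not one past it */
    return (size_t)(e - s);
}
```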
Some header files in the Perl core have guards to keep a
recursive #include from compiling them again. This is standard practice
in C coding, and I was surprised at how many headers don't have
it. These seem to rely on only being included from perl.h, which does
have its own guard.
Most of the guards use a common naming convention. If the file is named
foo.h, the guard is named 'PERL_FOO_H'. Often, though, a trailing
underscore is added, 'PERL_FOO_H_', making the convention slightly
fuzzy. The 'PERL_' is not added if the file 'foo' already includes
'perl' in its name.
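
The naming rules can be sketched as a pair of helpers. The function names are
hypothetical. The sketch assumes that every non-alphanumeric character in the
file name (including the '.') maps to an underscore, and that the check for
'perl' in the name is case-insensitive.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Write the conventional guard name for file into out; 0 on success, -1 on error. */
int header_guard_name(const char *file, char *out, size_t outlen)
{
    char upper[128];
    size_t i, len;
    int n;

    assert(file != NULL && out != NULL);
    len = strlen(file);
    if (len == 0 || len >= sizeof upper)
        return -1;

    /* foo.h -> FOO_H (assumed mapping of non-alphanumerics to '_') */
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)file[i];
        upper[i] = isalnum(c) ? (char)toupper(c) : '_';
    }
    upper[len] = '\0';

    /* Prefix "PERL_" unless the name already includes "perl" */
    n = snprintf(out, outlen, "%s%s",
                 strstr(upper, "PERL") ? "" : "PERL_", upper);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}

/* True if candidate is the guard for file, allowing one trailing underscore. */
bool is_guard_for(const char *file, const char *candidate)
{
    char want[160];
    size_t wlen;

    assert(candidate != NULL);
    if (header_guard_name(file, want, sizeof want) != 0)
        return false;
    wlen = strlen(want);
    if (strncmp(candidate, want, wlen) != 0)
        return false;
    return candidate[wlen] == '\0'
        || (candidate[wlen] == '_' && candidate[wlen + 1] == '\0');
}
```

With these rules, foo.h yields `PERL_FOO_H`, and both `PERL_FOO_H` and
`PERL_FOO_H_` are accepted as its guard.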
Those rules are enough to describe all the guards in the core, except
for the outliers in this commit, plus perl.h.
There are occasions in various Perl scripts that examine our source that
we want to create a pattern that matches the header guard name for a
particular file. In spite of the slight fuzziness, that's easy using
the above rules, except for the ones in this commit, and perl.h.
It would be better for that code to not have to worry about the
outliers, and since these are arbitrary names, we can just change them
to follow the rules that already apply to most.
This commit changes the names of the outliers so that the only file the
rules don't apply to is perl.h. Its guard is named H_PERL. That
spelling is used in Encode, so it's not so easy to change it seamlessly.
I'm willing to live with it continuing to be an outlier for the code I
write.