Adds a new experimental warning, feature, keywords and enough parsing to
implement basic classes with an empty `new` constructor method.
Inject a $self lexical into method bodies; populate it with the object instance, suitably shifted off the arguments list
Creates a new OP_METHSTART opcode to perform method setup
Define an aux flag to mark which stashes are classes
Basic implementation of fields.
Basic anonymous methods.
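As a rough sketch of the syntax these commits introduce (class Point and its method are illustrative; the feature and warning category names are assumptions based on how the feature later stabilised):
    use feature 'class';
    no warnings 'experimental::class';

    class Point {
        field $x;
        field $y;

        method describe {
            # $self is injected and already shifted off @_ by OP_METHSTART
            return "a Point object";
        }
    }

    my $p = Point->new;          # the basic, empty constructor
    print $p->describe, "\n";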
This op is constructed using an OP_HELEM as the op_first and any scalar
expression as the op_other.
It is roughly equivalent to the following perl code:
exists $hv{$key} ? $hv{$key} : OTHER
except that the HV and the KEY expression are evaluated only once, and
only one hv_* function is invoked to both test and obtain the value. It
is therefore smaller and more efficient.
Likewise, adding the OPpHELEMEXISTSOR_DELETE flag turns it into the
equivalent of
exists $hv{$key} ? delete $hv{$key} : OTHER
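A hedged source-level illustration of the DELETE variant (%args, the colour key and $default are hypothetical names):
    my $colour = exists $args{colour} ? delete $args{colour} : $default;
again with the HV and the KEY expression evaluated only once.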
This commit introduces a new OP to replace cases of OP_ANONLIST and
OP_ANONHASH where there are zero elements, which is very common in
Perl code.
As an example, `my $x = {}` is currently implemented like this:
...
6 <2> sassign vKS/2 ->7
4 <@> anonhash sK* ->5
3 <0> pushmark s ->4
5 <0> padsv[$x:1,2] sRM*/LVINTRO ->6
The pushmark serves no meaningful purpose when there are zero
elements, and the anonhash, besides undoing the pushmark,
performs work that is unnecessary for this special case.
The peephole optimizer, which also checks for applicability of a
related TARGMY optimization, transforms this example into:
...
- <1> ex-sassign vKS/2 ->4
3 <@> emptyavhv[$x:1,2] vK*/LVINTRO,ANONHASH,TARGMY ->4
- <0> ex-pushmark s ->3
- <0> ex-padsv sRM*/LVINTRO ->-
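(For reference, optree dumps like the ones above come from B::Concise; on a perl with this change, something like
    perl -MO=Concise -e 'my $x = {}'
should show the new emptyavhv op in place of the pushmark/anonhash pair.)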
This commit introduces a new OP to replace simple cases of OP_SASSIGN
and OP_AELEMFAST_LEX. (Similar concept to GH #19943)
For example, `my @ary; $ary[0] = "boo"` is currently implemented as:
7 <2> sassign vKS/2 ->8
5 <$> const[PV "boo"] s ->6
- <1> ex-aelem sKRM*/2 ->7
6 <0> aelemfast_lex[@ary:1,2] sRM ->7
- <0> ex-const s ->-
But it will now be turned into:
6 <1> aelemfastlex_store[@ary:1,2] vKS ->7
5 <$> const(PV "boo") s ->6
- <1> ex-aelem sKRM*/2 ->6
- <0> ex-aelemfast_lex sRM ->6
- <0> ex-const s ->-
This is intended to be a transparent performance optimization.
It should be applicable for RHS optrees of varying complexity.
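As a hedged illustration of that claim (compute() is a hypothetical sub), both of these assignments are expected to qualify:
    my @ary;
    my $x = "b";
    $ary[0] = "boo";              # constant RHS, as in the example above
    $ary[1] = $x . compute();     # a more involved RHS expression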
This commit introduces a new OP to replace simple cases
of OP_SASSIGN and OP_PADSV.
For example, 'my $x = 1' is currently implemented as:
1 <;> nextstate(main 1 -e:1) v:{
2 <$> const(IV 1) s
3 <0> padsv[$x:1,2] sRM*/LVINTRO
4 <2> sassign vKS/2
But it will now be turned into:
1 <;> nextstate(main 1 -e:1) v:{
2 <$> const(IV 1) s
3 <1> padsv_store[$x:1,2] vKMS/LVINTRO
This is intended to be a transparent performance optimization.
It should be applicable for RHS optrees of varying complexity.
We can also remove many dMY_CXT declarations, as they are no longer needed
(and generate a now unused variable in threaded builds, hence compiler
warnings).
Previously it was part of the module's my_cxt_t, because it was a value
calculated from the interpreter variable PL_maxo. But PL_maxo itself is no
longer a variable - it was converted to a #define in Aug 2016 by commit
8d89205aa6324e7d:
Remove PL_maxo
We have an interpreter variable using memory, PL_maxo, which is
defined to be the same as MAXO, a #defined constant. As far as I can
tell, it is never used in lvalue context, in core or on CPAN, except
for the initialisation in intrpvar.h.
It can simply be removed and replaced with a macro defined as equivalent to MAXO.
It was added in this commit:
commit 84ea024ac9cdf20f21223e686dddea82d5eceb4f
Author: Perl 5 Porters <perl5-porters.nicoh.com>
Date: Tue Jan 2 23:21:55 1996 +0000
perl 5.002beta1h patch: perl.h
5.002beta1 attempted some memory optimizations, but unfortunately
they can result in a memory leak problem. This can be
avoided by #define STRANGE_MALLOC. I do that here until
consensus is reached on a better strategy for handling the
memory optimizations.
Include maxo for the maximum number of operations (needed
for the Safe extension).
But apparently it is not needed for the Safe extension (tests pass
without it).
What the author of that commit didn't realise was that Opcode had been split
out from Safe - the code in question is in this module, not in Safe.
Adds syntax `defer { BLOCK }` to create a deferred block: code whose execution
is deferred until the scope exits. This syntax is guarded by
use feature 'defer';
Adds a new opcode, `OP_PUSHDEFER`, which is a LOGOP whose `op_other` field
gives the start of an optree to be deferred until scope exit. That op
pointer will be stored on the save stack and invoked as part of scope
unwind.
Included is support for `B::Deparse` to deparse the optree back into
syntax.
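A short usage sketch (acquire(), release() and do_work() are hypothetical helpers):
    use feature 'defer';
    no warnings 'experimental::defer';

    {
        my $res = acquire();
        defer { release($res) }   # queued now, runs when the enclosing block exits
        do_work($res);            # the deferred block still runs even if this dies
    }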
Inheriting from Exporter adds several subroutines that are not really
needed by these modules.
The remaining uses of inheritance from Exporter are:
re.pm - this uses export_to_level(), which really, really wants
the inheritance
The current "no warranty" text warning against the use of Safe or
Opcode for "security purposes" is somewhat ambiguous. These modules
are not effective sandboxing mechanisms for evaluating untrusted
perl code and should not be used in that manner.
Safe and Opcode are, at best, hardening measures that could be used
in combination with operating system level sandboxing of the perl
interpreter.
Compiling with gcc7, for example, generated a '-Woverflow' warning with
text 'overflow in implicit constant conversion'.
For: https://github.com/Perl/perl5/issues/17664
Adds a new infix operator named `isa`, with the semantics that
$x isa SomeClass
is true if and only if `$x` is a blessed object reference that is either
`SomeClass` directly, or includes the class somewhere in its @ISA
hierarchy. It is false without warning or error for non-references or
non-blessed references.
This operator respects `->isa` method overloading, and is intended to
replace boilerplate code such as
use Scalar::Util 'blessed';
blessed($x) and $x->isa("SomeClass")
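With the feature enabled this becomes, roughly (the experimental warning category name here is an assumption):
    use feature 'isa';
    no warnings 'experimental::isa';

    if ($x isa SomeClass) { ... }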
The pumpking has determined that the CPAN breakage caused by changing
smartmatch [perl #132594] is too great for the smartmatch changes to
stay in for 5.28.
This reverts most of the merge in commit
da4e040f42421764ef069371d77c008e6b801f45. All core behaviour and
documentation is reverted. The removal of use of smartmatch from a couple
of tests (that aren't testing smartmatch) remains. Customisation of
a couple of CPAN modules to make them portable across smartmatch types
remains. A small bugfix in scope.c also remains.
Allow multiple OP_CONCAT, OP_CONST ops, plus optionally an OP_SASSIGN
or OP_STRINGIFY, to be combined into a single OP_MULTICONCAT op, which can
make things a *lot* faster: 4x or more.
In more detail: it will optimise most expressions of the following form into
a single OP_MULTICONCAT:
LHS RHS
where LHS is one of
(empty)
my $lexical =
$lexical =
$lexical .=
expression =
expression .=
and RHS is one of
(A . B . C . ...) where A,B,C etc are expressions and/or
string constants
"aAbBc..." where a,A,b,B etc are expressions and/or
string constants
sprintf "..%s..%s..", A,B,.. where the format is a constant string
containing only '%s' and '%%' elements,
and A,B, etc are scalar expressions (so
only a fixed, compile-time-known number of
args: no arrays or list context function
calls etc)
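For instance, statements like these should be candidates (all variable names are illustrative):
    my $greeting = "Hello, $name, you have $count messages\n";
    $log .= $prefix . ": " . $message . "\n";
    $line = sprintf "%s=%s\n", $key, $value;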
It doesn't optimise other forms, such as
($a . $b) . ($c. $d)
((($a .= $b) .= $c) .= $d);
(although sub-parts of those expressions might be converted to an
OP_MULTICONCAT). This is partly because it would be hard to maintain the
correct ordering of tie or overload calls.
The compiler uses heuristics to determine when to convert: in general,
expressions involving a single OP_CONCAT aren't converted, unless some
other saving can be made, for example if an OP_CONST can be eliminated, or
in the presence of 'my $x = .. ' which OP_MULTICONCAT can apply
OPpTARGET_MY to, but OP_CONST can't.
The multiconcat op is of type UNOP_AUX, with the op_aux structure directly
holding a pointer to a single constant char* string plus a list of segment
lengths. So for
"a=$a b=$b\n";
the constant string is "a= b=\n", and the segment lengths are (2,3,1).
If the constant string has different non-utf8 and utf8 representations
(such as "\x80") then both variants are pre-computed and stored in the aux
struct, along with two sets of segment lengths.
For all the above LHS types, any SASSIGN op is optimised away. For a LHS
of '$lex=', '$lex.=' or 'my $lex=', the PADSV is optimised away too.
For example where $a and $b are lexical vars, this statement:
my $c = "a=$a, b=$b\n";
formerly compiled to
const[PV "a="] s
padsv[$a:1,3] s
concat[t4] sK/2
const[PV ", b="] s
concat[t5] sKS/2
padsv[$b:1,3] s
concat[t6] sKS/2
const[PV "\n"] s
concat[t7] sKS/2
padsv[$c:2,3] sRM*/LVINTRO
sassign vKS/2
and now compiles to:
padsv[$a:1,3] s
padsv[$b:1,3] s
multiconcat("a=, b=\n",2,4,1)[$c:2,3] vK/LVINTRO,TARGMY,STRINGIFY
In terms of how much faster it is, this code:
my $a = "the quick brown fox jumps over the lazy dog";
my $b = "to be, or not to be; sorry, what was the question again?";
for my $i (1..10_000_000) {
my $c = "a=$a, b=$b\n";
}
runs 2.7 times faster, and if you throw utf8 mixtures in it gets even
better. This loop runs 4 times faster:
my $s;
my $a = "ab\x{100}cde";
my $b = "fghij";
my $c = "\x{101}klmn";
for my $i (1..10_000_000) {
$s = "\x{100}wxyz";
$s .= "foo=$a bar=$b baz=$c";
}
The main ways in which OP_MULTICONCAT gains its speed are:
* any OP_CONSTs are eliminated, and the constant bits (already in the
right encoding) are copied directly from the constant string attached to
the op's aux structure.
* It optimises away any SASSIGN op, and possibly a PADSV op on the LHS, in
all cases; OP_CONCAT only did this in very limited circumstances.
* Because it has a holistic view of the entire concatenation expression,
it can do the whole thing in one efficient go, rather than creating and
copying intermediate results. pp_multiconcat() goes to considerable
efforts to avoid inefficiencies. For example it will only SvGROW() the
target once, and to the exact size needed, no matter what mix of utf8
and non-utf8 appear on the LHS and RHS. It never allocates any
temporary SVs except possibly in the case of tie or overloading.
* It does all its own appending and utf8 handling rather than calling
out to functions like sv_catsv().
* It's very good at handling the LHS appearing on the RHS; for example in
$x = "abcd";
$x = "-$x-$x-";
It will do roughly the equivalent of the following (where targ is $x):
SvPV_force(targ);
SvGROW(targ, 12);
p = SvPVX(targ);
Move(p, p+1, 4, char);
Copy("-", p, 1, char);
Copy("-", p+5, 1, char);
Copy(p+1, p+6, 4, char);
Copy("-", p+10, 1, char);
SvCUR(targ) = 11;
p[11] = '\0';
Formerly, pp_concat would have used multiple PADTMPs or temporary SVs to
handle situations like that.
The code is quite big; both S_maybe_multiconcat() and pp_multiconcat()
(the main compile-time and runtime parts of the implementation) are over
700 lines each. It turns out that when you combine multiple ops, the
number of edge cases grows exponentially ;-)
Most ops that execute a regex, such as match and subst, are of type PMOP.
A PMOP allows the actual regex to be attached directly to that op, due
to its extra fields.
OP_SPLIT is different; it is just a plain LISTOP, but it always has an
OP_PUSHRE as its first child, which *is* a PMOP and which has the regex
attached.
At runtime, pp_pushre()'s only job is to push itself (i.e. the current
PL_op) onto the stack. Later pp_split() pops this to get access to the
regex it wants to execute.
This is a bit unpleasant, because we're pushing an OP* onto the stack,
which is supposed to be an array of SV*'s. As a bit of a hack, on
DEBUGGING builds we push a PVLV with the PL_op address embedded instead,
but this still isn't very satisfactory.
Now that regexes are first-class SVs, we could push a REGEXP onto the
stack rather than PL_op. However, there is an optimisation of @array =
split which eliminates the assign and embeds the array's GV/padix directly
in the PUSHRE op. So split still needs access to that op. But the pushre
op will always be splitop->op_first anyway, so one possibility is to just
skip executing the pushre altogether, and make pp_split just directly
access op_first instead to get the regex and @array info.
But if we're doing that, then why not just go the full hog and make
OP_SPLIT into a PMOP, and eliminate the OP_PUSHRE op entirely: with the
data that was spread across the two ops now combined into just the one
split op.
That is exactly what this commit does.
For a simple compile-time pattern like split(/foo/, $s, 1), the optree
looks like:
before:
<@> split[t2] lK
</> pushre(/"foo"/) s/RTIME
<0> padsv[$s:1,2] s
<$> const(IV 1) s
after:
</> split(/"foo"/)[t2] lK/RTIME
<0> padsv[$s:1,2] s
<$> const[IV 1] s
while for a run-time expression like split(/$pat/, $s, 1),
before:
<@> split[t3] lK
</> pushre() sK/RTIME
<|> regcomp(other->8) sK
<0> padsv[$pat:2,3] s
<0> padsv[$s:1,3] s
<$> const(IV 1) s
after:
</> split()[t3] lK/RTIME
<|> regcomp(other->8) sK
<0> padsv[$pat:2,3] s
<0> padsv[$s:1,3] s
<$> const[IV 1] s
This makes the code faster and simpler.
At the same time, two new private flags have been added for OP_SPLIT -
OPpSPLIT_ASSIGN and OPpSPLIT_LEX - which make it explicit that the
assign op has been optimised away, and if so, whether the array is
lexical.
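Roughly speaking (with an illustrative $line):
    my @fields = split /,/, $line;        # assign optimised away; OPpSPLIT_ASSIGN and OPpSPLIT_LEX set
    @Some::Pkg::rows = split /,/, $line;  # OPpSPLIT_ASSIGN only; the array is a package variable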
Also, deparsing of split has been improved, to the extent that
perl TEST -deparse op/split.t
now passes.
Also, a couple of panic messages in pp_split() have been replaced with
asserts().
Opcode.xs:71:21: warning: overflow in implicit constant conversion [-Woverflow]
bitmap[len-1] = (PL_maxo & 0x07) ? ~(0xFF << (PL_maxo & 0x07)) : 0xFF;
This was due to PL_maxo being converted from a var into a const value
in a previous commit.
Currently subroutine signature parsing emits many small discrete ops
to implement arg handling. This commit replaces them with a couple of ops
per signature element, plus an initial signature check op.
These new ops are added to the OP tree during parsing, so will be visible
to hooks called up to and including peephole optimisation. It is intended
soon that the peephole optimiser will take these per-element ops, and
replace them with a single OP_SIGNATURE op which handles the whole
signature in a single go. So normally these ops won't actually get executed
much. But adding these intermediate-level ops gives three advantages:
1) it allows the parser to efficiently generate subtrees containing
individual signature elements, which can't be done if only OP_SIGNATURE
or discrete ops are available;
2) prior to optimisation, it provides a simple and straightforward
representation of the signature;
3) hooks can mess with the signature OP subtree in ways that make it
no longer possible to optimise into an OP_SIGNATURE, but which can
still be executed, deparsed etc (if less efficiently).
This code:
use feature "signatures";
sub f($a, $, $b = 1, @c) {$a}
under 'perl -MO=Concise,f' now gives:
d <1> leavesub[1 ref] K/REFC,1 ->(end)
- <@> lineseq KP ->d
1 <;> nextstate(main 84 foo:6) v:%,469762048 ->2
2 <+> argcheck(3,1,@) v ->3
3 <;> nextstate(main 81 foo:6) v:%,469762048 ->4
4 <+> argelem(0)[$a:81,84] v/SV ->5
5 <;> nextstate(main 82 foo:6) v:%,469762048 ->6
8 <+> argelem(2)[$b:82,84] vKS/SV ->9
6 <|> argdefelem(other->7)[2] sK ->8
7 <$> const(IV 1) s ->8
9 <;> nextstate(main 83 foo:6) v:%,469762048 ->a
a <+> argelem(3)[@c:83,84] v/AV ->b
- <;> ex-nextstate(main 84 foo:6) v:%,469762048 ->b
b <;> nextstate(main 84 foo:6) v:%,469762048 ->c
c <0> padsv[$a:81,84] s ->d
The argcheck(3,1,@) op knows the number of positional params (3), the
number of optional params (1), and whether it has an array / hash slurpy
element at the end. This op is responsible for checking that @_ contains
the right number of args.
A simple argelem(0)[$a] op does the equivalent of 'my $a = $_[0]'.
Similarly, argelem(3)[@c] is equivalent to 'my @c = @_[3..$#_]'.
If it has a child, it gets its arg from the stack rather than using $_[N].
Currently the only used child is the logop argdefelem.
argdefelem(other->7)[2] is equivalent to '@_ > 2 ? $_[2] : other'.
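Taken together, the ops for f() behave roughly like this hand-written equivalent (an illustrative sketch, not the code actually generated):
    sub f {
        die "Too few arguments for subroutine" if @_ < 2;  # argcheck(3,1,@): 2 mandatory args; slurpy, so no upper limit
        my $a = $_[0];                                     # argelem(0)
                                                           # $_[1] belongs to the nameless '$' parameter
        my $b = @_ > 2 ? $_[2] : 1;                        # argdefelem(other->7)[2] + argelem(2)
        my @c = @_[3 .. $#_];                              # argelem(3)
        $a;
    }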
[ These ops currently assume that the lexical var being introduced
is undef/empty and non-magical etc. This is an incorrect assumption and
is fixed in a few commits' time ]
Opcode.xs: In function ‘opmask_addlocal’:
../../perl.h:6473:12: warning: unused variable ‘my_cxtp’ [-Wunused-variable]
my_cxt_t *my_cxtp = (my_cxt_t *)PL_my_cxt_list[MY_CXT_INDEX]
^
Without debugging enabled, opcode_debug becomes a constant, so
dMY_CXT isn't needed in opmask_addlocal().
Make the enabling of debugging based on a macro (OPCODE_DEBUG)
rather than on '#if 0', then use that macro accordingly.
opcode_debug has never had an API to turn it on/set it to non-0 since
Opcode::'s initial commit 6badd1a5d1 in 5.003001. Making opcode_debug a
constant allows the C compiler to constant-fold away the code and the warn
string literals, and makes the my_cxt_t struct slightly smaller. Don't remove
the code entirely, since someone might find it useful one day.