mirror of
https://github.com/Perl/perl5.git
synced 2026-01-26 08:38:23 +00:00
regex: eval/cut: fix premature local undo and $^R
GH #16197 The main purpose of this commit is to stop premature scope unwinding within eval code in regexes. Aside from backtracking on failure, the scopes of every eval, such as (?{ local $x }), are supposed to accumulate, and are only unwound en-masse at the very end of the pattern match. However, a combination of a sub-pattern call, such as (??{...}) or (?&...), combined with a cut, (?>...), can trigger earlier savestack popping. The direct fix for this, as explained below, is to remove this single line from the EVAL_postponed_A/B: case: REGCP_UNWIND(ST.cp); /* LEAVE in disguise */ However, that line is entwined with code which attempts to preserve the final value of $^R during scope unwinding. Since that code was kind of working around the misplaced REGCP_UNWIND(), it needs ripping out and re-implementing. This has to be done at the same time, so the bulk of this commit is actually concerned with $^R, even though it isn't the subject of this ticket. So this commit doesn't change the behaviour of $^R, but just changes its implementation somewhat. The $^R issue is that every /(?{...})/ causes the equivalent of local $^R = ...; to be executed. During final exit, the savestack gets unwound, and all those local's get undone, leaving $^R with the value it had before the match started. But we promise that after the match, $^R will hold the value of the most recent (?{...}). The code which this commit rips out restored that value in one way; the new code in this commit does it a different way. Basically, almost the last thing S_regmatch() does is a LEAVE_SCOPE(orig_savestack_ix); This commit makes it so that the current value of $^R is copied just before the LEAVE_SCOPE(), and that value is copied back to $^R just after the LEAVE_SCOPE(). For efficiency, we only do the copy if we've actually set $^R. A mechanism is also needed to ensure that the temporary copy doesn't leak if we die during the savestack unwind. This is achieved by holding a pointer to the copy in the aux_eval struct, which gets processed even if we die. Now back to the main purpose of this commit, the premature stack unwind in the presence of a cut with a sub-pattern. This bug has been there since these features were added. It is instructive to look at a somewhat idealised overview of the S_regmatch() function from around 5.6.0 (with some bug fixes from later releases added). This was while the function was still recursive. It looks approximately like: pp_match(...) { I32 ix = PL_savestack_ix; ... do a match ... LEAVE_SCOPE(ix); } S_regmatch(...) { while (scan) { switch (OP(scan)) { case FOO: if (! there's a FOO) return 0; I32 ix = PL_savestack_ix; if (regmatch(...)) /* recursively match rest of pattern */ return 1; LEAVE_SCOPE(ix); return 0; case END: if (doing a (??{...}) ) { I32 ix = PL_savestack_ix; if (regmatch(...) { /* recursively match rest of pattern */ LEAVE_SCOPE(ix); return 1; } LEAVE_SCOPE(ix); return 0; } return 1; case EVAL: ... run the code, then, if its a (?{...}) ... I32 ix = PL_savestack_ix; if (regmatch(...) { /* recursively run subpattern */ LEAVE_SCOPE(ix) return 1; } LEAVE_SCOPE(ix); return 0; } } Here, the FOO: case represents all the various ops which recurse. In general, they match the next item and then recurse to match the rest of the pattern. Note that they all do a LEAVE_SCOPE() only in the *failure* branch. At the end of a successful match, there is potentially much recursion, and much stuff on the savestack. When the END op is reached, the series of 'return 1's causes all the recursion to unwind, while leaving the savestack untouched. Finally, the caller - such as pp_match() - clears the savestack. In more recent perls the recursion has been removed and the final LEAVE_SCOPE() is done within S_regmatch() itself, but the principle remains the same: no stack freeing is done *during* matching, and instead there's a single big clean up at the end. Once (??{...}) enters the picture, that changes a bit. When the END op associated with the '...' sub-pattern is reached, regmatch() is called recursively to process any pattern after the (??{..}); then on success, while working its way back through the nested regmatch() calls, both the END and the EVAL code do a LEAVE_SCOPE() in the *success* branch. This is anomalous, and those two LEAVE_SCOPE()'s are what this commit removes (although in the current non-recursive regex engine, they are shared by the same piece of code, so only one had to be removed). By removing them, this regularises the behaviour of sub-patterns. I can't think why those LEAVE_SCOPE()s were originally added, and assume it was a thinko. Normally it makes no difference whether the savestack is popped near the end, interleaved with popping all recursive regmatch calls (or equivalently on non-recursive engines popping the regmatch_state stack), or whether the savestack is popped only after all the recursion is exited from. However, it makes a difference in the presence of a cut, (?>...). Here, the final op of the sub-pattern '...' is SUCCEED, which rather than recursing to match anything following the cut block, just returns. The recursion pops back to the SUSPEND op which started the cut, which then continues with the op loop as normal. Thus when about to match the first op following a (?>...), the recursion *within* the cut has been blown away as if it never happened, but the accumulated savestack entries (e.g. from evals within the cut block) are preserved and continue to accumulate. Now, if there is a (??{...}) sub-pattern, or similarly a (?&FOO), within the cut, then at the end of the cut, the recursion is unwound, which includes the stacked EVAL and END recursions, which at this time call LEAVE_SCOPE(), which frees part of the savestack, even though the pattern match hasn't ended yet. That's the bug which this commit fixes. The tests added to pat_re_eval.t check that the scope bug has been fixed. The test added to svleak.t checks that the new $^R copying code doesn't leak.
This commit is contained in:
parent
d306795336
commit
729f4d269c
67
regexec.c
67
regexec.c
@ -320,7 +320,12 @@ S_regcppush(pTHX_ const regexp *rex, I32 parenfloor, U32 maxopenparen comma_pDEP
|
||||
|
||||
/* REGCP_UNWIND(cp): unwind savestack back to marked index,
|
||||
* but without restoring regcppush()ed data (leave_scope() treats
|
||||
* SAVEt_REGCONTEXT as a NOOP)
|
||||
* SAVEt_REGCONTEXT as a NOOP).
|
||||
*
|
||||
* Note that the stack is normally only unwound on failure and
|
||||
* backtracking. Successful matches involve accumulating many savestack
|
||||
* entries which are all freed in *one go* during the final exit from
|
||||
* S_regmatch().
|
||||
*/
|
||||
|
||||
#define REGCP_UNWIND(cp) \
|
||||
@ -6683,7 +6688,6 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, regnode *prog)
|
||||
during a successful match */
|
||||
U32 lastopen = 0; /* last open we saw */
|
||||
bool has_cutgroup = RXp_HAS_CUTGROUP(rex) ? 1 : 0;
|
||||
SV* const oreplsv = GvSVn(PL_replgv);
|
||||
/* these three flags are set by various ops to signal information to
|
||||
* the very next op. They have a useful lifetime of exactly one loop
|
||||
* iteration, and are not preserved or restored by state pushes/pops
|
||||
@ -6726,9 +6730,6 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, regnode *prog)
|
||||
DECLARE_AND_GET_RE_DEBUG_FLAGS;
|
||||
#endif
|
||||
|
||||
/* protect against undef(*^R) */
|
||||
SAVEFREESV(SvREFCNT_inc_simple_NN(oreplsv));
|
||||
|
||||
/* shut up 'may be used uninitialized' compiler warnings for dMULTICALL */
|
||||
multicall_oldcatch = 0;
|
||||
PERL_UNUSED_VAR(multicall_cop);
|
||||
@ -8754,18 +8755,6 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, regnode *prog)
|
||||
S_set_reg_curpm(aTHX_ rex_sv, reginfo);
|
||||
rex = ReANY(rex_sv);
|
||||
rexi = RXi_GET(rex);
|
||||
{
|
||||
/* preserve $^R across LEAVE's. See Bug 121070. */
|
||||
SV *save_sv= GvSV(PL_replgv);
|
||||
SV *replsv;
|
||||
SvREFCNT_inc(save_sv);
|
||||
REGCP_UNWIND(ST.cp); /* LEAVE in disguise */
|
||||
/* don't move this initialization up */
|
||||
replsv = GvSV(PL_replgv);
|
||||
sv_setsv(replsv, save_sv);
|
||||
SvSETMAGIC(replsv);
|
||||
SvREFCNT_dec(save_sv);
|
||||
}
|
||||
cur_eval = ST.prev_eval;
|
||||
cur_curlyx = ST.prev_curlyx;
|
||||
|
||||
@ -10325,20 +10314,21 @@ NULL
|
||||
DEBUG_EXECUTE_r(Perl_re_printf( aTHX_ "%sMatch successful!%s\n",
|
||||
PL_colors[4], PL_colors[5]));
|
||||
|
||||
if (reginfo->info_aux_eval) {
|
||||
/* each successfully executed (?{...}) block does the equivalent of
|
||||
* local $^R = do {...}
|
||||
* When popping the save stack, all these locals would be undone;
|
||||
* bypass this by setting the outermost saved $^R to the latest
|
||||
* value */
|
||||
/* I don't know if this is needed or works properly now.
|
||||
* see code related to PL_replgv elsewhere in this file.
|
||||
* Yves
|
||||
if (reginfo->info_aux_eval && orig_savestack_ix < PL_savestack_ix) {
|
||||
/* After exiting a match, $^R should still hold the value of the
|
||||
* latest (?{...}). E.g.
|
||||
* /(?{42})/ and print $^R;
|
||||
* will print 42. However, each successfully executed (?{...})
|
||||
* block does the equivalent of 'local $^R = ...'. In addition,
|
||||
* there may be an explicit 'local $^R' or similar code. When
|
||||
* popping the savestack at the end, all these locals would be
|
||||
* undone. Avoid this issue by making a copy of the final value
|
||||
* just *before* the final savestack unwind. After the unwind,
|
||||
* we set $^R to that value.
|
||||
* This temporary copy is stored in the aux_eval struct so that
|
||||
* it will get freed even if we die during savestack unwind.
|
||||
*/
|
||||
if (oreplsv != GvSV(PL_replgv)) {
|
||||
sv_setsv(oreplsv, GvSV(PL_replgv));
|
||||
SvSETMAGIC(oreplsv);
|
||||
}
|
||||
reginfo->info_aux_eval->final_replsv = newSVsv(GvSV(PL_replgv));
|
||||
}
|
||||
result = 1;
|
||||
goto final_exit;
|
||||
@ -10403,13 +10393,25 @@ NULL
|
||||
|
||||
if (last_pushed_cv) {
|
||||
dSP;
|
||||
/* see "Some notes about MULTICALL" above */
|
||||
/* see "Some notes about MULTICALL" above, especially how
|
||||
* the POP_MULTICALL does the equivalent of the LEAVE_SCOPE
|
||||
* for us, so no need to do it explicitly. */
|
||||
POP_MULTICALL;
|
||||
PERL_UNUSED_VAR(SP);
|
||||
}
|
||||
else
|
||||
LEAVE_SCOPE(orig_savestack_ix);
|
||||
|
||||
if ( reginfo->info_aux_eval
|
||||
&& reginfo->info_aux_eval->final_replsv)
|
||||
{
|
||||
/* '/(?{42})/; print $^R' should print 42; now that the
|
||||
* savestack has been popped, set the final value */
|
||||
SV *replsv = GvSV(PL_replgv);
|
||||
sv_setsv(replsv, reginfo->info_aux_eval->final_replsv);
|
||||
SvSETMAGIC(replsv);
|
||||
}
|
||||
|
||||
assert(!result || locinput - reginfo->strbeg >= 0);
|
||||
return result ? locinput - reginfo->strbeg : -1;
|
||||
}
|
||||
@ -11503,6 +11505,8 @@ S_setup_eval_state(pTHX_ regmatch_info *const reginfo)
|
||||
RXp_SUBOFFSET(rex) = 0;
|
||||
RXp_SUBCOFFSET(rex) = 0;
|
||||
RXp_SUBLEN(rex) = reginfo->strend - reginfo->strbeg;
|
||||
|
||||
eval_state->final_replsv = NULL; /* the final value of $^R. */
|
||||
}
|
||||
|
||||
|
||||
@ -11553,6 +11557,7 @@ S_cleanup_regmatch_info_aux(pTHX_ void *arg)
|
||||
PM_SETRE(eval_state->old_op, eval_state->old_op_val);
|
||||
SvREFCNT_dec(old_rx);
|
||||
}
|
||||
SvREFCNT_dec(eval_state->final_replsv);
|
||||
}
|
||||
|
||||
PL_regmatch_state = aux->old_regmatch_state;
|
||||
|
||||
1
regexp.h
1
regexp.h
@ -791,6 +791,7 @@ typedef struct {
|
||||
SV *sv; /* $_ during (?{}) */
|
||||
MAGIC *pos_magic; /* pos() magic attached to $_ */
|
||||
SSize_t pos; /* the original value of pos() in pos_magic */
|
||||
SV *final_replsv; /* the final value of $^R. */
|
||||
U8 pos_flags; /* flags to be restored; currently only MGf_BYTES*/
|
||||
} regmatch_info_aux_eval;
|
||||
|
||||
|
||||
@ -15,7 +15,7 @@ BEGIN {
|
||||
|
||||
use Config;
|
||||
|
||||
plan tests => 157;
|
||||
plan tests => 158;
|
||||
|
||||
# run some code N times. If the number of SVs at the end of loop N is
|
||||
# greater than (N-1)*delta at the end of loop 1, we've got a leak
|
||||
@ -710,3 +710,31 @@ package myconcat {
|
||||
'overloaded pattern with code block'
|
||||
);
|
||||
}
|
||||
|
||||
# When making a final copy of $^R before unwinding the savestack,
|
||||
# make sure that the copy doesn't leak if we die during that unwinding.
|
||||
# Dying in STORE triggers that.
|
||||
|
||||
package GH16197 {
|
||||
|
||||
sub TIESCALAR { bless [ 0 ]; }
|
||||
sub FETCH { $_[0][0] }
|
||||
sub STORE { my $v = $_[1];
|
||||
# die when undoing the 'local'; the previous val was real,
|
||||
# new value is undef.
|
||||
if ($_[0][-1] and !$v) {
|
||||
@{$_[0]} = ();
|
||||
die;
|
||||
}
|
||||
push @{$_[0]}, $v;
|
||||
}
|
||||
|
||||
our $x99;
|
||||
local $x99;
|
||||
tie $x99, 'GH16197';
|
||||
|
||||
::leak(5, 0,
|
||||
sub { eval { "" =~ /(?{ local $x99; $x99 = 9 })/; }; },
|
||||
"no leak in \$^R copy during stack unwind",
|
||||
);
|
||||
}
|
||||
|
||||
@ -25,7 +25,7 @@ BEGIN {
|
||||
our @global;
|
||||
|
||||
|
||||
plan tests => 528; # Update this when adding/deleting tests.
|
||||
plan tests => 552; # Update this when adding/deleting tests.
|
||||
|
||||
run_tests() unless caller;
|
||||
|
||||
@ -1446,6 +1446,53 @@ sub run_tests {
|
||||
is($got, "[A-[X-,X-,XY,XY],A-,AB,AB]", "GH22869");
|
||||
}
|
||||
|
||||
# GH #16197
|
||||
# A subpattern, i.e. (??{...}) or (?&...), when combined with
|
||||
# a cut, (?>...), caused the savestack to be prematurely unwound,
|
||||
# resulting in localisations within (?{...}) being undone before
|
||||
# the end of the match.
|
||||
|
||||
{
|
||||
my ($match, @vals);
|
||||
|
||||
# The first six permutations do no backtracking, but are: with or
|
||||
# without a cut; and match a digit via either a simple \d,
|
||||
# or via a sub-pattern - (??{}) or (?&).
|
||||
# The next six permutations add some backtracking.
|
||||
#
|
||||
# Previously the combination of a cut and a subpattern returned
|
||||
# "101 101 101 101 101".
|
||||
|
||||
for my $bt ('', '[a-z]*') { # trigger a backtrack?
|
||||
for my $cut ('', '?>') {
|
||||
for my $subpat (q{\d}, q{(??{ '\d' })}, q{(?&DIGIT)} ) {
|
||||
|
||||
my $desc = "bt=$bt cut=$cut subp=$subpat";
|
||||
@vals = ();
|
||||
local our $x99 = 100;
|
||||
|
||||
use re 'eval';
|
||||
$match =
|
||||
"a1b2c3d4e5" =~
|
||||
/^ (
|
||||
[a-z]
|
||||
($cut
|
||||
$subpat
|
||||
(?{ local $x99 = $x99 + 1; push @vals, $x99 })
|
||||
)
|
||||
$bt
|
||||
){5}
|
||||
|
||||
(?(DEFINE) (?<DIGIT> \d ) )
|
||||
/x;
|
||||
ok($match, "GH 16197: match; $desc");
|
||||
is("@vals", "101 102 103 104 105",
|
||||
"GH 16197: local vals; $desc");
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
} # End of sub run_tests
|
||||
|
||||
1;
|
||||
|
||||
Loading…
x
Reference in New Issue
Block a user