regex engine: split EVAL_postponed_AB state

(This commit makes no practical changes in behaviour except for
debugging output.)

Currently, one of the regex engine stack states is EVAL_postponed_AB.

When executing something like /(??{'A'})B/ where A and B represent
general subpatterns, the engine executes the eval code, which returns
the string 'A', which is compiled into a subpattern. Then the engine
pushes an EVAL_postponed_AB state and runs the subpattern until it
reaches an END op. Then it pushes *another* EVAL_postponed_AB state and
runs the B part of the pattern until the final END. Then before
returning success, it pops EVAL_postponed_AB off the stack (twice),
executing any cleanup required. Similarly during failure, the
EVAL_postponed_AB_fail action will be executed once or twice (depending
on whether it failed during A or B).
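
The push/pop order described above can be sketched with a toy model. This is
only an illustration, not the engine's code: every name here (toy_stack,
toy_push, toy_pop, toy_match_postponed) is hypothetical, A and B are reduced
to literal strings, and real backtracking is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the engine's single stack state. */
enum toy_state { TOY_EVAL_POSTPONED_AB };

#define TOY_STACK_MAX 8

struct toy_stack {
    enum toy_state states[TOY_STACK_MAX];
    size_t depth;
    int cleanups;   /* how many times a cleanup/fail action has run */
};

static void toy_push(struct toy_stack *s, enum toy_state st)
{
    assert(s->depth < TOY_STACK_MAX);
    s->states[s->depth++] = st;
}

/* Pop one state and count the cleanup action that goes with it. */
static void toy_pop(struct toy_stack *s)
{
    assert(s->depth > 0);
    s->depth--;
    s->cleanups++;
}

/* Match the literal `lit` at `*pos` in `input`; advance on success. */
static int toy_match_literal(const char *input, size_t *pos, const char *lit)
{
    size_t n = strlen(lit);
    if (strncmp(input + *pos, lit, n) != 0)
        return 0;
    *pos += n;
    return 1;
}

/* Toy model of matching (??{A})B at the start of `input`.
 * Returns 1 on a match, 0 otherwise. */
static int toy_match_postponed(struct toy_stack *s, const char *input,
                               const char *a, const char *b)
{
    size_t pos = 0;

    toy_push(s, TOY_EVAL_POSTPONED_AB);     /* pushed before running A */
    if (!toy_match_literal(input, &pos, a)) {
        toy_pop(s);                         /* failed in A: one fail action */
        return 0;
    }
    toy_push(s, TOY_EVAL_POSTPONED_AB);     /* A reached END: push again for B */
    if (!toy_match_literal(input, &pos, b)) {
        toy_pop(s);                         /* failed in B: fail action for B */
        toy_pop(s);                         /* ... and then for A */
        return 0;
    }
    toy_pop(s);                             /* success: pop B's state first */
    toy_pop(s);                             /* then A's state */
    return 1;
}
```

In this model a successful match and a failure in B each run two pops, while a
failure in A runs only one, which mirrors the "once or twice" wording above.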

This commit splits that state into two:

    EVAL_postponed_A
    EVAL_postponed_B

The first is pushed before running A, the second before running B.
The actions currently remain the same and share the same code; i.e. this
commit just does the equivalent of:

-        case EVAL_postponed_AB:
+        case EVAL_postponed_A:
+        case EVAL_postponed_B:
            ... cleanup code ....

But it makes the code easier to understand, makes debugging output
clearer, and will allow the cleanup behaviours to differ between A and B
in future.
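
As a rough sketch of why this helps debugging even while the code is shared:
two case labels can fall into one body while each state still reports its own
name. The enum, the name lookup and the dispatch function below are
hypothetical stand-ins, not the engine's actual definitions.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical state numbers; the real ones are generated elsewhere. */
enum sketch_state {
    SKETCH_EVAL_POSTPONED_A,
    SKETCH_EVAL_POSTPONED_B,
    SKETCH_OTHER
};

/* Map a state to the name that would appear in debugging output. */
static const char *sketch_state_name(enum sketch_state st)
{
    switch (st) {
    case SKETCH_EVAL_POSTPONED_A: return "EVAL_postponed_A";
    case SKETCH_EVAL_POSTPONED_B: return "EVAL_postponed_B";
    case SKETCH_OTHER:            return "OTHER";
    }
    return "UNKNOWN";
}

/* Dispatch on the state. Both postponed states share one cleanup body,
 * but the message written to `buf` names the state that was popped.
 * Returns 1 if the shared cleanup ran, 0 otherwise. */
static int sketch_dispatch(enum sketch_state st, char *buf, size_t buflen)
{
    switch (st) {
    case SKETCH_EVAL_POSTPONED_A:
    case SKETCH_EVAL_POSTPONED_B:
        /* ... shared cleanup code would go here ... */
        snprintf(buf, buflen, "cleanup via %s", sketch_state_name(st));
        return 1;
    default:
        snprintf(buf, buflen, "no cleanup for %s", sketch_state_name(st));
        return 0;
    }
}
```

If the cleanup for A and B ever needs to differ, the two labels in such a
switch can be given separate bodies without touching the callers.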

This commit also fixes up a few debugging messages and code comments
which were still referring to 'EVAL_AB', which was renamed to
EVAL_postponed_AB some years ago.

David Mitchell 2025-11-25 14:06:33 +00:00
parent 676faf69f7
commit 407267be97
3 changed files with 388 additions and 352 deletions

@@ -332,7 +332,7 @@ REGEX_SET REGEX_SET, depth p S ; Regex set, temporary node used in pre-optimi
 #
 #
 TRIE next:FAIL
-EVAL B,postponed_AB:FAIL
+EVAL B,postponed_A,postponed_B:FAIL
 CURLYX end:FAIL
 WHILEM A_pre,A_min,A_max,B_min,B_max:FAIL
 BRANCH next:FAIL

@@ -6671,7 +6671,7 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, regnode *prog)
     /* mark_state piggy backs on the yes_state logic so that when we unwind
        the stack on success we can update the mark_state as we go */
     regmatch_state *mark_state = NULL; /* last mark state we have seen */
-    regmatch_state *cur_eval = NULL; /* most recent EVAL_AB state */
+    regmatch_state *cur_eval = NULL; /* most recent EVAL_postponed_A state */
     struct regmatch_state *cur_curlyx = NULL; /* most recent curlyx */
     U32 state_num;
     bool no_final = 0; /* prevent failure from backtracking? */
@@ -8719,15 +8719,18 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, regnode *prog)
     ST.prev_eval = cur_eval;
     cur_eval = st;
     /* now continue from first node in postoned RE */
-    PUSH_YES_STATE_GOTO(EVAL_postponed_AB, startpoint, locinput,
+    PUSH_YES_STATE_GOTO(EVAL_postponed_A, startpoint, locinput,
                         loceol, script_run_begin);
     NOT_REACHED; /* NOTREACHED */
     }
-    case EVAL_postponed_AB: /* cleanup after a successful (??{A})B */
+    case EVAL_postponed_A: /* cleanup the A part after a
+                              successful (??{A})B */
+    case EVAL_postponed_B: /* cleanup the B part after a
+                              successful (??{A})B */
     /* note: this is called twice; first after popping B, then A */
     DEBUG_STACK_r({
-        Perl_re_exec_indentf( aTHX_ "EVAL_AB cur_eval = %p prev_eval = %p\n",
+        Perl_re_exec_indentf( aTHX_ "EVAL_postponed_A/B cur_eval = %p prev_eval = %p\n",
             depth, cur_eval, ST.prev_eval);
     });
@@ -8744,7 +8747,7 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, regnode *prog)
         rex->recurse_locinput[CUR_EVAL.close_paren - 1] = VAL; \
     }
-    SET_RECURSE_LOCINPUT("EVAL_AB[before]", CUR_EVAL.prev_recurse_locinput);
+    SET_RECURSE_LOCINPUT("EVAL_postponed_A/B[before]", CUR_EVAL.prev_recurse_locinput);
     rex_sv = ST.prev_rex;
     is_utf8_pat = reginfo->is_utf8_pat = cBOOL(RX_UTF8(rex_sv));
@@ -8771,7 +8774,7 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, regnode *prog)
     if ( nochange_depth )
         nochange_depth--;
-    SET_RECURSE_LOCINPUT("EVAL_AB[after]", cur_eval->locinput);
+    SET_RECURSE_LOCINPUT("EVAL_postponed_A/B[after]", cur_eval->locinput);
     sayYES;
@@ -8780,7 +8783,8 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, regnode *prog)
     regcppop(rex, &maxopenparen);
     sayNO;
-    case EVAL_postponed_AB_fail: /* unsuccessfully ran A or B in (??{A})B */
+    case EVAL_postponed_A_fail: /* unsuccessfully ran A in (??{A})B */
+    case EVAL_postponed_B_fail: /* unsuccessfully ran B in (??{A})B */
     /* note: this is called twice; first after popping B, then A */
     DEBUG_STACK_r({
         Perl_re_exec_indentf( aTHX_ "EVAL_AB_fail cur_eval = %p prev_eval = %p\n",
@@ -9902,7 +9906,7 @@ NULL
     SET_RECURSE_LOCINPUT("FAKE-END[after]", cur_eval->locinput);
-    PUSH_YES_STATE_GOTO(EVAL_postponed_AB, /* match B */
+    PUSH_YES_STATE_GOTO(EVAL_postponed_B, /* match B */
                         st->u.eval.prev_eval->u.eval.B,
                         locinput, loceol, script_run_begin);
     }

File diff suppressed because it is too large.