PATCH: [perl #72998] regex looping

If a character folds to multiple characters in case-insensitive matching,
matching just one of those must not be reported as a match, or the regular
expression can loop.  For example, \N{LATIN SMALL LIGATURE FF} folds to 'ff', and so
    "\N{LATIN SMALL LIGATURE FF}" =~ /f+/i
should match.  Prior to this patch, this function returned that there is
a match, but left the matching string pointer at the beginning of the
"\N{LATIN SMALL LIGATURE FF}" because it doesn't make sense to match
just half a character, and at this level it doesn't know about the '+'.
This leaves things in an inconsistent state, with the reporting of a
match, but the input pointer unchanged, the result of which is a loop.

I don't know how to fix this so that it correctly matches, and there are
semantic issues with doing so.  For example, if
    "\N{LATIN SMALL LIGATURE FF}" =~ /ff/i
matches, then one would think that so should
    "\N{LATIN SMALL LIGATURE FF}" =~ /(f)(f)/i
But $1 and $2 don't really make sense here, since each would refer to
half of the same character.

So this patch just returns failure if only a partial character is
matched.  That leaves things consistent, and solves the problem of
looping, so that Perl doesn't hang on such a construct, but leaves the
ultimate solution for another day.
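
The inconsistent state described above can be reproduced in a small
stand-alone model.  This is only a sketch under stated assumptions, not
the code in utf8.c: toy_fold, toy_match and the reject_partial flag are
hypothetical names invented for illustration, the fold table knows only
U+FB00 and ASCII letters, and the subject is treated as open-ended while
the pattern must be fully consumed.  The counters n1 and n2 are named
after the ones in the patch and hold the folded code points still
pending on each side.

```c
#include <stddef.h>

/* Hypothetical toy fold table, NOT Perl's: it knows only U+FB00 -> "ff"
 * and ASCII upper -> lower.  Returns how many code points were written. */
static size_t toy_fold(unsigned cp, unsigned out[2])
{
    if (cp == 0xFB00) { out[0] = 'f'; out[1] = 'f'; return 2; }
    if (cp >= 'A' && cp <= 'Z') { out[0] = cp - 'A' + 'a'; return 1; }
    out[0] = cp;
    return 1;
}

/* Compare subject s1 against pattern s2 under the toy folding.
 * Returns 1 if a match is reported, 0 otherwise.  *consumed receives the
 * number of whole subject code points the scan stepped past.
 * reject_partial == 0 models the old condition, 1 the patched one. */
static int toy_match(const unsigned *s1, size_t l1,
                     const unsigned *s2, size_t l2,
                     int reject_partial, size_t *consumed)
{
    unsigned fold1[2] = {0, 0}, fold2[2] = {0, 0};
    size_t n1 = 0, n2 = 0;   /* folded code points still pending */
    size_t q1 = 0, q2 = 0;   /* read positions inside the fold buffers */
    size_t p1 = 0, p2 = 0;   /* positions in subject and pattern */
    int match;

    *consumed = 0;
    while (p1 < l1 && p2 < l2) {
        /* Refill a fold buffer only once it has been used up. */
        if (n1 == 0) { n1 = toy_fold(s1[p1], fold1); q1 = 0; }
        if (n2 == 0) { n2 = toy_fold(s2[p2], fold2); q2 = 0; }

        while (n1 > 0 && n2 > 0) {
            if (fold1[q1] != fold2[q2])
                return 0;               /* genuine mismatch */
            q1++; n1--;
            q2++; n2--;
        }

        /* Step past a source character only when its fold is used up. */
        if (n1 == 0) p1++;
        if (n2 == 0) p2++;
    }

    match = (p2 == l2);                 /* pattern reached its goal */
    if (reject_partial)                 /* the extra test from the patch */
        match = match && n1 == 0 && n2 == 0;

    *consumed = p1;
    return match;
}
```

With the subject { 0xFB00 } and the pattern { 'f' }, calling toy_match
with reject_partial set to 0 reports a match while *consumed stays 0,
which is the "match but no progress" state that lets a quantifier spin.
With reject_partial set to 1 the same call reports failure, and the
pattern { 'f', 'f' } still matches and consumes the one ligature.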
Karl Williamson 2010-04-13 21:25:36 -06:00 committed by Rafael Garcia-Suarez
parent cfbab81b96
commit 7dcb3b25fc
2 changed files with 11 additions and 2 deletions


@@ -51,6 +51,14 @@ if ('1234'=~/(?:(?<A>\d)|(?<C>!))(?<B>\d)(?<A>\d)(?<B>\d)/){
 }
 is(regnames_count(),3);
 }
+{ # Keep this test last, as whole script will be interrupted if times out
+# Bug #72998; this can loop
+watchdog(2);
+eval '"\x{100}\x{FB00}" =~ /\x{100}\N{U+66}+/i';
+pass("Didn't loop");
+}
 # New tests above this line, don't forget to update the test count below!
-BEGIN { plan tests => 18 }
+BEGIN { plan tests => 19 }
 # No tests here!

utf8.c

@@ -2609,7 +2609,8 @@ Perl_ibcmp_utf8(pTHX_ const char *s1, char **pe1, register UV l1, bool u1, const
 /* A match is defined by all the scans that specified
  * an explicit length reaching their final goals. */
-match = (f1 == 0 || p1 == f1) && (f2 == 0 || p2 == f2);
+match = (n1 == 0 && n2 == 0 /* Must not match partial char; Bug #72998 */
+&& (f1 == 0 || p1 == f1) && (f2 == 0 || p2 == f2));
 if (match) {
 if (pe1)