mirror of https://github.com/Perl/perl5.git (synced 2026-01-26 08:38:23 +00:00)
Merge branch 'Fix utf8 corner cases' into blead
There are around 20 different functions that take a UTF-8 sequence of bytes and try to find the ordinal code point it represents. It was becoming clear that the existing tests in our suite were inadequate and were missing glaring bugs. And UTF-8 handling is important: failures in it have been exploited by hackers in various products over the years for various nefarious purposes.

I set out to improve the tests, and spent far too much time before realizing that adding band-aids to the current scheme was not going to work out. So I undertook rewriting the tests. This turned out to be much harder and more time-consuming than I expected, and the new test file still isn't ready to go into blead. But along the way, I discovered that it was finding corner-case bugs that I would never have anticipated.

This series of commits fixes those bugs, while simplifying the code and reducing redundancy. The new test file needs clean-up, and probably ways to make it faster, but it is finally far enough along that I believe it has caught most of the bugs out there. So I'm submitting these commits now to get them into v5.42. The deadline for the test file is later in the development process.
commit a1805b9cc6
@ -2020,8 +2020,12 @@ foreach my $test (@tests) {
         @warnings_gotten = @returned_warnings;
     }

+  SKIP: {
+    skip "$0 doesn't handle _msgs functions AV returns", 1
+        if $utf8_func =~ /_msgs/;
     do_warnings_test(@expected_warnings)
       or diag "Call was: " . utf8n_display_call($eval_text);
+  }
     undef @warnings_gotten;

     # Check CHECK_ONLY results when the input is
inline.h
@ -3244,7 +3244,6 @@ PERL_STATIC_INLINE UV
Perl_utf8_to_uvchr_buf(pTHX_ const U8 *s, const U8 *send, STRLEN *retlen)
{
    PERL_ARGS_ASSERT_UTF8_TO_UVCHR_BUF;
    assert(s < send);

    UV cp;

utf8.h
@ -1206,8 +1206,9 @@ point's representation.
  * First one will convert the overlong to the REPLACEMENT CHARACTER; second
  * will return what the overlong evaluates to */
 #define UTF8_ALLOW_LONG 0x2000
-#define UTF8_ALLOW_LONG_AND_ITS_VALUE 0x4000
 #define UTF8_GOT_LONG UTF8_ALLOW_LONG
+#define UTF8_ALLOW_LONG_AND_ITS_VALUE 0x4000
+#define UTF8_GOT_LONG_WITH_VALUE UTF8_ALLOW_LONG_AND_ITS_VALUE

 /* For back compat, these old names are misleading for overlongs and
  * UTF_EBCDIC. */
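The comment in the utf8.h hunk distinguishes two ways of accepting an overlong sequence: substitute the REPLACEMENT CHARACTER, or return the value the overlong evaluates to. The following is a minimal, standalone sketch of that distinction and is not Perl's implementation. The names `decode_one`, `overlong_policy`, `OVERLONG_REPLACE`, and `OVERLONG_KEEP_VALUE` are hypothetical, and the sketch assumes ASCII-platform UTF-8 of at most 4 bytes, with no checks for surrogates or code points above U+10FFFF.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical policy names for this sketch; they are NOT Perl's flags. */
enum overlong_policy { OVERLONG_REPLACE, OVERLONG_KEEP_VALUE };

#define REPLACEMENT_CHARACTER 0xFFFDu

/* Smallest code point that legitimately needs a sequence of each length. */
static const uint32_t min_for_len[5] = { 0, 0, 0x80, 0x800, 0x10000 };

/* Decode one code point from [s, send). *retlen receives the number of
 * bytes examined. Malformed input yields the REPLACEMENT CHARACTER. */
static uint32_t
decode_one(const unsigned char *s, const unsigned char *send,
           enum overlong_policy policy, size_t *retlen)
{
    size_t len;
    uint32_t cp;

    if (s >= send) { *retlen = 0; return REPLACEMENT_CHARACTER; }
    *retlen = 1;

    if (s[0] < 0x80) return s[0];                       /* ASCII */
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; cp = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; cp = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; cp = s[0] & 0x07; }
    else return REPLACEMENT_CHARACTER;   /* continuation or invalid start */

    if ((size_t)(send - s) < len) return REPLACEMENT_CHARACTER; /* truncated */

    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) {     /* not a continuation byte */
            *retlen = i;
            return REPLACEMENT_CHARACTER;
        }
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    *retlen = len;

    /* Overlong: the value would have fit in a shorter sequence. */
    if (cp < min_for_len[len] && policy == OVERLONG_REPLACE)
        return REPLACEMENT_CHARACTER;
    return cp;
}
```

For example, the bytes `E0 80 AF` are an overlong encoding of U+002F: under `OVERLONG_REPLACE` the sketch returns U+FFFD, and under `OVERLONG_KEEP_VALUE` it returns 0x2F. Returning the underlying value is the riskier choice, which is presumably why the two behaviors are kept as separate opt-ins.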