Reimplement tr/// without swashes

This large commit removes the last use of swashes from core.

It replaces swashes by inversion maps.  This data structure is already
in use for some Unicode properties, such as case changing.

The inversion map data structure leads to straight forward
implementation code, so I collapsed the two doop.c routines
do_trans_complex_utf8() and do_trans_simple_utf8() into one.  A few
conditionals could be avoided in the loop if this function were split so
that one version didn't have to test for, e.g., squashing, but I suspect
these are in the noise in the loop, which has to deal with UTF-8
conversions.  This should be faster than the previous implementation
anyway.  I measured the differences some releases back, and inversion
maps were faster than the equivalent swash for up to 512 or 1024
different ranges.  These numbers are unlikely to be exceeded in tr///
except possibly in machine-generated ones.

Inversion maps are capable of handling both UTF-8 and non-UTF-8 cases,
but I left in the existing non-UTF-8 implementation, which uses tables,
because I suspect it is faster.  This means that there is extra code,
purely for runtime performance.

An inversion map is always created from the input, and then if the table
implementation is to be used, the table is easily derived from the map.
Prior to this commit, the table implementation was used in certain edge
cases involving code points above 255.  Those cases are now handled by
the inversion map implementation, because it would have taken extra code
to detect them, and I didn't think it was worth it.  That could be
changed if I am wrong.

Creating an inversion map for all inputs essentially normalizes them,
and then the same logic is usable for all.  This fixes some false
negatives in the previous implementation.  It also allows for detecting
if the actual transliteration can be done in place.  Previously, the
code mostly punted on that detection for the UTF-8 case.

This also allows for accurate counting of the lengths of the two sides,
fixing some longstanding TODO warning tests.

A new flag is created, OPpTRANS_CAN_FORCE_UTF8, when the tr/// has a
below 256 character resolving to one that requires UTF-8.  If this isn't
set, the code knows that a non-UTF-8 input won't become UTF-8 in the
process, and so can take short cuts.  The bit representing this flag is
the same as OPpTRANS_FROM_UTF, which is no longer used.  That name is
left in so that the dozen-ish modules in cpan that refer to it can still
compile.  AFAICT none of them actually use the flag, as well they
shouldn't since it is private to the core.

Inversion maps are ideally suited for tr/// implementations.  An issue
with them in general is that for some pathological data, they can become
fragmented requiring more space than you would expect, to represent the
underlying data.  However, the typical tr/// would not have this issue,
requiring only very short inversion maps to represent; in some cases
shorter than the table implementation.

Inversion maps are also easier to deparse than swashes.  A deparse TODO
was also fixed by this commit, and the code to deparse UTF-8 inputs is
simplified.

One could implement specialized data structures for specific types of
inputs.  For example, a common tr/// form is a single range, like
tr/A-Z/a-z/.  That could be implemented without a table and be quite
fast.  An intermediate step would be to use the inversion map
implementation always when the transliteration is a single range, and
then special case length=1 maps at execution time.

Thanks to Nicholas Rochemagne for his help on B
This commit is contained in:
Karl Williamson 2019-11-04 21:30:48 -07:00
parent 8c90d3a9c7
commit f34acfecc2
10 changed files with 1372 additions and 865 deletions

546
doop.c
View File

@ -22,6 +22,7 @@
#include "EXTERN.h"
#define PERL_IN_DOOP_C
#include "perl.h"
#include "invlist_inline.h"
#ifndef PERL_MICRO
#include <signal.h>
@ -297,328 +298,240 @@ S_do_trans_complex(pTHX_ SV * const sv, const OPtrans_map * const tbl)
/* Helper function for do_trans().
* Handles utf8 cases(*) not involving the /c, /d, /s flags,
* and where search and replacement charlists aren't identical.
* (*) i.e. where the search or replacement charlists are utf8. sv may
* or may not be utf8.
* Handles cases where an inversion map implementation is to be used and the
* search and replacement charlists are identical: so the string isn't
* modified, and only a count of modifiable chars is needed.
*
* Note that it doesn't handle /d nor /s, since these modify the string
* even if the replacement charlist is empty.
*
* sv may or may not be utf8.
*/
STATIC Size_t
S_do_trans_simple_utf8(pTHX_ SV * const sv)
S_do_trans_count_invmap(pTHX_ SV * const sv, AV * const invmap)
{
U8 *s;
U8 *send;
Size_t matches = 0;
STRLEN len;
SV** const from_invlist_ptr = av_fetch(invmap, 0, TRUE);
SV** const to_invmap_ptr = av_fetch(invmap, 1, TRUE);
SV* from_invlist = *from_invlist_ptr;
SV* to_invmap_sv = *to_invmap_ptr;
UV* map = (UV *) SvPVX(to_invmap_sv);
PERL_ARGS_ASSERT_DO_TRANS_COUNT_INVMAP;
s = (U8*)SvPV_nomg(sv, len);
send = s + len;
while (s < send) {
UV from;
SSize_t i;
STRLEN s_len;
/* Get the code point of the next character in the string */
if (! SvUTF8(sv) || UTF8_IS_INVARIANT(*s)) {
from = *s;
s_len = 1;
}
else {
from = utf8_to_uvchr_buf(s, send, &s_len);
if (from == 0 && *s != '\0') {
_force_out_malformed_utf8_message(s, send, 0, /*die*/TRUE);
}
}
/* Look the code point up in the data structure for this tr/// to get
* what it maps to */
i = _invlist_search(from_invlist, from);
assert(i >= 0);
if (map[i] != (UV) TR_UNLISTED) {
matches++;
}
s += s_len;
}
return matches;
}
/* Helper function for do_trans().
* Handles cases where an inversion map implementation is to be used and the
* search and replacement charlists are either not identical or flags are
* present.
*
* sv may or may not be utf8.
*/
STATIC Size_t
S_do_trans_invmap(pTHX_ SV * const sv, AV * const invmap)
{
U8 *s;
U8 *send;
U8 *d;
U8 *start;
U8 *dstart, *dend;
U8 *s0;
U8 *d0;
Size_t matches = 0;
const bool grows = cBOOL(PL_op->op_private & OPpTRANS_GROWS);
STRLEN len;
SV* const rv =
#ifdef USE_ITHREADS
PAD_SVl(cPADOP->op_padix);
#else
MUTABLE_SV(cSVOP->op_sv);
#endif
HV* const hv = MUTABLE_HV(SvRV(rv));
SV* const * svp = hv_fetchs(hv, "NONE", FALSE);
const UV none = svp ? SvUV(*svp) : 0x7fffffff;
const UV extra = none + 1;
UV final = 0;
U8 hibit = 0;
SV** const from_invlist_ptr = av_fetch(invmap, 0, TRUE);
SV** const to_invmap_ptr = av_fetch(invmap, 1, TRUE);
SV** const to_expansion_ptr = av_fetch(invmap, 2, TRUE);
NV max_expansion = SvNV(*to_expansion_ptr);
SV* from_invlist = *from_invlist_ptr;
SV* to_invmap_sv = *to_invmap_ptr;
UV* map = (UV *) SvPVX(to_invmap_sv);
UV previous_map = TR_OOB;
const bool squash = cBOOL(PL_op->op_private & OPpTRANS_SQUASH);
const bool delete_unfound = cBOOL(PL_op->op_private & OPpTRANS_DELETE);
bool inplace = ! cBOOL(PL_op->op_private & OPpTRANS_GROWS);
const UV* from_array = invlist_array(from_invlist);
UV final_map;
bool out_is_utf8 = SvUTF8(sv);
STRLEN s_len;
PERL_ARGS_ASSERT_DO_TRANS_SIMPLE_UTF8;
PERL_ARGS_ASSERT_DO_TRANS_INVMAP;
/* A third element in the array indicates that the replacement list was
* shorter than the search list, and this element contains the value to use
* for the items that don't correspond */
if (av_top_index(invmap) >= 3) {
SV** const final_map_ptr = av_fetch(invmap, 3, TRUE);
SV* const final_map_sv = *final_map_ptr;
final_map = SvUV(final_map_sv);
}
/* If there is something in the transliteration that could force the input
* to be changed to UTF-8, we don't know if we can do it in place, so
* assume cannot */
if (! out_is_utf8 && (PL_op->op_private & OPpTRANS_CAN_FORCE_UTF8)) {
inplace = FALSE;
if (max_expansion < 2) {
max_expansion = 2;
}
}
s = (U8*)SvPV_nomg(sv, len);
if (!SvUTF8(sv)) {
hibit = ! is_utf8_invariant_string(s, len);
if (hibit) {
s = bytes_to_utf8(s, &len);
}
}
send = s + len;
start = s;
s0 = s;
svp = hv_fetchs(hv, "FINAL", FALSE);
if (svp)
final = SvUV(*svp);
if (grows) {
/* d needs to be bigger than s, in case e.g. upgrading is required */
Newx(d, len * 3 + UTF8_MAXBYTES, U8);
dend = d + len * 3;
dstart = d;
/* We know by now if there are some possible input strings whose
* transliterations are longer than the input. If none can, we just edit
* in place. */
if (inplace) {
d0 = d = s;
}
else {
dstart = d = s;
dend = d + len;
/* Here, we can't edit in place. We have no idea how much, if any,
* this particular input string will grow. However, the compilation
* calculated the maximum expansion possible. Use that to allocale
* based on the worst case scenario. */
Newx(d, len * max_expansion + 1, U8);
d0 = d;
}
restart:
/* Do the actual transliteration */
while (s < send) {
const UV uv = swash_fetch(rv, s, TRUE);
if (uv < none) {
s += UTF8SKIP(s);
matches++;
d = uvchr_to_utf8(d, uv);
}
else if (uv == none) {
const int i = UTF8SKIP(s);
Move(s, d, i, U8);
d += i;
s += i;
}
else if (uv == extra) {
s += UTF8SKIP(s);
matches++;
d = uvchr_to_utf8(d, final);
}
else
s += UTF8SKIP(s);
UV from;
UV to;
SSize_t i;
STRLEN s_len;
if (d > dend) {
const STRLEN clen = d - dstart;
const STRLEN nlen = dend - dstart + len + UTF8_MAXBYTES;
if (!grows)
Perl_croak(aTHX_ "panic: do_trans_simple_utf8 line %d",__LINE__);
Renew(dstart, nlen + UTF8_MAXBYTES, U8);
d = dstart + clen;
dend = dstart + nlen;
/* Get the code point of the next character in the string */
if (! SvUTF8(sv) || UTF8_IS_INVARIANT(*s)) {
from = *s;
s_len = 1;
}
else {
from = utf8_to_uvchr_buf(s, send, &s_len);
if (from == 0 && *s != '\0') {
_force_out_malformed_utf8_message(s, send, 0, /*die*/TRUE);
}
}
/* Look the code point up in the data structure for this tr/// to get
* what it maps to */
i = _invlist_search(from_invlist, from);
assert(i >= 0);
to = map[i];
if (to == (UV) TR_UNLISTED) { /* Just copy the unreplaced character */
if (UVCHR_IS_INVARIANT(from) || ! out_is_utf8) {
*d++ = from;
}
else if (SvUTF8(sv)) {
Move(s, d, s_len, U8);
d += s_len;
}
else { /* Convert to UTF-8 */
append_utf8_from_native_byte(*s, &d);
}
previous_map = to;
s += s_len;
continue;
}
/* Everything else is counted as a match */
matches++;
if (to == (UV) TR_SPECIAL_HANDLING) {
if (delete_unfound) {
previous_map = to;
s += s_len;
continue;
}
/* Use the final character in the replacement list */
to = final_map;
}
else { /* Here the input code point is to be remapped. The actual
value is offset from the base of this entry */
to += from - from_array[i];
}
/* If copying all occurrences, or this is the first occurrence, copy it
* to the output */
if (! squash || to != previous_map) {
if (out_is_utf8) {
d = uvchr_to_utf8(d, to);
}
else {
if (to >= 256) { /* If need to convert to UTF-8, restart */
out_is_utf8 = TRUE;
s = s0;
d = d0;
matches = 0;
goto restart;
}
*d++ = to;
}
}
previous_map = to;
s += s_len;
}
if (grows || hibit) {
sv_setpvn(sv, (char*)dstart, d - dstart);
Safefree(dstart);
if (grows && hibit)
Safefree(start);
s_len = 0;
s += s_len;
if (! inplace) {
sv_setpvn(sv, (char*)d0, d - d0);
}
else {
*d = '\0';
SvCUR_set(sv, d - dstart);
}
SvSETMAGIC(sv);
SvUTF8_on(sv);
return matches;
}
/* Helper function for do_trans().
* Handles utf8 cases(*) where search and replacement charlists are
* identical: so the string isn't modified, and only a count of modifiable
* chars is needed.
* Note that it doesn't handle /d or /s, since these modify the string
* even if the replacement charlist is empty.
* (*) i.e. where the search or replacement charlists are utf8. sv may
* or may not be utf8.
*/
STATIC Size_t
S_do_trans_count_utf8(pTHX_ SV * const sv)
{
const U8 *s;
const U8 *start = NULL;
const U8 *send;
Size_t matches = 0;
STRLEN len;
SV* const rv =
#ifdef USE_ITHREADS
PAD_SVl(cPADOP->op_padix);
#else
MUTABLE_SV(cSVOP->op_sv);
#endif
HV* const hv = MUTABLE_HV(SvRV(rv));
SV* const * const svp = hv_fetchs(hv, "NONE", FALSE);
const UV none = svp ? SvUV(*svp) : 0x7fffffff;
const UV extra = none + 1;
U8 hibit = 0;
PERL_ARGS_ASSERT_DO_TRANS_COUNT_UTF8;
s = (const U8*)SvPV_nomg_const(sv, len);
if (!SvUTF8(sv)) {
hibit = ! is_utf8_invariant_string(s, len);
if (hibit) {
start = s = bytes_to_utf8(s, &len);
}
}
send = s + len;
while (s < send) {
const UV uv = swash_fetch(rv, s, TRUE);
if (uv < none || uv == extra)
matches++;
s += UTF8SKIP(s);
}
if (hibit)
Safefree(start);
return matches;
}
/* Helper function for do_trans().
* Handles utf8 cases(*) involving the /c, /d, /s flags,
* and where search and replacement charlists aren't identical.
* (*) i.e. where the search or replacement charlists are utf8. sv may
* or may not be utf8.
*/
STATIC Size_t
S_do_trans_complex_utf8(pTHX_ SV * const sv)
{
U8 *start, *send;
U8 *d;
Size_t matches = 0;
const bool squash = cBOOL(PL_op->op_private & OPpTRANS_SQUASH);
const bool del = cBOOL(PL_op->op_private & OPpTRANS_DELETE);
const bool grows = cBOOL(PL_op->op_private & OPpTRANS_GROWS);
SV* const rv =
#ifdef USE_ITHREADS
PAD_SVl(cPADOP->op_padix);
#else
MUTABLE_SV(cSVOP->op_sv);
#endif
HV * const hv = MUTABLE_HV(SvRV(rv));
SV * const *svp = hv_fetchs(hv, "NONE", FALSE);
const UV none = svp ? SvUV(*svp) : 0x7fffffff;
const UV extra = none + 1;
UV final = 0;
bool havefinal = FALSE;
STRLEN len;
U8 *dstart, *dend;
U8 hibit = 0;
U8 *s = (U8*)SvPV_nomg(sv, len);
PERL_ARGS_ASSERT_DO_TRANS_COMPLEX_UTF8;
if (!SvUTF8(sv)) {
hibit = ! is_utf8_invariant_string(s, len);
if (hibit) {
s = bytes_to_utf8(s, &len);
}
}
send = s + len;
start = s;
svp = hv_fetchs(hv, "FINAL", FALSE);
if (svp) {
final = SvUV(*svp);
havefinal = TRUE;
SvCUR_set(sv, d - d0);
}
if (grows) {
/* d needs to be bigger than s, in case e.g. upgrading is required */
Newx(d, len * 3 + UTF8_MAXBYTES, U8);
dend = d + len * 3;
dstart = d;
if (! SvUTF8(sv) && out_is_utf8) {
SvUTF8_on(sv);
}
else {
dstart = d = s;
dend = d + len;
}
if (squash) {
UV puv = 0xfeedface;
while (s < send) {
UV uv = swash_fetch(rv, s, TRUE);
if (d > dend) {
const STRLEN clen = d - dstart;
const STRLEN nlen = dend - dstart + len + UTF8_MAXBYTES;
if (!grows)
Perl_croak(aTHX_ "panic: do_trans_complex_utf8 line %d",__LINE__);
Renew(dstart, nlen + UTF8_MAXBYTES, U8);
d = dstart + clen;
dend = dstart + nlen;
}
if (uv < none) {
matches++;
s += UTF8SKIP(s);
if (uv != puv) {
d = uvchr_to_utf8(d, uv);
puv = uv;
}
continue;
}
else if (uv == none) { /* "none" is unmapped character */
const int i = UTF8SKIP(s);
Move(s, d, i, U8);
d += i;
s += i;
puv = 0xfeedface;
continue;
}
else if (uv == extra && !del) {
matches++;
if (havefinal) {
s += UTF8SKIP(s);
if (puv != final) {
d = uvchr_to_utf8(d, final);
puv = final;
}
}
else {
STRLEN len;
uv = utf8n_to_uvchr(s, send - s, &len, UTF8_ALLOW_DEFAULT);
if (uv != puv) {
Move(s, d, len, U8);
d += len;
puv = uv;
}
s += len;
}
continue;
}
matches++; /* "none+1" is delete character */
s += UTF8SKIP(s);
}
}
else {
while (s < send) {
const UV uv = swash_fetch(rv, s, TRUE);
if (d > dend) {
const STRLEN clen = d - dstart;
const STRLEN nlen = dend - dstart + len + UTF8_MAXBYTES;
if (!grows)
Perl_croak(aTHX_ "panic: do_trans_complex_utf8 line %d",__LINE__);
Renew(dstart, nlen + UTF8_MAXBYTES, U8);
d = dstart + clen;
dend = dstart + nlen;
}
if (uv < none) {
matches++;
s += UTF8SKIP(s);
d = uvchr_to_utf8(d, uv);
continue;
}
else if (uv == none) { /* "none" is unmapped character */
const int i = UTF8SKIP(s);
Move(s, d, i, U8);
d += i;
s += i;
continue;
}
else if (uv == extra && !del) {
matches++;
s += UTF8SKIP(s);
d = uvchr_to_utf8(d, final);
continue;
}
matches++; /* "none+1" is delete character */
s += UTF8SKIP(s);
}
}
if (grows || hibit) {
sv_setpvn(sv, (char*)dstart, d - dstart);
Safefree(dstart);
if (grows && hibit)
Safefree(start);
}
else {
*d = '\0';
SvCUR_set(sv, d - dstart);
}
SvUTF8_on(sv);
SvSETMAGIC(sv);
return matches;
@ -627,7 +540,8 @@ S_do_trans_complex_utf8(pTHX_ SV * const sv)
/* Execute a tr//. sv is the value to be translated, while PL_op
* should be an OP_TRANS or OP_TRANSR op, whose op_pv field contains a
* translation table or whose op_sv field contains a swash.
* translation table or whose op_sv field contains an inversion map.
*
* Returns a count of number of characters translated
*/
@ -636,31 +550,49 @@ Perl_do_trans(pTHX_ SV *sv)
{
STRLEN len;
const U8 flags = PL_op->op_private;
const U8 hasutf = flags & (OPpTRANS_FROM_UTF | OPpTRANS_TO_UTF);
bool use_utf8_fcns = cBOOL(flags & OPpTRANS_USE_SVOP);
bool identical = cBOOL(flags & OPpTRANS_IDENTICAL);
PERL_ARGS_ASSERT_DO_TRANS;
if (SvREADONLY(sv) && !(flags & OPpTRANS_IDENTICAL)) {
if (SvREADONLY(sv) && ! identical) {
Perl_croak_no_modify();
}
(void)SvPV_const(sv, len);
if (!len)
return 0;
if (!(flags & OPpTRANS_IDENTICAL)) {
if (! identical) {
if (!SvPOKp(sv) || SvTHINKFIRST(sv))
(void)SvPV_force_nomg(sv, len);
(void)SvPOK_only_UTF8(sv);
}
/* If we use only OPpTRANS_IDENTICAL to bypass the READONLY check,
* we must also rely on it to choose the readonly strategy.
*/
if (flags & OPpTRANS_IDENTICAL) {
return hasutf ? do_trans_count_utf8(sv) : do_trans_count(sv, (OPtrans_map*)cPVOP->op_pv);
} else if (flags & (OPpTRANS_SQUASH|OPpTRANS_DELETE|OPpTRANS_COMPLEMENT)) {
return hasutf ? do_trans_complex_utf8(sv) : do_trans_complex(sv, (OPtrans_map*)cPVOP->op_pv);
} else {
return hasutf ? do_trans_simple_utf8(sv) : do_trans_simple(sv, (OPtrans_map*)cPVOP->op_pv);
if (use_utf8_fcns) {
SV* const map =
#ifdef USE_ITHREADS
PAD_SVl(cPADOP->op_padix);
#else
MUTABLE_SV(cSVOP->op_sv);
#endif
if (identical) {
return do_trans_count_invmap(sv, (AV *) map);
}
else {
return do_trans_invmap(sv, (AV *) map);
}
}
else {
const OPtrans_map * const map = (OPtrans_map*)cPVOP->op_pv;
if (identical) {
return do_trans_count(sv, map);
}
else if (flags & (OPpTRANS_SQUASH|OPpTRANS_DELETE|OPpTRANS_COMPLEMENT)) {
return do_trans_complex(sv, map);
}
else
return do_trans_simple(sv, map);
}
}

13
dump.c
View File

@ -1305,13 +1305,13 @@ S_do_op_dump_bar(pTHX_ I32 level, UV bar, PerlIO *file, const OP *o)
case OP_TRANS:
case OP_TRANSR:
if (o->op_private & (OPpTRANS_FROM_UTF | OPpTRANS_TO_UTF)) {
/* utf8: table stored as a swash */
if (o->op_private & OPpTRANS_USE_SVOP) {
/* utf8: table stored as an inversion map */
#ifndef USE_ITHREADS
/* with ITHREADS, swash is stored in the pad, and the right pad
/* with ITHREADS, it is stored in the pad, and the right pad
* may not be active here, so skip */
S_opdump_indent(aTHX_ o, level, bar, file,
"SWASH = 0x%" UVxf "\n",
"INVMAP = 0x%" UVxf "\n",
PTR2UV(MUTABLE_SV(cSVOPo->op_sv)));
#endif
}
@ -2986,11 +2986,10 @@ Perl_op_class(pTHX_ const OP *o)
* pointer to a table of shorts used to look up translations.
* Under utf8, however, a simple table isn't practical; instead,
* the OP is an SVOP (or, under threads, a PADOP),
* and the SV is a reference to a swash
* (i.e., an RV pointing to an HV).
* and the SV is an AV.
*/
return (!custom &&
(o->op_private & (OPpTRANS_TO_UTF|OPpTRANS_FROM_UTF))
(o->op_private & OPpTRANS_USE_SVOP)
)
#if defined(USE_ITHREADS)
? OPclass_PADOP : OPclass_PVOP;

View File

@ -1870,7 +1870,7 @@ Apd |void |sv_vsetpvfn |NN SV *const sv|NN const char *const pat|const STRLEN pa
ApR |NV |str_to_version |NN SV *sv
EXpR |SV* |swash_init |NN const char* pkg|NN const char* name|NN SV* listsv|I32 minbits|I32 none
EXp |UV |swash_fetch |NN SV *swash|NN const U8 *ptr|bool do_utf8
#if defined(PERL_IN_REGCOMP_C)
#if defined(PERL_IN_REGCOMP_C) || defined(PERL_IN_OP_C) || defined(PERL_IN_DOOP_C)
EiR |SV* |add_cp_to_invlist |NULLOK SV* invlist|const UV cp
Ei |void |invlist_extend |NN SV* const invlist|const UV len
Ei |void |invlist_set_len|NN SV* const invlist|const UV len|const bool offset
@ -1922,7 +1922,8 @@ EpX |SV* |invlist_clone |NN SV* const invlist|NULLOK SV* newlist
#endif
#if defined(PERL_IN_REGCOMP_C) || defined(PERL_IN_REGEXEC_C) \
|| defined(PERL_IN_TOKE_C) || defined(PERL_IN_UTF8_C) \
|| defined(PERL_IN_PP_C) || defined(PERL_IN_OP_C)
|| defined(PERL_IN_PP_C) || defined(PERL_IN_OP_C) \
|| defined(PERL_IN_DOOP_C)
EiRT |UV* |invlist_array |NN SV* const invlist
EiRT |bool |is_invlist |NULLOK SV* const invlist
EiRT |bool* |get_invlist_offset_addr|NN SV* invlist
@ -2308,9 +2309,8 @@ p |void |init_constants
SR |Size_t |do_trans_simple |NN SV * const sv|NN const OPtrans_map * const tbl
SR |Size_t |do_trans_count |NN SV * const sv|NN const OPtrans_map * const tbl
SR |Size_t |do_trans_complex |NN SV * const sv|NN const OPtrans_map * const tbl
SR |Size_t |do_trans_simple_utf8 |NN SV * const sv
SR |Size_t |do_trans_count_utf8 |NN SV * const sv
SR |Size_t |do_trans_complex_utf8 |NN SV * const sv
SR |Size_t |do_trans_invmap |NN SV * const sv|NN AV * const map
SR |Size_t |do_trans_count_invmap |NN SV * const sv|NN AV * const map
#endif
#if defined(PERL_IN_GV_C)

25
embed.h
View File

@ -1014,7 +1014,6 @@
# endif
# if defined(PERL_IN_REGCOMP_C)
#define add_above_Latin1_folds(a,b,c) S_add_above_Latin1_folds(aTHX_ a,b,c)
#define add_cp_to_invlist(a,b) S_add_cp_to_invlist(aTHX_ a,b)
#define add_data S_add_data
#define add_multi_match(a,b,c) S_add_multi_match(aTHX_ a,b,c)
#define change_engine_size(a,b) S_change_engine_size(aTHX_ a,b)
@ -1024,20 +1023,13 @@
#define edit_distance S_edit_distance
#define get_ANYOFM_contents(a) S_get_ANYOFM_contents(aTHX_ a)
#define get_ANYOF_cp_list_for_ssc(a,b) S_get_ANYOF_cp_list_for_ssc(aTHX_ a,b)
#define get_invlist_iter_addr S_get_invlist_iter_addr
#define grok_bslash_N(a,b,c,d,e,f,g) S_grok_bslash_N(aTHX_ a,b,c,d,e,f,g)
#define handle_named_backref(a,b,c,d) S_handle_named_backref(aTHX_ a,b,c,d)
#define handle_possible_posix(a,b,c,d,e) S_handle_possible_posix(aTHX_ a,b,c,d,e)
#define handle_regex_sets(a,b,c,d,e) S_handle_regex_sets(aTHX_ a,b,c,d,e)
#define handle_user_defined_property(a,b,c,d,e,f,g,h,i,j) Perl_handle_user_defined_property(aTHX_ a,b,c,d,e,f,g,h,i,j)
#define invlist_contents(a,b) S_invlist_contents(aTHX_ a,b)
#define invlist_extend(a,b) S_invlist_extend(aTHX_ a,b)
#define invlist_highest S_invlist_highest
#define invlist_is_iterating S_invlist_is_iterating
#define invlist_iterfinish S_invlist_iterfinish
#define invlist_iterinit S_invlist_iterinit
#define invlist_iternext S_invlist_iternext
#define invlist_set_len(a,b,c) S_invlist_set_len(aTHX_ a,b,c)
#define is_ssc_worth_it S_is_ssc_worth_it
#define join_exact(a,b,c,d,e,f,g) S_join_exact(aTHX_ a,b,c,d,e,f,g)
#define make_exactf_invlist(a,b) S_make_exactf_invlist(aTHX_ a,b)
@ -1083,6 +1075,16 @@
# if defined(PERL_IN_REGCOMP_C) || defined (PERL_IN_DUMP_C) || defined(PERL_IN_OP_C)
#define _invlist_dump(a,b,c,d) Perl__invlist_dump(aTHX_ a,b,c,d)
# endif
# if defined(PERL_IN_REGCOMP_C) || defined(PERL_IN_OP_C) || defined(PERL_IN_DOOP_C)
#define add_cp_to_invlist(a,b) S_add_cp_to_invlist(aTHX_ a,b)
#define get_invlist_iter_addr S_get_invlist_iter_addr
#define invlist_extend(a,b) S_invlist_extend(aTHX_ a,b)
#define invlist_highest S_invlist_highest
#define invlist_iterfinish S_invlist_iterfinish
#define invlist_iterinit S_invlist_iterinit
#define invlist_iternext S_invlist_iternext
#define invlist_set_len(a,b,c) S_invlist_set_len(aTHX_ a,b,c)
# endif
# if defined(PERL_IN_REGCOMP_C) || defined(PERL_IN_PERL_C) || defined(PERL_IN_UTF8_C)
#define _invlistEQ(a,b,c) Perl__invlistEQ(aTHX_ a,b,c)
#define _new_invlist_C_array(a) Perl__new_invlist_C_array(aTHX_ a)
@ -1094,7 +1096,7 @@
#endif
#define regprop(a,b,c,d,e) Perl_regprop(aTHX_ a,b,c,d,e)
# endif
# if defined(PERL_IN_REGCOMP_C) || defined(PERL_IN_REGEXEC_C) || defined(PERL_IN_TOKE_C) || defined(PERL_IN_UTF8_C) || defined(PERL_IN_PP_C) || defined(PERL_IN_OP_C)
# if defined(PERL_IN_REGCOMP_C) || defined(PERL_IN_REGEXEC_C) || defined(PERL_IN_TOKE_C) || defined(PERL_IN_UTF8_C) || defined(PERL_IN_PP_C) || defined(PERL_IN_OP_C) || defined(PERL_IN_DOOP_C)
#define _invlist_contains_cp S__invlist_contains_cp
#define _invlist_len S__invlist_len
#define _invlist_search Perl__invlist_search
@ -1603,11 +1605,10 @@
# endif
# if defined(PERL_IN_DOOP_C)
#define do_trans_complex(a,b) S_do_trans_complex(aTHX_ a,b)
#define do_trans_complex_utf8(a) S_do_trans_complex_utf8(aTHX_ a)
#define do_trans_count(a,b) S_do_trans_count(aTHX_ a,b)
#define do_trans_count_utf8(a) S_do_trans_count_utf8(aTHX_ a)
#define do_trans_count_invmap(a,b) S_do_trans_count_invmap(aTHX_ a,b)
#define do_trans_invmap(a,b) S_do_trans_invmap(aTHX_ a,b)
#define do_trans_simple(a,b) S_do_trans_simple(aTHX_ a,b)
#define do_trans_simple_utf8(a) S_do_trans_simple_utf8(aTHX_ a)
# endif
# if defined(PERL_IN_DUMP_C)
#define deb_curcv(a) S_deb_curcv(aTHX_ a)

View File

@ -14,7 +14,8 @@
|| defined(PERL_IN_REGEXEC_C) \
|| defined(PERL_IN_TOKE_C) \
|| defined(PERL_IN_PP_C) \
|| defined(PERL_IN_OP_C)
|| defined(PERL_IN_OP_C) \
|| defined(PERL_IN_DOOP_C)
/* An element is in an inversion list iff its index is even numbered: 0, 2, 4,
* etc */
@ -92,7 +93,7 @@ S_invlist_array(SV* const invlist)
}
#endif
#if defined(PERL_IN_REGCOMP_C)
#if defined(PERL_IN_REGCOMP_C) || defined(PERL_IN_OP_C) || defined(PERL_IN_DOOP_C)
PERL_STATIC_INLINE void
S_invlist_extend(pTHX_ SV* const invlist, const UV new_max)

View File

@ -279,6 +279,8 @@ BEGIN { for (qw[ const stringify rv2sv list glob pushmark null aelem
# _pessimise_walk(): recursively walk the optree of a sub,
# possibly undoing optimisations along the way.
sub DEBUG { 0 }
sub _pessimise_walk {
my ($self, $startop) = @_;
@ -5714,100 +5716,81 @@ sub tr_chr {
}
}
# XXX This doesn't yet handle all cases correctly either
sub tr_invmap {
my ($invlist_ref, $map_ref) = @_;
my $infinity = ~0 >> 1; # IV_MAX
my $from = "";
my $to = "";
for my $i (0.. @$invlist_ref - 1) {
my $this_from = $invlist_ref->[$i];
my $map = $map_ref->[$i];
my $upper = ($i < @$invlist_ref - 1)
? $invlist_ref->[$i+1]
: $infinity;
my $range = $upper - $this_from - 1;
if (DEBUG) {
print STDERR "i=$i, from=$this_from, upper=$upper, range=$range\n";
}
next if $map == ~0;
next if $map == ~0 - 1;
$from .= tr_chr($this_from);
$to .= tr_chr($map);
next if $range == 0; # Single code point
if ($range == 1) { # Adjacent code points
$from .= tr_chr($this_from + 1);
$to .= tr_chr($map + 1);
}
elsif ($upper != $infinity) {
$from .= "-" . tr_chr($this_from + $range);
$to .= "-" . tr_chr($map + $range);
}
else {
$from .= "-INFTY";
$to .= "-INFTY";
}
}
return ($from, $to);
}
sub tr_decode_utf8 {
my($swash_hv, $flags) = @_;
my %swash = $swash_hv->ARRAY;
my $final = undef;
$final = $swash{'FINAL'}->IV if exists $swash{'FINAL'};
my $none = $swash{"NONE"}->IV;
my $extra = $none + 1;
my(@from, @delfrom, @to);
my $line;
foreach $line (split /\n/, $swash{'LIST'}->PV) {
my($min, $max, $result) = split(/\t/, $line);
$min = hex $min;
if (length $max) {
$max = hex $max;
} else {
$max = $min;
}
$result = hex $result;
if ($result == $extra) {
push @delfrom, [$min, $max];
} else {
push @from, [$min, $max];
push @to, [$result, $result + $max - $min];
}
my($tr_av, $flags) = @_;
printf STDERR "flags=0x%x\n", $flags if DEBUG;
my $invlist = $tr_av->ARRAYelt(0);
my @invlist = unpack("J*", $invlist->PV);
my @map = unpack("J*", $tr_av->ARRAYelt(1)->PV);
if (DEBUG) {
for my $i (0 .. @invlist - 1) {
printf STDERR "[%d]\t%x\t", $i, $invlist[$i];
my $map = $map[$i];
if ($map == ~0) {
print STDERR "TR_UNMAPPED\n";
}
elsif ($map == ~0 - 1) {
print STDERR "TR_SPECIAL\n";
}
else {
printf STDERR "%x\n", $map;
}
}
}
for my $i (0 .. $#from) {
if ($from[$i][0] == ord '-') {
unshift @from, splice(@from, $i, 1);
unshift @to, splice(@to, $i, 1);
last;
} elsif ($from[$i][1] == ord '-') {
$from[$i][1]--;
$to[$i][1]--;
unshift @from, ord '-';
unshift @to, ord '-';
last;
}
}
for my $i (0 .. $#delfrom) {
if ($delfrom[$i][0] == ord '-') {
push @delfrom, splice(@delfrom, $i, 1);
last;
} elsif ($delfrom[$i][1] == ord '-') {
$delfrom[$i][1]--;
push @delfrom, ord '-';
last;
}
}
if (defined $final and $to[$#to][1] != $final) {
push @to, [$final, $final];
}
push @from, @delfrom;
my ($from, $to) = tr_invmap(\@invlist, \@map);
if ($flags & OPpTRANS_COMPLEMENT) {
my @newfrom;
my $next = 0;
for my $i (0 .. $#from) {
push @newfrom, [$next, $from[$i][0] - 1];
$next = $from[$i][1] + 1;
}
@from = ();
for my $range (@newfrom) {
if ($range->[0] <= $range->[1]) {
push @from, $range;
}
}
shift @map;
pop @invlist;
my $throw_away;
($from, $throw_away) = tr_invmap(\@invlist, \@map);
}
my($from, $to, $diff);
for my $chunk (@from) {
$diff = $chunk->[1] - $chunk->[0];
if ($diff > 1) {
$from .= tr_chr($chunk->[0]) . "-" . tr_chr($chunk->[1]);
} elsif ($diff == 1) {
$from .= tr_chr($chunk->[0]) . tr_chr($chunk->[1]);
} else {
$from .= tr_chr($chunk->[0]);
}
if (DEBUG) {
print STDERR "Returning ", escape_str($from), "/",
escape_str($to), "\n";
}
for my $chunk (@to) {
$diff = $chunk->[1] - $chunk->[0];
if ($diff > 1) {
$to .= tr_chr($chunk->[0]) . "-" . tr_chr($chunk->[1]);
} elsif ($diff == 1) {
$to .= tr_chr($chunk->[0]) . tr_chr($chunk->[1]);
} else {
$to .= tr_chr($chunk->[0]);
}
}
#$final = sprintf("%04x", $final) if defined $final;
#$none = sprintf("%04x", $none) if defined $none;
#$extra = sprintf("%04x", $extra) if defined $extra;
#print STDERR "final: $final\n none: $none\nextra: $extra\n";
#print STDERR $swash{'LIST'}->PV;
return (escape_str($from), escape_str($to));
}
@ -5821,9 +5804,9 @@ sub pp_trans {
($from, $to) = tr_decode_byte($op->pv, $priv_flags);
} elsif ($class eq "PADOP") {
($from, $to)
= tr_decode_utf8($self->padval($op->padix)->RV, $priv_flags);
= tr_decode_utf8($self->padval($op->padix), $priv_flags);
} else { # class($op) eq "SVOP"
($from, $to) = tr_decode_utf8($op->sv->RV, $priv_flags);
($from, $to) = tr_decode_utf8($op->sv, $priv_flags);
}
my $flags = "";
$flags .= "c" if $priv_flags & OPpTRANS_COMPLEMENT;

1332
op.c

File diff suppressed because it is too large Load Diff

2
op.h
View File

@ -1110,7 +1110,7 @@ C<sib> is non-null. For a higher-level interface, see C<L</op_sibling_splice>>.
"Use of strings with code points over 0xFF as arguments to " \
"%s operator is not allowed"
#endif
#if defined(PERL_IN_OP_C) || defined(PERL_IN_DOOP_C)
#if defined(PERL_IN_OP_C) || defined(PERL_IN_DOOP_C) || defined(PERL_IN_PERL_C)
# define TR_UNMAPPED (UV)-1
# define TR_DELETE (UV)-2
# define TR_R_EMPTY (UV)-3 /* rhs (replacement) is empty */

119
proto.h
View File

@ -4859,31 +4859,26 @@ STATIC Size_t S_do_trans_complex(pTHX_ SV * const sv, const OPtrans_map * const
#define PERL_ARGS_ASSERT_DO_TRANS_COMPLEX \
assert(sv); assert(tbl)
STATIC Size_t S_do_trans_complex_utf8(pTHX_ SV * const sv)
__attribute__warn_unused_result__;
#define PERL_ARGS_ASSERT_DO_TRANS_COMPLEX_UTF8 \
assert(sv)
STATIC Size_t S_do_trans_count(pTHX_ SV * const sv, const OPtrans_map * const tbl)
__attribute__warn_unused_result__;
#define PERL_ARGS_ASSERT_DO_TRANS_COUNT \
assert(sv); assert(tbl)
STATIC Size_t S_do_trans_count_utf8(pTHX_ SV * const sv)
STATIC Size_t S_do_trans_count_invmap(pTHX_ SV * const sv, AV * const map)
__attribute__warn_unused_result__;
#define PERL_ARGS_ASSERT_DO_TRANS_COUNT_UTF8 \
assert(sv)
#define PERL_ARGS_ASSERT_DO_TRANS_COUNT_INVMAP \
assert(sv); assert(map)
STATIC Size_t S_do_trans_invmap(pTHX_ SV * const sv, AV * const map)
__attribute__warn_unused_result__;
#define PERL_ARGS_ASSERT_DO_TRANS_INVMAP \
assert(sv); assert(map)
STATIC Size_t S_do_trans_simple(pTHX_ SV * const sv, const OPtrans_map * const tbl)
__attribute__warn_unused_result__;
#define PERL_ARGS_ASSERT_DO_TRANS_SIMPLE \
assert(sv); assert(tbl)
STATIC Size_t S_do_trans_simple_utf8(pTHX_ SV * const sv)
__attribute__warn_unused_result__;
#define PERL_ARGS_ASSERT_DO_TRANS_SIMPLE_UTF8 \
assert(sv)
#endif
#if defined(PERL_IN_DUMP_C)
STATIC CV* S_deb_curcv(pTHX_ I32 ix);
@ -5540,12 +5535,6 @@ STATIC SV * S_space_join_names_mortal(pTHX_ char *const *array);
STATIC void S_add_above_Latin1_folds(pTHX_ RExC_state_t *pRExC_state, const U8 cp, SV** invlist);
#define PERL_ARGS_ASSERT_ADD_ABOVE_LATIN1_FOLDS \
assert(pRExC_state); assert(invlist)
#ifndef PERL_NO_INLINE_FUNCTIONS
PERL_STATIC_INLINE SV* S_add_cp_to_invlist(pTHX_ SV* invlist, const UV cp)
__attribute__warn_unused_result__;
#define PERL_ARGS_ASSERT_ADD_CP_TO_INVLIST
#endif
STATIC U32 S_add_data(RExC_state_t* const pRExC_state, const char* const s, const U32 n)
__attribute__warn_unused_result__;
#define PERL_ARGS_ASSERT_ADD_DATA \
@ -5582,13 +5571,6 @@ STATIC SV * S_get_ANYOFM_contents(pTHX_ const regnode * n)
STATIC SV* S_get_ANYOF_cp_list_for_ssc(pTHX_ const RExC_state_t *pRExC_state, const regnode_charclass* const node);
#define PERL_ARGS_ASSERT_GET_ANYOF_CP_LIST_FOR_SSC \
assert(pRExC_state); assert(node)
#ifndef PERL_NO_INLINE_FUNCTIONS
PERL_STATIC_INLINE STRLEN* S_get_invlist_iter_addr(SV* invlist)
__attribute__warn_unused_result__;
#define PERL_ARGS_ASSERT_GET_INVLIST_ITER_ADDR \
assert(invlist)
#endif
STATIC bool S_grok_bslash_N(pTHX_ RExC_state_t *pRExC_state, regnode_offset* nodep, UV *code_point_p, int* cp_count, I32 *flagp, const bool strict, const U32 depth);
#define PERL_ARGS_ASSERT_GROK_BSLASH_N \
assert(pRExC_state); assert(flagp)
@ -5613,18 +5595,6 @@ PERL_STATIC_INLINE SV* S_invlist_contents(pTHX_ SV* const invlist, const bool tr
assert(invlist)
#endif
#ifndef PERL_NO_INLINE_FUNCTIONS
PERL_STATIC_INLINE void S_invlist_extend(pTHX_ SV* const invlist, const UV len);
#define PERL_ARGS_ASSERT_INVLIST_EXTEND \
assert(invlist)
#endif
#ifndef PERL_NO_INLINE_FUNCTIONS
PERL_STATIC_INLINE UV S_invlist_highest(SV* const invlist)
__attribute__warn_unused_result__;
#define PERL_ARGS_ASSERT_INVLIST_HIGHEST \
assert(invlist)
#endif
#ifndef PERL_NO_INLINE_FUNCTIONS
PERL_STATIC_INLINE bool S_invlist_is_iterating(SV* const invlist)
__attribute__warn_unused_result__;
@ -5632,28 +5602,6 @@ PERL_STATIC_INLINE bool S_invlist_is_iterating(SV* const invlist)
assert(invlist)
#endif
#ifndef PERL_NO_INLINE_FUNCTIONS
PERL_STATIC_INLINE void S_invlist_iterfinish(SV* invlist);
#define PERL_ARGS_ASSERT_INVLIST_ITERFINISH \
assert(invlist)
#endif
#ifndef PERL_NO_INLINE_FUNCTIONS
PERL_STATIC_INLINE void S_invlist_iterinit(SV* invlist);
#define PERL_ARGS_ASSERT_INVLIST_ITERINIT \
assert(invlist)
#endif
#ifndef PERL_NO_INLINE_FUNCTIONS
PERL_STATIC_INLINE bool S_invlist_iternext(SV* invlist, UV* start, UV* end)
__attribute__warn_unused_result__;
#define PERL_ARGS_ASSERT_INVLIST_ITERNEXT \
assert(invlist); assert(start); assert(end)
#endif
#ifndef PERL_NO_INLINE_FUNCTIONS
PERL_STATIC_INLINE void S_invlist_set_len(pTHX_ SV* const invlist, const UV len, const bool offset);
#define PERL_ARGS_ASSERT_INVLIST_SET_LEN \
assert(invlist)
#endif
STATIC bool S_is_ssc_worth_it(const RExC_state_t * pRExC_state, const regnode_ssc * ssc);
#define PERL_ARGS_ASSERT_IS_SSC_WORTH_IT \
assert(pRExC_state); assert(ssc)
@ -5811,6 +5759,55 @@ PERL_CALLCONV void Perl__invlist_dump(pTHX_ PerlIO *file, I32 level, const char*
#define PERL_ARGS_ASSERT__INVLIST_DUMP \
assert(file); assert(indent); assert(invlist)
#endif
#if defined(PERL_IN_REGCOMP_C) || defined(PERL_IN_OP_C) || defined(PERL_IN_DOOP_C)
#ifndef PERL_NO_INLINE_FUNCTIONS
PERL_STATIC_INLINE SV* S_add_cp_to_invlist(pTHX_ SV* invlist, const UV cp)
__attribute__warn_unused_result__;
#define PERL_ARGS_ASSERT_ADD_CP_TO_INVLIST
#endif
#ifndef PERL_NO_INLINE_FUNCTIONS
PERL_STATIC_INLINE STRLEN* S_get_invlist_iter_addr(SV* invlist)
__attribute__warn_unused_result__;
#define PERL_ARGS_ASSERT_GET_INVLIST_ITER_ADDR \
assert(invlist)
#endif
#ifndef PERL_NO_INLINE_FUNCTIONS
PERL_STATIC_INLINE void S_invlist_extend(pTHX_ SV* const invlist, const UV len);
#define PERL_ARGS_ASSERT_INVLIST_EXTEND \
assert(invlist)
#endif
#ifndef PERL_NO_INLINE_FUNCTIONS
PERL_STATIC_INLINE UV S_invlist_highest(SV* const invlist)
__attribute__warn_unused_result__;
#define PERL_ARGS_ASSERT_INVLIST_HIGHEST \
assert(invlist)
#endif
#ifndef PERL_NO_INLINE_FUNCTIONS
PERL_STATIC_INLINE void S_invlist_iterfinish(SV* invlist);
#define PERL_ARGS_ASSERT_INVLIST_ITERFINISH \
assert(invlist)
#endif
#ifndef PERL_NO_INLINE_FUNCTIONS
PERL_STATIC_INLINE void S_invlist_iterinit(SV* invlist);
#define PERL_ARGS_ASSERT_INVLIST_ITERINIT \
assert(invlist)
#endif
#ifndef PERL_NO_INLINE_FUNCTIONS
PERL_STATIC_INLINE bool S_invlist_iternext(SV* invlist, UV* start, UV* end)
__attribute__warn_unused_result__;
#define PERL_ARGS_ASSERT_INVLIST_ITERNEXT \
assert(invlist); assert(start); assert(end)
#endif
#ifndef PERL_NO_INLINE_FUNCTIONS
PERL_STATIC_INLINE void S_invlist_set_len(pTHX_ SV* const invlist, const UV len, const bool offset);
#define PERL_ARGS_ASSERT_INVLIST_SET_LEN \
assert(invlist)
#endif
#endif
#if defined(PERL_IN_REGCOMP_C) || defined(PERL_IN_PERL_C) || defined(PERL_IN_UTF8_C)
PERL_CALLCONV bool Perl__invlistEQ(pTHX_ SV* const a, SV* const b, const bool complement_b);
#define PERL_ARGS_ASSERT__INVLISTEQ \
@ -5832,7 +5829,7 @@ PERL_CALLCONV void Perl_regprop(pTHX_ const regexp *prog, SV* sv, const regnode*
#define PERL_ARGS_ASSERT_REGPROP \
assert(sv); assert(o)
#endif
#if defined(PERL_IN_REGCOMP_C) || defined(PERL_IN_REGEXEC_C) || defined(PERL_IN_TOKE_C) || defined(PERL_IN_UTF8_C) || defined(PERL_IN_PP_C) || defined(PERL_IN_OP_C)
#if defined(PERL_IN_REGCOMP_C) || defined(PERL_IN_REGEXEC_C) || defined(PERL_IN_TOKE_C) || defined(PERL_IN_UTF8_C) || defined(PERL_IN_PP_C) || defined(PERL_IN_OP_C) || defined(PERL_IN_DOOP_C)
#ifndef PERL_NO_INLINE_FUNCTIONS
PERL_STATIC_INLINE bool S__invlist_contains_cp(SV* const invlist, const UV cp)
__attribute__warn_unused_result__;

22
toke.c
View File

@ -2905,12 +2905,12 @@ S_scan_const(pTHX_ char *start)
bool dorange = FALSE; /* are we in a translit range? */
bool didrange = FALSE; /* did we just finish a range? */
bool in_charclass = FALSE; /* within /[...]/ */
bool d_is_utf8 = FALSE; /* Output constant is UTF8 */
bool s_is_utf8 = cBOOL(UTF); /* Is the source string assumed to be
UTF8? But, this can show as true
when the source isn't utf8, as for
example when it is entirely composed
of hex constants */
bool d_is_utf8 = FALSE; /* Output constant is UTF8 */
STRLEN utf8_variant_count = 0; /* When not in UTF-8, this counts the
number of characters found so far
that will expand (into 2 bytes)
@ -2951,11 +2951,6 @@ S_scan_const(pTHX_ char *start)
PERL_ARGS_ASSERT_SCAN_CONST;
assert(PL_lex_inwhat != OP_TRANSR);
if (PL_lex_inwhat == OP_TRANS && PL_parser->lex_sub_op) {
/* If we are doing a trans and we know we want UTF8, set expectation */
d_is_utf8 = PL_parser->lex_sub_op->op_private & (OPpTRANS_FROM_UTF|OPpTRANS_TO_UTF);
s_is_utf8 = PL_parser->lex_sub_op->op_private & (PL_lex_repl ? OPpTRANS_FROM_UTF : OPpTRANS_TO_UTF);
}
/* Protect sv from errors and fatal warnings. */
ENTER_with_name("scan_const");
@ -3646,13 +3641,6 @@ S_scan_const(pTHX_ char *start)
}
d = (char*)uvchr_to_utf8((U8*)d, uv);
if (PL_lex_inwhat == OP_TRANS
&& PL_parser->lex_sub_op)
{
PL_parser->lex_sub_op->op_private |=
(PL_lex_repl ? OPpTRANS_FROM_UTF
: OPpTRANS_TO_UTF);
}
}
}
#ifdef EBCDIC
@ -4133,10 +4121,6 @@ S_scan_const(pTHX_ char *start)
SvPOK_on(sv);
if (d_is_utf8) {
SvUTF8_on(sv);
if (PL_lex_inwhat == OP_TRANS && PL_parser->lex_sub_op) {
PL_parser->lex_sub_op->op_private |=
(PL_lex_repl ? OPpTRANS_FROM_UTF : OPpTRANS_TO_UTF);
}
}
/* shrink the sv if we allocated more than we used */
@ -10297,9 +10281,7 @@ S_scan_trans(pTHX_ char *start)
o = newPVOP(nondestruct ? OP_TRANSR : OP_TRANS, 0, (char*)NULL);
o->op_private &= ~OPpTRANS_ALL;
o->op_private |= del|squash|complement|
(DO_UTF8(PL_lex_stuff)? OPpTRANS_FROM_UTF : 0)|
(DO_UTF8(PL_parser->lex_sub_repl) ? OPpTRANS_TO_UTF : 0);
o->op_private |= del|squash|complement;
PL_lex_op = o;
pl_yylval.ival = nondestruct ? OP_TRANSR : OP_TRANS;