Karl Williamson f34acfecc2 Reimplement tr/// without swashes
This large commit removes the last use of swashes from core.

It replaces swashes by inversion maps.  This data structure is already
in use for some Unicode properties, such as case changing.
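
As a rough illustration of the data structure (a minimal sketch, not the core's actual layout), an inversion map can be pictured as two parallel arrays: the sorted start of each range and the value that start maps to.  The names inv_map, imap_lookup and IMAP_UNMAPPED below are hypothetical, and the convention that members of a range keep their offset from its start is an assumption made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel: the range maps to itself (no change). */
#define IMAP_UNMAPPED UINT32_MAX

/* Sketch of an inversion map: starts[i] is the first code point of range i,
 * which extends up to starts[i + 1] - 1 (the last range is open-ended).
 * maps[i] is what starts[i] maps to; other members of the range keep the
 * same offset from it. */
typedef struct {
    const uint32_t *starts;  /* sorted ascending, starts[0] == 0 */
    const uint32_t *maps;    /* parallel to starts */
    size_t len;
} inv_map;

/* Binary search for the range containing cp, then apply its mapping. */
static uint32_t imap_lookup(const inv_map *m, uint32_t cp)
{
    size_t lo = 0, hi = m->len;       /* invariant: starts[lo] <= cp */
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (m->starts[mid] <= cp)
            lo = mid;
        else
            hi = mid;
    }
    if (m->maps[lo] == IMAP_UNMAPPED)
        return cp;
    return m->maps[lo] + (cp - m->starts[lo]);
}

/* tr/A-Z/a-z/ needs only three ranges. */
static const uint32_t az_starts[] = { 0, 'A', 'Z' + 1 };
static const uint32_t az_maps[]   = { IMAP_UNMAPPED, 'a', IMAP_UNMAPPED };
static const inv_map az_map = { az_starts, az_maps, 3 };
```

With this layout, imap_lookup(&az_map, 'M') yields 'm', while any code point outside A-Z, including ones above 255, is returned unchanged.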

The inversion map data structure leads to straightforward
implementation code, so I collapsed the two doop.c routines
do_trans_complex_utf8() and do_trans_simple_utf8() into one.  A few
conditionals could be avoided in the loop if this function were split so
that one version didn't have to test for, e.g., squashing, but I suspect
these are in the noise in the loop, which has to deal with UTF-8
conversions.  This should be faster than the previous implementation
anyway.  I measured the differences some releases back, and inversion
maps were faster than the equivalent swash for up to 512 or 1024
different ranges.  These numbers are unlikely to be exceeded in tr///
except possibly in machine-generated ones.

Inversion maps are capable of handling both UTF-8 and non-UTF-8 cases,
but I left in the existing non-UTF-8 implementation, which uses tables,
because I suspect it is faster.  This means that there is extra code,
purely for runtime performance.

An inversion map is always created from the input, and then if the table
implementation is to be used, the table is easily derived from the map.
Prior to this commit, the table implementation was used in certain edge
cases involving code points above 255.  Those cases are now handled by
the inversion map implementation, because it would have taken extra code
to detect them, and I didn't think it was worth it.  That could be
changed if I am wrong.
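
To show why the table is easy to derive, here is a hedged sketch that walks the ranges of a map and fills a 256-entry byte table.  It assumes the same hypothetical parallel-array layout and IMAP_UNMAPPED marker as an illustration only; imap_to_table is not a function in the core.  It reports failure when a code point below 256 maps above 255, since a byte table cannot hold that result.

```c
#include <stddef.h>
#include <stdint.h>

#define IMAP_UNMAPPED UINT32_MAX  /* hypothetical "maps to itself" marker */

/* Fill a 256-entry table from an inversion map given as parallel arrays
 * (starts[] sorted ascending with starts[0] == 0).  Returns 0 on success,
 * or -1 if some code point below 256 maps above 255, in which case a byte
 * table cannot represent the transliteration. */
static int imap_to_table(const uint32_t *starts, const uint32_t *maps,
                         size_t len, uint8_t table[256])
{
    for (size_t i = 0; i < len && starts[i] < 256; i++) {
        /* End of this range, clipped to the table's domain. */
        uint32_t end = (i + 1 < len && starts[i + 1] < 256)
                     ? starts[i + 1] : 256;

        for (uint32_t cp = starts[i]; cp < end; cp++) {
            uint32_t to = (maps[i] == IMAP_UNMAPPED)
                        ? cp : maps[i] + (cp - starts[i]);
            if (to > 255)
                return -1;
            table[cp] = (uint8_t)to;
        }
    }
    return 0;
}
```

Once the table exists, the non-UTF-8 loop needs only one array index per byte, which is the reason for keeping that path.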

Creating an inversion map for all inputs essentially normalizes them,
and then the same logic is usable for all.  This fixes some false
negatives in the previous implementation.  It also allows for detecting
if the actual transliteration can be done in place.  Previously, the
code mostly punted on that detection for the UTF-8 case.

This also allows for accurate counting of the lengths of the two sides,
fixing some longstanding TODO warning tests.

A new flag, OPpTRANS_CAN_FORCE_UTF8, is set when the tr/// maps a
code point below 256 to one that requires UTF-8.  If this isn't
set, the code knows that a non-UTF-8 input won't become UTF-8 in the
process, and so can take shortcuts.  The bit representing this flag is
the same as OPpTRANS_FROM_UTF, which is no longer used.  That name is
left in so that the dozen-ish modules on CPAN that refer to it can still
compile.  AFAICT none of them actually use the flag, as well they
shouldn't since it is private to the core.
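
The condition behind such a flag can be sketched as a scan over the ranges that start below 256.  This is only an illustration under the same assumed parallel-array layout; can_force_utf8 and IMAP_UNMAPPED are hypothetical names, and the real code may compute the flag differently.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IMAP_UNMAPPED UINT32_MAX  /* hypothetical "maps to itself" marker */

/* Return true if any code point below 256 maps to one above 255, i.e. if
 * transliterating a non-UTF-8 string could force it to become UTF-8.
 * Only the highest source code point of each range is examined: offsets
 * within a range are preserved, so it yields the largest result. */
static bool can_force_utf8(const uint32_t *starts, const uint32_t *maps,
                           size_t len)
{
    for (size_t i = 0; i < len && starts[i] < 256; i++) {
        if (maps[i] == IMAP_UNMAPPED)
            continue;

        /* End of the range, clipped so only sources below 256 count. */
        uint32_t end = (i + 1 < len && starts[i + 1] < 256)
                     ? starts[i + 1] : 256;
        uint32_t last = end - 1;

        if (maps[i] + (last - starts[i]) > 255)
            return true;
    }
    return false;
}
```

The check costs one comparison per range rather than one per code point, so it is cheap to run once when the op is compiled.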

Inversion maps are ideally suited for tr/// implementations.  An issue
with them in general is that for some pathological data, they can become
fragmented, requiring more space than you would expect to represent the
underlying data.  However, the typical tr/// would not have this issue,
requiring only a very short inversion map, in some cases shorter than
the table implementation.

Inversion maps are also easier to deparse than swashes.  A deparse TODO
was also fixed by this commit, and the code to deparse UTF-8 inputs is
simplified.

One could implement specialized data structures for specific types of
inputs.  For example, a common tr/// form is a single range, like
tr/A-Z/a-z/.  That could be implemented without a table and be quite
fast.  An intermediate step would be to use the inversion map
implementation always when the transliteration is a single range, and
then special case length=1 maps at execution time.
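
A sketch of what such a single-range special case could look like for a non-UTF-8 string follows.  The function trans_single_range is hypothetical and not part of this commit; it only illustrates that tr/A-Z/a-z/ needs two comparisons and an addition per byte, with no table or map lookup.

```c
#include <stddef.h>

/* Transliterate a byte string in place for a tr/// consisting of one source
 * range [from_lo, from_hi] mapped to an equally long destination range that
 * begins at to_lo.  Returns the number of bytes that fell in the source
 * range.  The caller must ensure the destination range stays within 0..255. */
static size_t trans_single_range(unsigned char *s, size_t len,
                                 unsigned char from_lo,
                                 unsigned char from_hi,
                                 unsigned char to_lo)
{
    size_t matched = 0;

    for (size_t i = 0; i < len; i++) {
        if (s[i] >= from_lo && s[i] <= from_hi) {
            /* Keep the byte's offset within the range. */
            s[i] = (unsigned char)(to_lo + (s[i] - from_lo));
            matched++;
        }
    }
    return matched;
}
```

Calling it on the bytes of "Hello, World 42" with ('A', 'Z', 'a') lowercases the two capitals and leaves everything else alone.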

Thanks to Nicholas Rochemagne for his help on B.
2019-11-06 21:22:24 -07:00

Perl is Copyright (C) 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000,
2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012,
2013, 2014, 2015, 2016, 2017, 2018, 2019 by Larry Wall and others.
All rights reserved.



ABOUT PERL
==========

Perl is a general-purpose programming language originally developed for
text manipulation and now used for a wide range of tasks including
system administration, web development, network programming, GUI
development, and more.

The language is intended to be practical (easy to use, efficient,
complete) rather than beautiful (tiny, elegant, minimal).  Its major
features are that it's easy to use, supports both procedural and
object-oriented (OO) programming, has powerful built-in support for text
processing, and has one of the world's most impressive collections of
third-party modules.

For an introduction to the language's features, see pod/perlintro.pod.

For a discussion of the important changes in this release, see
pod/perldelta.pod.

There are also many Perl books available, covering a wide variety of topics,
from various publishers.  See pod/perlbook.pod for more information.


INSTALLATION
============

If you're using a relatively modern operating system and want to
install this version of Perl locally, run the following commands:

  ./Configure -des -Dprefix=$HOME/localperl
  make test
  make install

This will configure and compile perl for your platform, run the regression
tests, and install perl in a subdirectory "localperl" of your home directory.

If you run into any trouble whatsoever or you need to install a customized
version of Perl, you should read the detailed instructions in the "INSTALL"
file that came with this distribution.  Additionally, there are a number of
"README" files with hints and tips about building and using Perl on a wide
variety of platforms, some more common than others.

Once you have Perl installed, a wealth of documentation is available to you
through the 'perldoc' tool.  To get started, run this command:

  perldoc perl


IF YOU RUN INTO TROUBLE
=======================

Perl is a large and complex system that's used for everything from
knitting to rocket science.  If you run into trouble, it's quite
likely that someone else has already solved the problem you're
facing. Once you've exhausted the documentation, please report bugs to us
using the 'perlbug' tool. For more information about perlbug, either type
'perldoc perlbug' or just 'perlbug' on a line by itself.

While it was current when we made it available, Perl is constantly evolving
and there may be a more recent version that fixes bugs you've run into or
adds new features that you might find useful.

You can always find the latest version of perl on a CPAN (Comprehensive Perl
Archive Network) site near you at https://www.cpan.org/src/

If you want to submit a simple patch to the perl source, see the "SUPER
QUICK PATCH GUIDE" in pod/perlhack.pod.

Just a personal note:  I want you to know that I create nice things like this
because it pleases the Author of my story.  If this bothers you, then your
notion of Authorship needs some revision.  But you can use perl anyway. :-)

							The author.


LICENSING
=========

This program is free software; you can redistribute it and/or modify
it under the terms of either:

	a) the GNU General Public License as published by the Free
	Software Foundation; either version 1, or (at your option) any
	later version, or

	b) the "Artistic License" which comes with this Kit.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See either
the GNU General Public License or the Artistic License for more details.

You should have received a copy of the Artistic License with this
Kit, in the file named "Artistic".  If not, I'll be glad to provide one.

You should also have received a copy of the GNU General Public License
along with this program in the file named "Copying". If not, write to the
Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor,
Boston, MA 02110-1301, USA or visit their web page on the internet at
https://www.gnu.org/copyleft/gpl.html.

For those of you that choose to use the GNU General Public License,
my interpretation of the GNU General Public License is that no Perl
script falls under the terms of the GPL unless you explicitly put
said script under the terms of the GPL yourself.  Furthermore, any
object code linked with perl does not automatically fall under the
terms of the GPL, provided such object code only adds definitions
of subroutines and variables, and does not otherwise impair the
resulting interpreter from executing any standard Perl script.  I
consider linking in C subroutines in this manner to be the moral
equivalent of defining subroutines in the Perl language itself.  You
may sell such an object file as proprietary provided that you provide
or offer to provide the Perl source, as specified by the GNU General
Public License.  (This is merely an alternate way of specifying input
to the program.)  You may also sell a binary produced by the dumping of
a running Perl script that belongs to you, provided that you provide or
offer to provide the Perl source as specified by the GPL.  (The
fact that a Perl interpreter and your code are in the same binary file
is, in this case, a form of mere aggregation.)  This is my interpretation
of the GPL.  If you still have concerns or difficulties understanding
my intent, feel free to contact me.  Of course, the Artistic License
spells all this out for your protection, so you may prefer to use that.

