mirror of
https://github.com/Perl/perl5.git
synced 2026-01-26 16:39:36 +00:00
Prior to this patch, every time a code point was matched against a swash,
and the result was not previously known, a linear search through the
swash was performed. This patch changes that to generate an inversion
list whenever a swash for a binary property is created. A binary search
of that list is then performed for values whose result is not already
known.
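The idea can be illustrated in a few lines of Perl. This is only a minimal sketch and not the code in this patch: the helper names make_invlist and invlist_contains are hypothetical, and it assumes the usual inversion list layout, in which elements at even indexes start a range that is in the set and elements at odd indexes start a range that is not.

```perl
use strict;
use warnings;

# Hypothetical helper: build an inversion list from sorted, inclusive,
# non-overlapping and non-adjacent [start, end] ranges.
sub make_invlist {
    my @ranges = @_;
    my @invlist;
    for my $range (@ranges) {
        # Each range contributes its start and the first code point past its end
        push @invlist, $range->[0], $range->[1] + 1;
    }
    return \@invlist;
}

# Hypothetical helper: binary search for $cp in the inversion list.
sub invlist_contains {
    my ($invlist, $cp) = @_;
    my ($low, $high) = (0, scalar @$invlist);

    # Count how many elements are <= $cp
    while ($low < $high) {
        my $mid = int(($low + $high) / 2);
        if ($invlist->[$mid] <= $cp) { $low  = $mid + 1 }
        else                         { $high = $mid }
    }

    # An odd count means the last boundary crossed started an "in" range
    return $low % 2 == 1;
}

my $alpha = make_invlist([0x41, 0x5A], [0x61, 0x7A]);   # A-Z and a-z
print invlist_contains($alpha, ord("q")) ? "match\n" : "no match\n";
```

Each lookup costs a number of comparisons logarithmic in the number of ranges, rather than proportional to it as with a linear scan.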
This change does not have much effect on the speed of Perl's regression
test suite, but the speed-up in worst-case scenarios is huge. The
program at the end of this commit is crafted to avoid the caching that
hides many of the current inefficiencies. For character classes of 100
isolated code points, the new method is about an order of magnitude
faster; at 1000 code points it is about two orders of magnitude faster.
The program took 97 seconds to execute on my box using blead, and 1.5
seconds using this new scheme. I was surprised to see that even with
classes containing fewer than 10 code points, the binary search beat
the linear search by a little.
Even after this patch, under the current scheme, one can easily run out
of memory due to the permanent storing of results of swash lookups in
hashes. The new search mechanism might be fast enough to enable the
elimination of that memory usage. Instead, each inversion list could
keep a simple cache of its previous result and check whether that
result is still valid before starting the search. This relies on the
assumption, which the current scheme also makes, that probes will tend
to be clustered together, as nearby code points are often in the same
script.
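A rough sketch of what such a one-entry cache might look like follows. It is purely illustrative and not part of this patch: the names new_cached_invlist and cached_contains, the hash layout, and the searches counter are all hypothetical. The cache remembers how many list elements were <= the previous code point, and the binary search is skipped when the new code point falls between the same two boundaries.

```perl
use strict;
use warnings;

# Hypothetical structure: the inversion list plus the position found by
# the previous lookup, and a counter of the full searches performed.
sub new_cached_invlist {
    my @invlist = @_;
    return { list => [@invlist], last => undef, searches => 0 };
}

sub cached_contains {
    my ($self, $cp) = @_;
    my $list = $self->{list};
    my $n    = @$list;
    my $k    = $self->{last};    # count of elements <= previous code point

    # The cached position is still valid if $cp lies in the same span
    my $valid = defined $k
        && ($k == 0  || $list->[$k - 1] <= $cp)
        && ($k == $n || $cp < $list->[$k]);

    unless ($valid) {
        $self->{searches}++;
        my ($low, $high) = (0, $n);
        while ($low < $high) {
            my $mid = int(($low + $high) / 2);
            if ($list->[$mid] <= $cp) { $low  = $mid + 1 }
            else                      { $high = $mid }
        }
        $k = $self->{last} = $low;
    }

    # An odd count means $cp is inside a range that is in the set
    return $k % 2 == 1;
}

my $set = new_cached_invlist(0x41, 0x5B, 0x61, 0x7B);   # A-Z and a-z
cached_contains($set, ord($_)) for split //, "HELLO";
print "full searches: $set->{searches}\n";
```

Clustered probes such as the letters of one word all land in the same span, so only the first of them pays for a search.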
===============================================
# This program creates longer and longer character class lists while
# testing code point matches against them. By adding or subtracting
# 65 from the previous member, caching of results is eliminated (as of
# this writing), so this essentially tests for how long it takes to
# search through swashes to see if a code point matches or not.
use Benchmark ':hireswallclock';
my $string = "";
my $class_cp = 2**30; # Divide the code space in half, approx.
my $string_cp = $class_cp;
my $iterations = 10000;
for my $j (1..2048) {

    # Append the next character to the [class]
    my $hex_class_cp = sprintf("%X", $class_cp);
    $string .= "\\x{$hex_class_cp}";
    $class_cp -= 65;

    next if $j % 100 != 0;  # Only test certain ones

    print "$j: lowest is [$hex_class_cp]: ";
    timethis(1, "no warnings qw(portable non_unicode);my \$i = $string_cp; for (0 .. $iterations) { chr(\$i) =~ /[$string]/; \$i+= 65 }");
    $string_cp += ($iterations + 1) * 65;
}
Perl is Copyright (C) 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000,
2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010 by Larry Wall
and others. All rights reserved.

ABOUT PERL
==========

Perl is a general-purpose programming language originally developed for
text manipulation and now used for a wide range of tasks including
system administration, web development, network programming, GUI
development, and more.

The language is intended to be practical (easy to use, efficient,
complete) rather than beautiful (tiny, elegant, minimal). Its major
features are that it's easy to use, supports both procedural and
object-oriented (OO) programming, has powerful built-in support for text
processing, and has one of the world's most impressive collections of
third-party modules.

For an introduction to the language's features, see pod/perlintro.pod.

For a discussion of the important changes in this release, see
pod/perldelta.pod.

There are also many Perl books available, covering a wide variety of
topics, from various publishers. See pod/perlbook.pod for more
information.

INSTALLATION
============

If you're using a relatively modern operating system and want to
install this version of Perl locally, run the following commands:

    ./Configure -des -Dprefix=$HOME/localperl
    make test
    make install

This will configure and compile perl for your platform, run the
regression tests, and install perl in a subdirectory "localperl" of your
home directory.

If you run into any trouble whatsoever or you need to install a
customized version of Perl, you should read the detailed instructions in
the "INSTALL" file that came with this distribution. Additionally, there
are a number of "README" files with hints and tips about building and
using Perl on a wide variety of platforms, some more common than others.

Once you have Perl installed, a wealth of documentation is available to
you through the 'perldoc' tool. To get started, run this command:

    perldoc perl

IF YOU RUN INTO TROUBLE
=======================

Perl is a large and complex system that's used for everything from
knitting to rocket science. If you run into trouble, it's quite likely
that someone else has already solved the problem you're facing. Once
you've exhausted the documentation, please report bugs to us using the
'perlbug' tool. For more information about perlbug, either type
'perldoc perlbug' or just 'perlbug' on a line by itself.

While it was current when we made it available, Perl is constantly
evolving and there may be a more recent version that fixes bugs you've
run into or adds new features that you might find useful.

You can always find the latest version of perl on a CPAN (Comprehensive
Perl Archive Network) site near you at http://www.cpan.org/src/

Just a personal note: I want you to know that I create nice things like
this because it pleases the Author of my story. If this bothers you,
then your notion of Authorship needs some revision. But you can use perl
anyway. :-)

    The author.

LICENSING
=========

This program is free software; you can redistribute it and/or modify it
under the terms of either:

    a) the GNU General Public License as published by the Free
       Software Foundation; either version 1, or (at your option) any
       later version, or

    b) the "Artistic License" which comes with this Kit.

This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See either the GNU
General Public License or the Artistic License for more details.

You should have received a copy of the Artistic License with this Kit,
in the file named "Artistic". If not, I'll be glad to provide one.

You should also have received a copy of the GNU General Public License
along with this program in the file named "Copying". If not, write to
the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor,
Boston, MA 02110-1301, USA or visit their web page on the internet at
http://www.gnu.org/copyleft/gpl.html.

For those of you that choose to use the GNU General Public License, my
interpretation of the GNU General Public License is that no Perl script
falls under the terms of the GPL unless you explicitly put said script
under the terms of the GPL yourself. Furthermore, any object code linked
with perl does not automatically fall under the terms of the GPL,
provided such object code only adds definitions of subroutines and
variables, and does not otherwise impair the resulting interpreter from
executing any standard Perl script. I consider linking in C subroutines
in this manner to be the moral equivalent of defining subroutines in the
Perl language itself. You may sell such an object file as proprietary
provided that you provide or offer to provide the Perl source, as
specified by the GNU General Public License. (This is merely an
alternate way of specifying input to the program.) You may also sell a
binary produced by the dumping of a running Perl script that belongs to
you, provided that you provide or offer to provide the Perl source as
specified by the GPL. (The fact that a Perl interpreter and your code
are in the same binary file is, in this case, a form of mere
aggregation.) This is my interpretation of the GPL. If you still have
concerns or difficulties understanding my intent, feel free to contact
me. Of course, the Artistic License spells all this out for your
protection, so you may prefer to use that.