mirror of
https://github.com/Perl/perl5.git
synced 2026-01-26 16:39:36 +00:00
... by renaming it to charclass_invlists.inc, which no longer matches the standard *.h pattern. charclass_invlists.inc is a large (4.5MB) generated file that is only needed at build time (specifically, by utf8.c as well as regcomp.c and regexec.c). Skipping this single file makes a standard perl install slightly smaller (from ~90MB to ~85MB in my tests).
175 lines
7.8 KiB
Perl
175 lines
7.8 KiB
Perl
# The goal is for perl to compile and reasonably run any version of Unicode.
|
|
# Working reasonably well doesn't mean that the test suite will run without
|
|
# showing errors. A few of the very-Unicode specific test files have been
|
|
# modified to account for different versions, but most have not. For example,
|
|
# some tests use characters that aren't encoded in all Unicode versions; others
|
|
# have hard-coded the General Categories for a code point that were correct at
|
|
# the time the test was written. Perl itself will not compile under Unicode
|
|
# releases prior to 3.0 without a simple change to Unicode::Normalize.
|
|
# mktables contains instructions for this.
|
|
|
|
# The *.txt files were copied from
|
|
|
|
# ftp://www.unicode.org/Public/UNIDATA
|
|
|
|
# (which always points to the latest version) with subdirectories 'extracted' and
|
|
# 'auxiliary'. Older versions are located under Public with an appropriate name.
|
|
# They are also available via http at www.unicode.org/versions/
|
|
#
|
|
|
|
# The Unihan files were not included due to space considerations. Also NOT
|
|
# included were any *.html files. It is possible to add the Unihan files and
|
|
# have some properties from them automatically compiled. By editing mktables
|
|
# (see instructions near its beginning) you can add other Unihan properties.
|
|
|
|
# The file named 'version' should exist and be a single line with the Unicode
|
|
# version, like:
|
|
#
|
|
# 5.2.0
|
|
#
|
|
# (without the initial '# ')
|
|
|
|
# To be 8.3 filesystem friendly, the names of some of the input files have been
|
|
# changed from the values that are in the Unicode DB. Not all of the Test
|
|
# files are currently used, so may not be present, so some of the mv's can
|
|
# fail. The .html Test files are not touched.
|
|
|
|
mv PropertyValueAliases.txt PropValueAliases.txt
|
|
mv NamedSequencesProv.txt NamedSqProv.txt
|
|
mv NormalizationTest.txt NormTest.txt
|
|
mv DerivedAge.txt DAge.txt
|
|
mv DerivedCoreProperties.txt DCoreProperties.txt
|
|
mv DerivedNormalizationProps.txt DNormalizationProps.txt
|
|
mv IdentifierStatus.txt IdStatus.txt
|
|
mv IdentifierType.txt IdType.txt
|
|
|
|
# Some early releases don't have the extracted directory, and hence these files
|
|
# should be moved to it.
|
|
mkdir extracted 2>/dev/null
|
|
mv DerivedBidiClass.txt DerivedBinaryProperties.txt extracted 2>/dev/null
|
|
mv DerivedCombiningClass.txt DerivedDecompositionType.txt extracted 2>/dev/null
|
|
mv DerivedEastAsianWidth.txt DerivedGeneralCategory.txt extracted 2>/dev/null
|
|
mv DerivedJoiningGroup.txt DerivedJoiningType.txt extracted 2>/dev/null
|
|
mv DerivedLineBreak.txt DerivedNumericType.txt DerivedNumericValues.txt extracted 2>/dev/null
|
|
|
|
mv extracted/DerivedBidiClass.txt extracted/DBidiClass.txt
|
|
mv extracted/DerivedBinaryProperties.txt extracted/DBinaryProperties.txt
|
|
mv extracted/DerivedCombiningClass.txt extracted/DCombiningClass.txt
|
|
mv extracted/DerivedDecompositionType.txt extracted/DDecompositionType.txt
|
|
mv extracted/DerivedEastAsianWidth.txt extracted/DEastAsianWidth.txt
|
|
mv extracted/DerivedGeneralCategory.txt extracted/DGeneralCategory.txt
|
|
mv extracted/DerivedJoiningGroup.txt extracted/DJoinGroup.txt
|
|
mv extracted/DerivedJoiningType.txt extracted/DJoinType.txt
|
|
mv extracted/DerivedLineBreak.txt extracted/DLineBreak.txt
|
|
mv extracted/DerivedNumericType.txt extracted/DNumType.txt
|
|
mv extracted/DerivedNumericValues.txt extracted/DNumValues.txt
|
|
mv extracted/DerivedName.txt extracted/DName.txt
|
|
rmdir extracted 2>/dev/null # Will fail if non-empty, but if it is empty
|
|
# was an early release that didn't have it.
|
|
|
|
mv auxiliary/GraphemeBreakTest.txt auxiliary/GCBTest.txt
|
|
mv auxiliary/LineBreakTest.txt auxiliary/LBTest.txt
|
|
mv auxiliary/SentenceBreakTest.txt auxiliary/SBTest.txt
|
|
mv auxiliary/WordBreakTest.txt auxiliary/WBTest.txt
|
|
|
|
# If you have the Unihan database (5.2 and above), you should also do the
|
|
# following:
|
|
|
|
mv Unihan_DictionaryIndices.txt UnihanIndicesDictionary.txt
|
|
mv Unihan_DictionaryLikeData.txt UnihanDataDictionaryLike.txt
|
|
mv Unihan_IRGSources.txt UnihanIRGSources.txt
|
|
mv Unihan_NumericValues.txt UnihanNumericValues.txt
|
|
mv Unihan_OtherMappings.txt UnihanOtherMappings.txt
|
|
mv Unihan_RadicalStrokeCounts.txt UnihanRadicalStrokeCounts.txt
|
|
mv Unihan_Readings.txt UnihanReadings.txt
|
|
mv Unihan_Variants.txt UnihanVariants.txt
|
|
|
|
# If you download everything, the names of files that are not used by mktables
|
|
# are not changed by the above, and hence may not work correctly as-is on 8.3
|
|
# filesystems.
|
|
|
|
# mktables is used to generate the tables used by the rest of Perl. It will
|
|
# warn you about any *.txt and *.html files in the directory substructure that
|
|
# it doesn't know about. You should remove any so-identified, or edit mktables
|
|
# to add them to its lists to process. You can run
|
|
#
|
|
# mktables -globlist
|
|
#
|
|
# to have it try to process these tables generically.
|
|
|
|
# COMPILING ON OLDER UNICODE VERSIONS
|
|
#
|
|
# To compile perl for use with an older Unicode release, delete everything in
|
|
# the lib/unicore directory except mktables and Makefile. Then download the
|
|
# Unicode-supplied files for the desired version to that directory (A url for
|
|
# these is given earlier in this file). Then create the 'version' file with a
|
|
# single line, like '6.1.0'. Do a 'make test' from the project level. You
|
|
# will get some porting errors for needing to regen. Regenerate what it tells
|
|
# you are needed, and make test again. If you compile an old enough version,
|
|
# you will also have to download a few files from later Unicode versions,
|
|
# following the instructions that will be given if warranted. It should
|
|
# compile in any release without warnings, except for some casing conflicts
|
|
# in Unicode 2.1.8, and some extraneous files will show up in very early
|
|
# releases of the form qr/diff.*\.txt/. If you add Unihan.txt, one line is in error in
|
|
#
|
|
# Other glitches are noted in mktables under 'UNICODE VERSIONS NOTES'
|
|
|
|
# FOR PUMPKINS
|
|
#
|
|
# The files are inter-related. If you take the latest UnicodeData.txt, for
|
|
# example, but leave the older versions of other files, there can be subtle
|
|
# problems. So get everything available from Unicode, and delete those which
|
|
# aren't needed.
|
|
#
|
|
# When moving to a new version of Unicode, you need to update 'version' by hand
|
|
#
|
|
# p4 edit version
|
|
# ...
|
|
#
|
|
# You should look in the Unicode release notes (which are probably towards the
|
|
# bottom of http://www.unicode.org/reports/tr44/) to see if any properties have
|
|
# newly been moved to be Obsolete, Deprecated, or Stabilized. The full names
|
|
# for these should be added to the respective lists near the beginning of
|
|
# mktables, using an 'if' to add them for just this Unicode version going
|
|
# forward, so that mktables can continue to be used for earlier Unicode
|
|
# versions.
|
|
#
|
|
# When putting out a new Perl release, think about if any of the Deprecated
|
|
# properties should be moved to Suppressed.
|
|
#
|
|
# perlrecharclass.pod has a list of all the characters that are white space,
|
|
# which needs to be updated if there are changes. A quick way to check if
|
|
# there have been changes would be to see if the number of such characters
|
|
# listed in perluniprops.pod (generated by running mktables) for the property
|
|
# \p{White_Space} is no longer 25. Further investigation would then be
|
|
# necessary to classify the new characters as horizontal and vertical.
|
|
#
|
|
# The code in regexec.c for the \X match construct is intimately tied to the
|
|
# regular expression in UAX #29 (http://www.unicode.org/reports/tr29/). You
|
|
# should see if it has changed, and if so, regexec.c should be modified. The
|
|
# current one is
|
|
# ( CRLF
|
|
# | Prepend* ( RI-sequence | Hangul-Syllable | !Control )
|
|
# ( Grapheme_Extend | SpacingMark )*
|
|
# | . )
|
|
#
|
|
# mktables has many checks to warn you if there are unexpected or novel things
|
|
# that it doesn't know how to handle.
|
|
#
|
|
# Module::CoreList should be changed to include the new release
|
|
#
|
|
# Also, you should regen l1_char_class_tab.h, by
|
|
#
|
|
# perl regen/mk_L_charclass.pl
|
|
#
|
|
# and, regen charclass_invlists.inc by
|
|
#
|
|
# perl regen/mk_invlists.pl
|
|
#
|
|
# Finally:
|
|
#
|
|
# p4 submit
|
|
#
|
|
# --
|
|
# jhi@iki.fi; updated by nick@ccl4.org, public@khwilliamson.com
|