mirror of
https://github.com/Perl/perl5.git
synced 2026-01-27 01:44:43 +00:00
Unicode::UCD.pm: Pod clarifications and nits
parent 5e784d588d
commit 53cb2385fc
@@ -5,7 +5,7 @@ use warnings;
 no warnings 'surrogate'; # surrogates can be inputs to this
 use charnames ();
 
-our $VERSION = '0.57';
+our $VERSION = '0.58';
 
 require Exporter;
 
@@ -244,7 +244,7 @@ of the bidi type name.
 is empty if I<code> has no decomposition; or is one or more codes
 (separated by spaces) that, taken in order, represent a decomposition for
 I<code>. Each has at least four hexdigits.
-The codes may be preceded by a word enclosed in angle brackets then a space,
+The codes may be preceded by a word enclosed in angle brackets, then a space,
 like C<E<lt>compatE<gt> >, giving the type of decomposition
 
 This decomposition may be an intermediate one whose components are also
@@ -252,7 +252,7 @@ decomposable. Use L<Unicode::Normalize> to get the final decomposition.
 
 =item B<decimal>
 
-if I<code> is a decimal digit this is its integer numeric value
+if I<code> represents a decimal digit this is its integer numeric value
 
 =item B<digit>
 
@@ -599,7 +599,7 @@ sub charinrange {
 
     my $range = charblock('Armenian');
 
-With a L</code point argument> charblock() returns the I<block> the code point
+With a L</code point argument> C<charblock()> returns the I<block> the code point
 belongs to, e.g. C<Basic Latin>. The old-style block name is returned (see
 L</Old-style versus new-style block names>).
 If the code point is unassigned, this returns the block it would belong to if
@@ -608,16 +608,20 @@ have blocks, all code points are considered to be in C<No_Block>.)
 
 See also L</Blocks versus Scripts>.
 
-If supplied with an argument that can't be a code point, charblock() tries to
+If supplied with an argument that can't be a code point, C<charblock()> tries to
 do the opposite and interpret the argument as an old-style block name. On an
 ASCII platform, the return value is a I<range set> with one range: an
 anonymous list with a single element that consists of another anonymous list
 whose first element is the first code point in the block, and whose second
-(and final) element is the final code point in the block. On an EBCDIC
+element is the final code point in the block. On an EBCDIC
 platform, the first two Unicode blocks are not contiguous. Their range sets
-are lists containing I<start-of-range>, I<end-of-range> code point pairs. You
+are lists containing I<start-of-range>, I<end-of-range> code point pairs. You
 can test whether a code point is in a range set using the L</charinrange()>
-function. If the argument is not a known block, C<undef> is returned.
+function. (To be precise, each I<range set> contains a third array element,
+after the range boundary ones: the old_style block name.)
+
+If the argument to C<charblock()> is not a known block, C<undef> is
+returned.
 
 =cut
 
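The two calling modes of charblock() described in the hunk above can be tried with a short script. This is only a sketch, not part of the commit: it assumes a Perl whose Unicode::UCD exports charblock() and charinrange(), an ASCII platform (a single range for the block), and the variable names are made up for illustration.

```perl
use strict;
use warnings;
use Unicode::UCD qw(charblock charinrange);

# Code point in, old-style block name out.
my $name = charblock(0x0041);

# Block name in, range set out: a reference to a list of
# [ start, end, block-name ] entries (one entry on ASCII platforms).
my $range = charblock('Armenian');

for my $entry (@$range) {
    my ($start, $end, $block) = @$entry;
    printf "U+%04X..U+%04X %s\n", $start, $end, $block;
}

# charinrange() tests membership of a code point in a range set.
my $inside  = charinrange($range, 0x0531);   # ARMENIAN CAPITAL LETTER AYB
my $outside = charinrange($range, 0x0041);   # LATIN CAPITAL LETTER A
```

The third element of each entry is the one the new parenthetical in the pod mentions; code that only needs the boundaries can ignore it.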
@@ -708,8 +712,8 @@ sub charblock {
 
     my $range = charscript('Thai');
 
-With a L</code point argument> charscript() returns the I<script> the
-code point belongs to, e.g. C<Latin>, C<Greek>, C<Han>.
+With a L</code point argument>, C<charscript()> returns the I<script> the
+code point belongs to, e.g., C<Latin>, C<Greek>, C<Han>.
 If the code point is unassigned or the Unicode version being used is so early
 that it doesn't have scripts, this function returns C<"Unknown">.
 
@@ -717,8 +721,11 @@ If supplied with an argument that can't be a code point, charscript() tries
 to do the opposite and interpret the argument as a script name. The
 return value is a I<range set>: an anonymous list of lists that contain
 I<start-of-range>, I<end-of-range> code point pairs. You can test whether a
-code point is in a range set using the L</charinrange()> function. If the
-argument is not a known script, C<undef> is returned.
+code point is in a range set using the L</charinrange()> function.
+(To be precise, each I<range set> contains a third array element,
+after the range boundary ones: the script name.)
+
+If the C<charscript()> argument is not a known script, C<undef> is returned.
 
 See also L</Blocks versus Scripts>.
 
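The same pattern applies to charscript(). The sketch below is illustrative only and not part of the commit; it assumes a Unicode version recent enough to have scripts, and the variable names are hypothetical.

```perl
use strict;
use warnings;
use Unicode::UCD qw(charscript charinrange);

# Code point in, script name out.
my $script = charscript(0x0E01);     # THAI CHARACTER KO KAI

# Script name in, range set out. A script may be spread over several
# ranges, so the list usually has more than one [ start, end, name ] entry.
my $range = charscript('Thai');

for my $entry (@$range) {
    printf "U+%04X..U+%04X %s\n", @{$entry}[0, 1, 2];
}

# Membership test against the whole range set.
my $is_thai  = charinrange($range, 0x0E01);
my $is_latin = charinrange($range, 0x0041);
```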
@@ -767,7 +774,7 @@ sub charscript {
 
     my $charblocks = charblocks();
 
-charblocks() returns a reference to a hash with the known block names
+C<charblocks()> returns a reference to a hash with the known block names
 as the keys, and the code point ranges (see L</charblock()>) as the values.
 
 The names are in the old-style (see L</Old-style versus new-style block
@@ -791,7 +798,7 @@ sub charblocks {
 
     my $charscripts = charscripts();
 
-charscripts() returns a reference to a hash with the known script
+C<charscripts()> returns a reference to a hash with the known script
 names as the keys, and the code point ranges (see L</charscript()>) as
 the values.
 
@@ -812,7 +819,7 @@ sub charscripts {
 In addition to using the C<\p{Blk=...}> and C<\P{Blk=...}> constructs, you
 can also test whether a code point is in the I<range> as returned by
 L</charblock()> and L</charscript()> or as the values of the hash returned
-by L</charblocks()> and L</charscripts()> by using charinrange():
+by L</charblocks()> and L</charscripts()> by using C<charinrange()>:
 
     use Unicode::UCD qw(charscript charinrange);
 
@@ -942,7 +949,9 @@ sub bidi_types {
     my $compexcl = compexcl(0x09dc);
 
 This routine returns C<undef> if the Unicode version being used is so early
-that it doesn't have this property. It is included for backwards
+that it doesn't have this property.
+
+C<compexcl()> is included for backwards
 compatibility, but as of Perl 5.12 and more modern Unicode versions, for
 most purposes it is probably more convenient to use one of the following
 instead:
@@ -1462,10 +1471,11 @@ sub casespec {
 If used with a single argument in a scalar context, returns the string
 consisting of the code points of the named sequence, or C<undef> if no
 named sequence by that name exists. If used with a single argument in
-a list context, it returns the list of the ordinals of the code points. If used
-with no
-arguments in a list context, returns a hash with the names of the
-named sequences as the keys and the named sequences as strings as
+a list context, it returns the list of the ordinals of the code points.
+
+If used with no
+arguments in a list context, it returns a hash with the names of all the
+named sequences as the keys and their sequences as strings as
 the values. Otherwise, it returns C<undef> or an empty list depending
 on the context.
 
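The three context-dependent behaviours of namedseq() described in the hunk above can be compared side by side. This is a sketch for illustration, not part of the commit; it assumes the named sequence KATAKANA LETTER AINU P is present in the Unicode version shipped with the running Perl, and the variable names are made up.

```perl
use strict;
use warnings;
use Unicode::UCD qw(namedseq);

my $seq_name = 'KATAKANA LETTER AINU P';

# One argument, scalar context: the sequence as a string.
my $string = namedseq($seq_name);

# One argument, list context: the ordinals of its code points.
my @ordinals = namedseq($seq_name);
printf "%s = %s\n", $seq_name, join ' ', map { sprintf 'U+%04X', $_ } @ordinals;

# No arguments, list context: a hash of name => sequence string.
my %all = namedseq();
printf "%d named sequences known\n", scalar keys %all;
```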
@@ -1581,7 +1591,7 @@ sub _numeric {
     my $val = num("123");
     my $one_quarter = num("\N{VULGAR FRACTION 1/4}");
 
-C<num> returns the numeric value of the input Unicode string; or C<undef> if it
+C<num()> returns the numeric value of the input Unicode string; or C<undef> if it
 doesn't think the entire string has a completely valid, safe numeric value.
 
 If the string is just one character in length, the Unicode numeric value
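A small script shows the success and failure cases of num() that the pod describes. It is a sketch under stated assumptions, not part of the commit: it needs a Unicode::UCD recent enough to export num(), and it writes the fraction as the escape "\x{BC}" (U+00BC VULGAR FRACTION ONE QUARTER) so that it does not depend on name lookup.

```perl
use strict;
use warnings;
use Unicode::UCD qw(num);

# A string made only of decimal digits yields its integer value.
my $val = num("123");

# A single character yields its Unicode numeric value, here a fraction.
my $one_quarter = num("\x{BC}");

# A string that is not entirely a valid number yields undef.
my $bad = num("12a");

print "val = $val\n";
print "one_quarter = $one_quarter\n";
print "bad is ", (defined $bad ? "defined" : "undef"), "\n";
```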