Unicode::UCD.pm: Pod clarifications and nits

Karl Williamson 2013-03-30 21:13:38 -06:00 committed by Karl Williamson
parent 5e784d588d
commit 53cb2385fc

@ -5,7 +5,7 @@ use warnings;
no warnings 'surrogate'; # surrogates can be inputs to this
use charnames ();
-our $VERSION = '0.57';
+our $VERSION = '0.58';
require Exporter;
@ -244,7 +244,7 @@ of the bidi type name.
is empty if I<code> has no decomposition; or is one or more codes
(separated by spaces) that, taken in order, represent a decomposition for
I<code>. Each has at least four hexdigits.
-The codes may be preceded by a word enclosed in angle brackets then a space,
+The codes may be preceded by a word enclosed in angle brackets, then a space,
like C<E<lt>compatE<gt> >, giving the type of decomposition
This decomposition may be an intermediate one whose components are also
@ -252,7 +252,7 @@ decomposable. Use L<Unicode::Normalize> to get the final decomposition.
=item B<decimal>
-if I<code> is a decimal digit this is its integer numeric value
+if I<code> represents a decimal digit this is its integer numeric value
=item B<digit>
@ -599,7 +599,7 @@ sub charinrange {
my $range = charblock('Armenian');
-With a L</code point argument> charblock() returns the I<block> the code point
+With a L</code point argument> C<charblock()> returns the I<block> the code point
belongs to, e.g. C<Basic Latin>. The old-style block name is returned (see
L</Old-style versus new-style block names>).
If the code point is unassigned, this returns the block it would belong to if
@ -608,16 +608,20 @@ have blocks, all code points are considered to be in C<No_Block>.)
See also L</Blocks versus Scripts>.
-If supplied with an argument that can't be a code point, charblock() tries to
+If supplied with an argument that can't be a code point, C<charblock()> tries to
do the opposite and interpret the argument as an old-style block name. On an
ASCII platform, the return value is a I<range set> with one range: an
anonymous list with a single element that consists of another anonymous list
whose first element is the first code point in the block, and whose second
-(and final) element is the final code point in the block. On an EBCDIC
+element is the final code point in the block. On an EBCDIC
platform, the first two Unicode blocks are not contiguous. Their range sets
-are lists containing I<start-of-range>, I<end-of-range> code point pairs. You
+are lists containing I<start-of-range>, I<end-of-range> code point pairs. You
can test whether a code point is in a range set using the L</charinrange()>
-function. If the argument is not a known block, C<undef> is returned.
+function. (To be precise, each I<range set> contains a third array element,
+after the range boundary ones: the old_style block name.)
+If the argument to C<charblock()> is not a known block, C<undef> is
+returned.
=cut
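
The two calling modes of C<charblock()> described in this hunk can be illustrated with a minimal sketch. It assumes an ASCII platform and an installed Unicode::UCD; the variable names are illustrative only and are not part of the commit.

```perl
use strict;
use warnings;
use Unicode::UCD qw(charblock charinrange);

# Code point argument: returns the old-style block name.
my $block = charblock(0x0041);

# Block-name argument: returns a range set (a reference to a list of ranges).
my $range = charblock('Armenian');

# Each range holds the start, the end, and (third element) the block name.
my ($start, $end, $name) = @{ $range->[0] };
printf "%s: U+%04X..U+%04X\n", $name, $start, $end;

# charinrange() tests whether a code point falls inside a range set.
my $in_block = charinrange($range, 0x0561) ? 1 : 0;  # ARMENIAN SMALL LETTER AYB

# An argument that is neither a code point nor a known block yields undef.
my $unknown = charblock('No Such Block');
```

Unpacking the third element is optional; callers that only need the boundaries can ignore it.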
@ -708,8 +712,8 @@ sub charblock {
my $range = charscript('Thai');
-With a L</code point argument> charscript() returns the I<script> the
-code point belongs to, e.g. C<Latin>, C<Greek>, C<Han>.
+With a L</code point argument>, C<charscript()> returns the I<script> the
+code point belongs to, e.g., C<Latin>, C<Greek>, C<Han>.
If the code point is unassigned or the Unicode version being used is so early
that it doesn't have scripts, this function returns C<"Unknown">.
@ -717,8 +721,11 @@ If supplied with an argument that can't be a code point, charscript() tries
to do the opposite and interpret the argument as a script name. The
return value is a I<range set>: an anonymous list of lists that contain
I<start-of-range>, I<end-of-range> code point pairs. You can test whether a
-code point is in a range set using the L</charinrange()> function. If the
-argument is not a known script, C<undef> is returned.
+code point is in a range set using the L</charinrange()> function.
+(To be precise, each I<range set> contains a third array element,
+after the range boundary ones: the script name.)
+If the C<charscript()> argument is not a known script, C<undef> is returned.
See also L</Blocks versus Scripts>.
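
A short sketch of walking a script's range set, including the third array element that this hunk documents. It assumes a Unicode version recent enough to have scripts; the loop and variable names are illustrative, not taken from the module.

```perl
use strict;
use warnings;
use Unicode::UCD qw(charscript);

# Code point argument: returns the script name.
my $script = charscript(0x0E01);   # THAI CHARACTER KO KAI

# Script-name argument: returns a range set, possibly with several ranges.
my $ranges = charscript('Thai');

for my $r (@$ranges) {
    # start-of-range, end-of-range, then the script name as third element
    my ($start, $end, $name) = @$r;
    printf "%s: U+%04X..U+%04X\n", $name, $start, $end;
}

# An argument that is not a known script yields undef.
my $unknown = charscript('NoSuchScript');
```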
@ -767,7 +774,7 @@ sub charscript {
my $charblocks = charblocks();
-charblocks() returns a reference to a hash with the known block names
+C<charblocks()> returns a reference to a hash with the known block names
as the keys, and the code point ranges (see L</charblock()>) as the values.
The names are in the old-style (see L</Old-style versus new-style block
@ -791,7 +798,7 @@ sub charblocks {
my $charscripts = charscripts();
-charscripts() returns a reference to a hash with the known script
+C<charscripts()> returns a reference to a hash with the known script
names as the keys, and the code point ranges (see L</charscript()>) as
the values.
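
Because the hash values are ordinary range sets, they can be passed straight to C<charinrange()>. The following is a sketch only; the linear scan over all scripts is for illustration (calling C<charscript()> with the code point is the direct route), and the variable names are hypothetical.

```perl
use strict;
use warnings;
use Unicode::UCD qw(charscripts charinrange);

my $charscripts = charscripts();

# Find the script whose range set contains a given code point.
my $code = 0x03B1;   # GREEK SMALL LETTER ALPHA
my ($found) = grep { charinrange($charscripts->{$_}, $code) }
              sort keys %$charscripts;

print "U+03B1 is in script: $found\n";
```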
@ -812,7 +819,7 @@ sub charscripts {
In addition to using the C<\p{Blk=...}> and C<\P{Blk=...}> constructs, you
can also test whether a code point is in the I<range> as returned by
L</charblock()> and L</charscript()> or as the values of the hash returned
-by L</charblocks()> and L</charscripts()> by using charinrange():
+by L</charblocks()> and L</charscripts()> by using C<charinrange()>:
use Unicode::UCD qw(charscript charinrange);
@ -942,7 +949,9 @@ sub bidi_types {
my $compexcl = compexcl(0x09dc);
This routine returns C<undef> if the Unicode version being used is so early
-that it doesn't have this property. It is included for backwards
+that it doesn't have this property.
+C<compexcl()> is included for backwards
compatibility, but as of Perl 5.12 and more modern Unicode versions, for
most purposes it is probably more convenient to use one of the following
instead:
@ -1462,10 +1471,11 @@ sub casespec {
If used with a single argument in a scalar context, returns the string
consisting of the code points of the named sequence, or C<undef> if no
named sequence by that name exists. If used with a single argument in
-a list context, it returns the list of the ordinals of the code points. If used
-with no
-arguments in a list context, returns a hash with the names of the
-named sequences as the keys and the named sequences as strings as
+a list context, it returns the list of the ordinals of the code points.
+If used with no
+arguments in a list context, it returns a hash with the names of all the
+named sequences as the keys and their sequences as strings as
the values. Otherwise, it returns C<undef> or an empty list depending
on the context.
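
The three calling conventions described above can be sketched as follows. The sequence name is assumed to be present in the Unicode version shipped with the running Perl; the variable names are illustrative only.

```perl
use strict;
use warnings;
use Unicode::UCD qw(namedseq);

my $name = "KATAKANA LETTER AINU P";

# One argument, scalar context: the sequence as a string.
my $string = namedseq($name);

# One argument, list context: the ordinals of the code points.
my @ordinals = namedseq($name);

# No arguments, list context: a hash of name => sequence string.
my %all = namedseq();

printf "%s = %s\n", $name, join ' ', map { sprintf 'U+%04X', $_ } @ordinals;
```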
@ -1581,7 +1591,7 @@ sub _numeric {
my $val = num("123");
my $one_quarter = num("\N{VULGAR FRACTION 1/4}");
-C<num> returns the numeric value of the input Unicode string; or C<undef> if it
+C<num()> returns the numeric value of the input Unicode string; or C<undef> if it
doesn't think the entire string has a completely valid, safe numeric value.
If the string is just one character in length, the Unicode numeric value