Mirror of https://github.com/Perl/perl5.git, synced 2026-01-26 08:38:23 +00:00
Document uv_to_utf8_family
parent b31bc26c15
commit 9c2eed89a7
@@ -414,6 +414,11 @@ L<perlapi/C<utf8_to_uv>> replaces L<perlapi/C<utf8_to_uvchr>> (which is
 retained for backwards compatibility), but you should convert to use the
 new form, as likely you aren't using the old one safely.
 
+To convert in the opposite direction, you can now use
+L<perlapi/C<uv_to_utf8>>. This is not a new function, but a new synonym
+for L<perlapi/C<uvchr_to_utf8>>. It is added so you don't have to learn
+two sets of names.
+
 There are also two new functions, L<perlapi/C<strict_utf8_to_uv>> and
 L<perlapi/C<c9strict_utf8_to_uv>> which do the same thing except when
 the input string represents a code point that Unicode doesn't accept as
@@ -440,6 +445,12 @@ L<perlapi/C<utf8_to_uv_errors>> replaces L<perlapi/C<utf8n_to_uvchr_error>>.
 L<perlapi/C<utf8_to_uv_msgs>> replaces
 L<perlapi/C<utf8n_to_uvchr_msgs>>.
 
+Also added are the inverse functions L<perlapi/C<uv_to_utf8_flags>>
+and L<perlapi/C<uv_to_utf8_msgs>>, which are synonyms for the existing
+functions, L<perlapi/C<uvchr_to_utf8_flags>> and
+L<perlapi/C<uvchr_to_utf8_flags_msgs>> respectively. These are provided only
+so you don't have to learn two sets of names.
+
 =item *
 
 Three new API functions are introduced to convert strings encoded in
26 utf8.c
@@ -121,14 +121,14 @@ S_new_msg_hv(pTHX_ const char * const message, /* The message text */
 =for apidoc uvoffuni_to_utf8_flags
 
 THIS FUNCTION SHOULD BE USED IN ONLY VERY SPECIALIZED CIRCUMSTANCES.
-Instead, B<Almost all code should use L<perlapi/uvchr_to_utf8> or
-L<perlapi/uvchr_to_utf8_flags>>.
+Instead, B<Almost all code should use L<perlapi/uv_to_utf8> or
+L<perlapi/uv_to_utf8_flags>>.
 
 This function is like them, but the input is a strict Unicode
 (as opposed to native) code point. Only in very rare circumstances should code
 not be using the native code point.
 
-For details, see the description for L<perlapi/uvchr_to_utf8_flags>.
+For details, see the description for L<perlapi/uv_to_utf8_flags>.
 
 =cut
 */
@@ -155,9 +155,11 @@ const char super_cp_format[] = "Code point 0x%" UVXf " is not Unicode,"
 #define MASK UTF_CONTINUATION_MASK
 
 /*
-=for apidoc uvchr_to_utf8_flags_msgs
+=for apidoc uv_to_utf8_msgs
+=for apidoc_item uvchr_to_utf8_flags_msgs
 
-THIS FUNCTION SHOULD BE USED IN ONLY VERY SPECIALIZED CIRCUMSTANCES.
+These functions are identical. THEY SHOULD BE USED IN ONLY VERY SPECIALIZED
+CIRCUMSTANCES.
 
 Most code should use C<L</uvchr_to_utf8_flags>()> rather than call this directly.
 
@@ -367,7 +369,9 @@ Perl_uvoffuni_to_utf8_flags_msgs(pTHX_ U8 *d, UV input_uv, UV flags, HV** msgs)
 }
 
 /*
-=for apidoc uvchr_to_utf8
+=for apidoc uv_to_utf8
+=for apidoc_item uv_to_utf8_flags
+=for apidoc_item uvchr_to_utf8
 =for apidoc_item uvchr_to_utf8_flags
 
 These each add the UTF-8 representation of the native code point C<uv> to the
@@ -375,18 +379,22 @@ end of the string C<d>; C<d> should have at least C<UVCHR_SKIP(uv)+1> (up to
 C<UTF8_MAXBYTES+1>) free bytes available. The return value is the pointer to
 the byte after the end of the new character. In other words,
 
-d = uvchr_to_utf8(d, uv);
+d = uv_to_utf8(d, uv);
 
 This is the Unicode-aware way of saying
 
 *(d++) = uv;
 
-C<flags> is used to make some classes of code points problematic in some way.
-C<uvchr_to_utf8> is effectively the same as calling C<uvchr_to_utf8_flags>
+(C<uvchr_to_utf8> is a synonym for C<uv_to_utf8>.)
+
+C<uv_to_utf8_flags> is used to make some classes of code points problematic in
+some way. C<uv_to_utf8> is effectively the same as calling C<uv_to_utf8_flags>
 with C<flags> set to 0, meaning no class of code point is considered
 problematic. That means any input code point from 0..C<IV_MAX> is considered
 to be fine. C<IV_MAX> is typically 0x7FFF_FFFF in a 32-bit word.
 
+(C<uvchr_to_utf8_flags> is a synonym for C<uv_to_utf8_flags>).
+
 A code point can be problematic in one of two ways. Its use could just raise a
 warning, and/or it could be forbidden with the function failing, and returning
 NULL.
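The last hunk documents the calling convention shared by uv_to_utf8 and its uvchr_to_utf8 synonym. The character is written at d, and the return value points just past it, so d = uv_to_utf8(d, uv); is the Unicode-aware analogue of *(d++) = uv;.

The standalone C sketch below illustrates only that convention. It is not perl's implementation and does not call the perl API. sketch_uv_to_utf8 and sketch_utf8_skip are hypothetical helpers. The sketch assumes standard UTF-8 on an ASCII platform and returns NULL for anything above 0x10FFFF. By contrast, perl's functions accept any code point up to IV_MAX when flags is 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for UVCHR_SKIP(): how many bytes the standard
 * UTF-8 form of cp occupies (only meaningful for cp <= 0x10FFFF). */
size_t sketch_utf8_skip(uint32_t cp)
{
    if (cp < 0x80)    return 1;
    if (cp < 0x800)   return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

/* Hypothetical stand-in for uv_to_utf8(d, uv): writes the UTF-8 form of
 * cp starting at d and returns a pointer to the byte after it, so callers
 * can write d = sketch_uv_to_utf8(d, cp);.  Unlike perl's function, this
 * sketch returns NULL for code points above 0x10FFFF.  The caller must
 * provide at least sketch_utf8_skip(cp) free bytes at d. */
unsigned char *sketch_uv_to_utf8(unsigned char *d, uint32_t cp)
{
    assert(d != NULL);
    if (cp > 0x10FFFF)
        return NULL;

    switch (sketch_utf8_skip(cp)) {
    case 1:                                   /* 0xxxxxxx */
        *d++ = (unsigned char)cp;
        break;
    case 2:                                   /* 110xxxxx 10xxxxxx */
        *d++ = (unsigned char)(0xC0 | (cp >> 6));
        *d++ = (unsigned char)(0x80 | (cp & 0x3F));
        break;
    case 3:                                   /* 1110xxxx + 2 continuation bytes */
        *d++ = (unsigned char)(0xE0 | (cp >> 12));
        *d++ = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        *d++ = (unsigned char)(0x80 | (cp & 0x3F));
        break;
    default:                                  /* 11110xxx + 3 continuation bytes */
        *d++ = (unsigned char)(0xF0 | (cp >> 18));
        *d++ = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        *d++ = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        *d++ = (unsigned char)(0x80 | (cp & 0x3F));
        break;
    }
    return d;   /* one past the last byte written */
}
```

For example, start from unsigned char buf[16] and d = buf, then encode U+0041, U+00E9, U+20AC and U+1F600 in turn with d = sketch_uv_to_utf8(d, cp);. This advances d by 1, 2, 3 and 4 bytes and leaves the 10 bytes 41 C3 A9 E2 82 AC F0 9F 98 80 in buf. Checking sketch_utf8_skip before writing plays the role of the documented requirement that d have UVCHR_SKIP(uv)+1 bytes free.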
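The final hunk explains that flags can make classes of code points problematic. A flagged code point may raise a warning, may make the function fail and return NULL, or both. With flags of 0, every code point is accepted.

The C sketch below models just that decision for two classes: surrogates (0xD800 to 0xDFFF) and code points above 0x10FFFF. The SKETCH_* flag macros and sketch_cp_allowed are hypothetical and exist only for illustration. They are not perl's constants, whose names and values are documented in perlapi. A real encoder would return NULL wherever this sketch returns false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical flag bits for illustration only; NOT perl's values. */
#define SKETCH_WARN_SURROGATE      0x1u
#define SKETCH_DISALLOW_SURROGATE  0x2u
#define SKETCH_WARN_SUPER          0x4u
#define SKETCH_DISALLOW_SUPER      0x8u

/* Decides how flags treat cp.  Sets *warned (and prints a message) when
 * a warn bit covers cp's class; returns false when a disallow bit covers
 * it, i.e. where an encoding function would fail and return NULL. */
bool sketch_cp_allowed(uint32_t cp, unsigned flags, bool *warned)
{
    unsigned warn_bit, disallow_bit;
    const char *what;

    assert(warned != NULL);
    *warned = false;

    if (cp >= 0xD800 && cp <= 0xDFFF) {
        warn_bit = SKETCH_WARN_SURROGATE;
        disallow_bit = SKETCH_DISALLOW_SURROGATE;
        what = "surrogate";
    } else if (cp > 0x10FFFF) {
        warn_bit = SKETCH_WARN_SUPER;
        disallow_bit = SKETCH_DISALLOW_SUPER;
        what = "above-Unicode";
    } else {
        return true;   /* in no class these flags can make problematic */
    }

    if (flags & warn_bit) {
        *warned = true;
        fprintf(stderr, "warning: %s code point 0x%lX\n", what, (unsigned long)cp);
    }
    return (flags & disallow_bit) == 0;
}
```

With flags of 0, sketch_cp_allowed(0xD800, 0, &warned) returns true without warning, matching the flags-0 behaviour described above. Adding SKETCH_WARN_SURROGATE only sets warned, while SKETCH_DISALLOW_SURROGATE makes it return false. Keeping separate warn and disallow bits is what allows the "and/or" combination the documentation describes.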