From 670935c948929d2d69ccc2f705347bc792e01e90 Mon Sep 17 00:00:00 2001 From: Karl Williamson Date: Wed, 22 May 2024 13:03:33 -0600 Subject: [PATCH] Add extensive guidance to perlclib This consolidates much of the pod about interfacing with the standard C library into this pod, while adding extensive documentation. --- MANIFEST | 2 +- dist/ExtUtils-ParseXS/lib/perlxs.pod | 224 +---- dist/ExtUtils-ParseXS/lib/perlxstut.pod | 5 +- pod/perl.pod | 2 +- pod/perlclib.pod | 1168 ++++++++++++++++++++--- pod/perldelta.pod | 12 + pod/perlembed.pod | 6 +- pod/perlguts.pod | 2 + pod/perllocale.pod | 7 +- pod/perlthrtut.pod | 25 +- t/porting/known_pod_issues.dat | 4 +- 11 files changed, 1098 insertions(+), 359 deletions(-) diff --git a/MANIFEST b/MANIFEST index cff5c4cf65..5430a3d885 100644 --- a/MANIFEST +++ b/MANIFEST @@ -5738,7 +5738,7 @@ pod/perlcall.pod Perl calling conventions from C pod/perlcheat.pod Perl cheat sheet pod/perlclass.pod Perl class syntax pod/perlclassguts.pod Internals of class syntax -pod/perlclib.pod Internal replacements for standard C library functions +pod/perlclib.pod Interacting with standard C library functions pod/perlcommunity.pod Perl community information pod/perldata.pod Perl data structures pod/perldbmfilter.pod Perl DBM filters diff --git a/dist/ExtUtils-ParseXS/lib/perlxs.pod b/dist/ExtUtils-ParseXS/lib/perlxs.pod index 40146ac0e1..06ec37da45 100644 --- a/dist/ExtUtils-ParseXS/lib/perlxs.pod +++ b/dist/ExtUtils-ParseXS/lib/perlxs.pod @@ -14,7 +14,13 @@ or statically linked into perl. The XS interface description is written in the XS language and is the core component of the Perl extension interface. -Before writing XS, read the L section below. +This documents the XS language, but it's important to first note that XS +code has full access to system calls including C library functions. It +thus has the capability of interfering with things that the Perl core or +other modules have set up, such as signal handlers or file handles. 
It +could mess with the memory, or any number of harmful things. Don't. +Further detail is in L, which you should read before actually +writing any production XS. An B forms the basic unit of the XS interface. After compilation by the B compiler, each XSUB amounts to a C function definition @@ -2110,30 +2116,6 @@ Note that these macros will only work together within the I source file; that is, a dMY_CTX in one source file will access a different structure than a dMY_CTX in another source file. -=head2 Thread-aware system interfaces - -Starting from Perl 5.8, in C/C++ level Perl knows how to wrap -system/library interfaces that have thread-aware versions -(e.g. getpwent_r()) into frontend macros (e.g. getpwent()) that -correctly handle the multithreaded interaction with the Perl -interpreter. This will happen transparently, the only thing -you need to do is to instantiate a Perl interpreter. - -This wrapping happens always when compiling Perl core source -(PERL_CORE is defined) or the Perl core extensions (PERL_EXT is -defined). When compiling XS code outside of the Perl core, the wrapping -does not take place before Perl 5.28. Starting in that release you can - - #define PERL_REENTRANT - -in your code to enable the wrapping. It is advisable to do so if you -are using such functions, as intermixing the C<_r>-forms (as Perl compiled -for multithreaded operation will do) and the C<_r>-less forms is neither -well-defined (inconsistent results, data corruption, or even crashes -become more likely), nor is it very portable. Unfortunately, not all -systems have all the C<_r> forms, but using this C<#define> gives you -whatever protection that Perl is aware is available on each system. - =head1 EXAMPLES File C: Interface to some ONC+ RPC bind library functions. @@ -2208,10 +2190,11 @@ In Makefile.PL add -ltirpc and -I/usr/include/tirpc. =head1 CAVEATS -XS code has full access to system calls including C library functions. 
-It thus has the capability of interfering with things that the Perl core -or other modules have set up, such as signal handlers or file handles. -It could mess with the memory, or any number of harmful things. Don't. +=head2 Use of standard C library functions + +See L. + +=head2 Event loops and control flow Some modules have an event loop, waiting for user-input. It is highly unlikely that two such modules would work adequately together in a @@ -2223,189 +2206,6 @@ help-mate, to accomplish things that perl doesn't do, or doesn't do fast enough, but always subservient to perl. The closer XS code adheres to this model, the less likely conflicts will occur. -One area where there has been conflict is in regards to C locales. (See -L.) perl, with one exception and unless told otherwise, -sets up the underlying locale the program is running in to the locale -passed -into it from the environment. This is an important difference from a -generic C language program, where the underlying locale is the "C" -locale unless the program changes it. As of v5.20, this underlying -locale is completely hidden from pure Perl code outside the lexical -scope of C> except for a couple of function calls in the -POSIX module which of necessity use it. But the underlying locale, with -that -one exception is exposed to XS code, affecting all C library routines -whose behavior is locale-dependent. Your XS code better not assume that -the underlying locale is "C". The exception is the -L|perllocale/Category LC_NUMERIC: Numeric Formatting> -locale category, and the reason it is an exception is that experience -has shown that it can be problematic for XS code, whereas we have not -had reports of problems with the -L. And the reason -for this one category being problematic is that the character used as a -decimal point can vary. Many European languages use a comma, whereas -English, and hence Perl are expecting a dot (U+002E: FULL STOP). 
Many -modules can handle only the radix character being a dot, and so perl -attempts to make it so. Up through Perl v5.20, the attempt was merely -to set C upon startup to the C<"C"> locale. Any -L otherwise would change -it; this caused some failures. Therefore, starting in v5.22, perl tries -to keep C always set to C<"C"> for XS code. - -To summarize, here's what to expect and how to handle locales in XS code: - -=over - -=item Non-locale-aware XS code - -Keep in mind that even if you think your code is not locale-aware, it -may call a library function that is. Hopefully the man page for such -a function will indicate that dependency, but the documentation is -imperfect. - -The current locale is exposed to XS code except possibly C -(explained in the next paragraph). -There have not been reports of problems with the other categories. -Perl initializes things on start-up so that the current locale is the -one which is indicated by the user's environment in effect at that time. -See L. - -However, up through v5.20, Perl initialized things on start-up so that -C was set to the "C" locale. But if any code anywhere -changed it, it would stay changed. This means that your module can't -count on C being something in particular, and you can't -expect floating point numbers (including version strings) to have dots -in them. If you don't allow for a non-dot, your code could break if -anyone anywhere changed the locale. For this reason, v5.22 changed -the behavior so that Perl tries to keep C in the "C" locale -except around the operations internally where it should be something -else. Misbehaving XS code will always be able to change the locale -anyway, but the most common instance of this is checked for and -handled. - -=item Locale-aware XS code - -If the locale from the user's environment is desired, there should be no -need for XS code to set the locale except for C, as perl has -already set the others up. 
XS code should avoid changing the locale, as -it can adversely affect other, unrelated, code and may not be -thread-safe. To minimize problems, the macros -L, -L, and -L should be used to affect any needed -change. - -But, starting with Perl v5.28, locales are thread-safe on platforms that -support this functionality. Windows has this starting with Visual -Studio 2005. Many other modern platforms support the thread-safe POSIX -2008 functions. The C C<#define> C will be -defined iff this build is using these. From Perl-space, the read-only -variable C<${SAFE_LOCALES}> is 1 if either the build is not threaded, or -if C is defined; otherwise it is 0. - -The way this works under-the-hood is that every thread has a choice of -using a locale specific to it (this is the Windows and POSIX 2008 -functionality), or the global locale that is accessible to all threads -(this is the functionality that has always been there). The -implementations for Windows and POSIX are completely different. On -Windows, the runtime can be set up so that the standard -L> function either only knows about the global locale or -the locale for this thread. On POSIX, C always deals with -the global locale, and other functions have been created to handle -per-thread locales. Perl makes this transparent to perl-space code. It -continues to use C, and the interpreter translates -that into the per-thread functions. - -All other locale-sensitive functions automatically use the per-thread -locale, if that is turned on, and failing that, the global locale. Thus -calls to C are ineffective on POSIX systems for the current -thread if that thread is using a per-thread locale. If perl is compiled -for single-thread operation, it does not use the per-thread functions, -so C does work as expected. 
- -If you have loaded the L> module you can use the methods given -in L to call L|POSIX/setlocale> to safely -change or query the locale (on systems where it is safe to do so), or -you can use the new 5.28 function L instead, -which is a drop-in replacement for the system L>, and -handles single-threaded and multi-threaded applications transparently. - -There are some locale-related library calls that still aren't -thread-safe because they return data in a buffer global to all threads. -In the past, these didn't matter as locales weren't thread-safe at all. -But now you have to be aware of them in case your module is called in a -multi-threaded application. The known ones are - - asctime() - ctime() - gcvt() [POSIX.1-2001 only (function removed in POSIX.1-2008)] - getdate() - wcrtomb() if its final argument is NULL - wcsrtombs() if its final argument is NULL - wcstombs() - wctomb() - -Some of these shouldn't really be called in a Perl application, and for -others there are thread-safe versions of these already implemented: - - asctime_r() - ctime_r() - Perl_langinfo() - -The C<_r> forms are automatically used, starting in Perl 5.28, if you -compile your code, with - - #define PERL_REENTRANT - -See also L. -You can use the methods given in L, to get the best available -locale-safe versions of these - - POSIX::localeconv() - POSIX::wcstombs() - POSIX::wctomb() - -And note, that some items returned by C are available -through L. - -The others shouldn't be used in a threaded application. - -Some modules may call a non-perl library that is locale-aware. This is -fine as long as it doesn't try to query or change the locale using the -system C. But if these do call the system C, -those calls may be ineffective. Instead, -L|perlapi/Perl_setlocale> works in all circumstances. -Plain setlocale is ineffective on multi-threaded POSIX 2008 systems. It -operates only on the global locale, whereas each thread has its own -locale, paying no attention to the global one. 
Since converting -these non-Perl libraries to C is out of the question, -there is a new function in v5.28 -L|perlapi/switch_to_global_locale> that will -switch the thread it is called from so that any system C -calls will have their desired effect. The function -L|perlapi/sync_locale> must be called before returning to -perl. - -This thread can change the locale all it wants and it won't affect any -other thread, except any that also have been switched to the global -locale. This means that a multi-threaded application can have a single -thread using an alien library without a problem; but no more than a -single thread can be so-occupied. Bad results likely will happen. - -In perls without multi-thread locale support, some alien libraries, -such as C change locales. This can cause problems for the Perl -core and other modules. For these, before control is returned to -perl, starting in v5.20.1, calling the function -L from XS should be sufficient to -avoid most of these problems. Prior to this, you need a pure Perl -statement that does this: - - POSIX::setlocale(LC_ALL, POSIX::setlocale(LC_ALL)); - -or use the methods given in L. - -=back - =head1 XS VERSION This document covers features supported by C diff --git a/dist/ExtUtils-ParseXS/lib/perlxstut.pod b/dist/ExtUtils-ParseXS/lib/perlxstut.pod index fcafa58a81..c3ad1c6ad3 100644 --- a/dist/ExtUtils-ParseXS/lib/perlxstut.pod +++ b/dist/ExtUtils-ParseXS/lib/perlxstut.pod @@ -6,7 +6,7 @@ perlxstut - Tutorial for writing XSUBs This tutorial will educate the reader on the steps involved in creating a Perl extension. The reader is assumed to have access to L, -L and L. +L, L, and L. This tutorial starts with very simple examples and becomes more complex, with each new example adding new features. Certain concepts may not be @@ -1403,7 +1403,8 @@ Some systems may have installed Perl version 5 as "perl5". 
=head1 See also -For more information, consult L, L, L, L, +For more information, consult L, L, L, +L, L, L, and L. =head1 Author diff --git a/pod/perl.pod b/pod/perl.pod index c7555fe596..506d8f7ac2 100644 --- a/pod/perl.pod +++ b/pod/perl.pod @@ -155,7 +155,7 @@ aux h2ph h2xs perlbug pl2pm pod2html pod2man splain xsubpp perlxstut Perl XS tutorial perlxs Perl XS application programming interface perlxstypemap Perl XS C/Perl type conversion tools - perlclib Internal replacements for standard C library functions + perlclib Interacting with standard C library functions perlguts Perl internal functions for those doing extensions perlcall Perl calling conventions from C perlmroapi Perl method resolution plugin interface diff --git a/pod/perlclib.pod b/pod/perlclib.pod index aed8386d1d..ea8f7d86d6 100644 --- a/pod/perlclib.pod +++ b/pod/perlclib.pod @@ -1,18 +1,74 @@ =head1 NAME -perlclib - Internal replacements for standard C library functions +perlclib - Interacting with standard C library functions =head1 DESCRIPTION +The perl interpreter is written in C; XS code also expands to C. +Inevitably, this code will call some functions from the C library, +C. This document gives some guidance on interfacing with that +library. + One thing Perl porters should note is that F doesn't tend to use that much of the C standard library internally; you'll see very little use of, for example, the F functions in there. This is because Perl tends to reimplement or abstract standard library functions, so that we know exactly how they're going to operate. -This is a reference card for people who are familiar with the C library -and who want to do things the Perl way; to tell them which functions -they ought to use instead of the more normal C functions. +=head1 libc functions to avoid + +There are many, many libc functions. Most of them are fair game to use, +but some are not.
Some of the possible reasons are: + +=over + +=item * + +They are likely to interfere with the perl interpreter's functioning, +such as its bookkeeping, or signal handling, or memory allocation, +or any number of harmful things. + +=item * + +They aren't implemented on all platforms, but there is an alternative +that is. + +Or they may be buggy or deprecated on some or all platforms. + +=item * + +They aren't suitable for multi-threaded operation, but there is an +alternative that is, and is just as easily usable. + +You may not expect your code to ever be used under threads, but code has +a way of being adapted beyond our initial expectations. If it is just +as easy to use something that can be used under threads, it's better to +use that now, just in case. + +=item * + +In functions that deal with strings, complications may arise because the +string may be encoded in different ways, for example in UTF-8. For +these, it is likely better to place the string in an SV and use the Perl +SV string handling functions that contain extensive logic to deal with +this. + +=item * + +In functions that deal with numbers, complications may arise because the +numbers may get too big or too small, and what those limits are depends on the +current platform. Again, the Perl SV numeric data types have extensive +logic to take care of these kinds of issues. + +=item * + +They are locale-aware, and your caller may not want this. + +=back + +The following commentary and tables give some functions in the first +column that shouldn't be used in C or XS code, with the preferred +alternative (if any) in the second column. =head2 Conventions @@ -20,6 +76,10 @@ In the following tables: =over 3 +=item C<~> + +marks the function as deprecated; it should not be used regardless. + =item C is a type. @@ -48,79 +108,80 @@ types. Don't forget that with the new PerlIO layered I/O abstraction C types may not even be available. See also the C documentation for more information about the following functions:
See also the C documentation for more information about the following functions: - Instead Of: Use: + Instead Of: Use: - stdin PerlIO_stdin() - stdout PerlIO_stdout() - stderr PerlIO_stderr() + stdin PerlIO_stdin() + stdout PerlIO_stdout() + stderr PerlIO_stderr() - fopen(fn, mode) PerlIO_open(fn, mode) - freopen(fn, mode, stream) PerlIO_reopen(fn, mode, perlio) (Dep- - recated) - fflush(stream) PerlIO_flush(perlio) - fclose(stream) PerlIO_close(perlio) + fopen(fn, mode) PerlIO_open(fn, mode) + freopen(fn, mode, stream) PerlIO_reopen(fn, mode, perlio) (Dep- + recated) + fflush(stream) PerlIO_flush(perlio) + fclose(stream) PerlIO_close(perlio) =head2 File Input and Output - Instead Of: Use: + Instead Of: Use: - fprintf(stream, fmt, ...) PerlIO_printf(perlio, fmt, ...) + fprintf(stream, fmt, ...) PerlIO_printf(perlio, fmt, ...) - [f]getc(stream) PerlIO_getc(perlio) - [f]putc(stream, n) PerlIO_putc(perlio, n) - ungetc(n, stream) PerlIO_ungetc(perlio, n) + [f]getc(stream) PerlIO_getc(perlio) + [f]putc(stream, n) PerlIO_putc(perlio, n) + ungetc(n, stream) PerlIO_ungetc(perlio, n) Note that the PerlIO equivalents of C and C are slightly different from their C library counterparts: - fread(p, size, n, stream) PerlIO_read(perlio, buf, numbytes) - fwrite(p, size, n, stream) PerlIO_write(perlio, buf, numbytes) + fread(p, size, n, stream) PerlIO_read(perlio, buf, numbytes) + fwrite(p, size, n, stream) PerlIO_write(perlio, buf, numbytes) - fputs(s, stream) PerlIO_puts(perlio, s) + fputs(s, stream) PerlIO_puts(perlio, s) There is no equivalent to C; one should use C instead: - fgets(s, n, stream) sv_gets(sv, perlio, append) + fgets(s, n, stream) sv_gets(sv, perlio, append) =head2 File Positioning - Instead Of: Use: + Instead Of: Use: - feof(stream) PerlIO_eof(perlio) - fseek(stream, n, whence) PerlIO_seek(perlio, n, whence) - rewind(stream) PerlIO_rewind(perlio) + feof(stream) PerlIO_eof(perlio) + fseek(stream, n, whence) PerlIO_seek(perlio, n, whence) + rewind(stream) 
PerlIO_rewind(perlio) - fgetpos(stream, p) PerlIO_getpos(perlio, sv) - fsetpos(stream, p) PerlIO_setpos(perlio, sv) + fgetpos(stream, p) PerlIO_getpos(perlio, sv) + fsetpos(stream, p) PerlIO_setpos(perlio, sv) - ferror(stream) PerlIO_error(perlio) - clearerr(stream) PerlIO_clearerr(perlio) + ferror(stream) PerlIO_error(perlio) + clearerr(stream) PerlIO_clearerr(perlio) =head2 Memory Management and String Handling - Instead Of: Use: + Instead Of: Use: - t* p = malloc(n) Newx(p, n, t) - t* p = calloc(n, s) Newxz(p, n, t) - p = realloc(p, n) Renew(p, n, t) - memcpy(dst, src, n) Copy(src, dst, n, t) - memmove(dst, src, n) Move(src, dst, n, t) - memcpy(dst, src, sizeof(t)) StructCopy(src, dst, t) - memset(dst, 0, n * sizeof(t)) Zero(dst, n, t) - memzero(dst, 0) Zero(dst, n, char) - free(p) Safefree(p) + t* p = malloc(n) Newx(p, n, t) + t* p = calloc(n, s) Newxz(p, n, t) + p = realloc(p, n) Renew(p, n, t) + memcpy(dst, src, n) Copy(src, dst, n, t) + memmove(dst, src, n) Move(src, dst, n, t) + memcpy(dst, src, sizeof(t)) StructCopy(src, dst, t) + memset(dst, 0, n * sizeof(t)) Zero(dst, n, t) + memzero(dst, n) Zero(dst, n, char) + free(p) Safefree(p) - strdup(p) savepv(p) - strndup(p, n) savepvn(p, n) (Hey, strndup doesn't - exist!) + strdup(p) savepv(p) + strndup(p, n) savepvn(p, n) (Hey, strndup doesn't + exist!) - strstr(big, little) instr(big, little) + strstr(big, little) instr(big, little) + memmem(big, blen, little, len) ninstr(big, bigend, little, little_end) + strcmp(s1, s2) strLE(s1, s2) / strEQ(s1, s2) + / strGT(s1,s2) + strncmp(s1, s2, n) strnNE(s1, s2, n) / strnEQ(s1, s2, n) - memcmp(p1, p2, n) memNE(p1, p2, n) - !memcmp(p1, p2, n) memEQ(p1, p2, n) + memcmp(p1, p2, n) memNE(p1, p2, n) + !memcmp(p1, p2, n) memEQ(p1, p2, n) Notice the different order of arguments to C and C than used in C and C. @@ -128,12 +189,19 @@ in C and C.
Most of the time, though, you'll want to be dealing with SVs internally instead of raw C strings: - strlen(s) sv_len(sv) - strcpy(dt, src) sv_setpv(sv, s) - strncpy(dt, src, n) sv_setpvn(sv, s, n) - strcat(dt, src) sv_catpv(sv, s) - strncat(dt, src) sv_catpvn(sv, s) - sprintf(s, fmt, ...) sv_setpvf(sv, fmt, ...) + strlen(s) sv_len(sv) + strcpy(dt, src) sv_setpv(sv, s) + strncpy(dt, src, n) sv_setpvn(sv, s, n) + strcat(dt, src) sv_catpv(sv, s) + strncat(dt, src, n) sv_catpvn(sv, s, n) + sprintf(s, fmt, ...) sv_setpvf(sv, fmt, ...) + +If you do need raw strings, some platforms have safer interfaces, and +Perl makes sure a version of each is available on all platforms: + + strlcat(dt, src, sizeof(dt)) my_strlcat(dt, src, sizeof(dt)) + strlcpy(dt, src, sizeof(dt)) my_strlcpy(dt, src, sizeof(dt)) + strnlen(s, maxlen) my_strnlen(s, maxlen) Note also the existence of C and C, combining concatenation with formatting. @@ -146,17 +214,14 @@ any code attempting to use the data without forethought will break sooner rather than later. Poisoning can be done using the Poison() macros, which have similar arguments to Zero(): - PoisonWith(dst, n, t, b) scribble memory with byte b - PoisonNew(dst, n, t) equal to PoisonWith(dst, n, t, 0xAB) - PoisonFree(dst, n, t) equal to PoisonWith(dst, n, t, 0xEF) - Poison(dst, n, t) equal to PoisonFree(dst, n, t) + PoisonWith(dst, n, t, b) scribble memory with byte b + PoisonNew(dst, n, t) equal to PoisonWith(dst, n, t, 0xAB) + PoisonFree(dst, n, t) equal to PoisonWith(dst, n, t, 0xEF) + Poison(dst, n, t) equal to PoisonFree(dst, n, t) =head2 Character Class Tests There are several types of character class tests that Perl implements. -The only ones described here are those that directly correspond to C -library functions that operate on 8-bit characters, but there are -equivalents that operate on wide characters, and UTF-8 encoded strings. All are more fully described in L and L.
The other two columns always assume a POSIX (or C) locale. The entries in the ASCII column are only meaningful for ASCII inputs, returning FALSE for anything else. Use these only when you B that is what you want. The entries in the Latin1 column assume -that the non-ASCII 8-bit characters are as Unicode defines, them, the +that the non-ASCII 8-bit characters are as Unicode defines them, the same as ISO-8859-1, often called Latin 1. - Instead Of: Use for ASCII: Use for Latin1: Use for locale: + Instead Of: Use for ASCII: Use for Latin1: Use for locale: - isalnum(c) isALPHANUMERIC(c) isALPHANUMERIC_L1(c) isALPHANUMERIC_LC(c) - isalpha(c) isALPHA(c) isALPHA_L1(c) isALPHA_LC(u ) - isascii(c) isASCII(c) isASCII_LC(c) - isblank(c) isBLANK(c) isBLANK_L1(c) isBLANK_LC(c) - iscntrl(c) isCNTRL(c) isCNTRL_L1(c) isCNTRL_LC(c) - isdigit(c) isDIGIT(c) isDIGIT_L1(c) isDIGIT_LC(c) - isgraph(c) isGRAPH(c) isGRAPH_L1(c) isGRAPH_LC(c) - islower(c) isLOWER(c) isLOWER_L1(c) isLOWER_LC(c) - isprint(c) isPRINT(c) isPRINT_L1(c) isPRINT_LC(c) - ispunct(c) isPUNCT(c) isPUNCT_L1(c) isPUNCT_LC(c) - isspace(c) isSPACE(c) isSPACE_L1(c) isSPACE_LC(c) - isupper(c) isUPPER(c) isUPPER_L1(c) isUPPER_LC(c) - isxdigit(c) isXDIGIT(c) isXDIGIT_L1(c) isXDIGIT_LC(c) + isalnum(c) isALPHANUMERIC(c) isALPHANUMERIC_L1(c) isALPHANUMERIC_LC(c) + isalpha(c) isALPHA(c) isALPHA_L1(c) isALPHA_LC(c) + isascii(c) isASCII(c) isASCII_LC(c) + isblank(c) isBLANK(c) isBLANK_L1(c) isBLANK_LC(c) + iscntrl(c) isCNTRL(c) isCNTRL_L1(c) isCNTRL_LC(c) + isdigit(c) isDIGIT(c) isDIGIT_L1(c) isDIGIT_LC(c) + isgraph(c) isGRAPH(c) isGRAPH_L1(c) isGRAPH_LC(c) + islower(c) isLOWER(c) isLOWER_L1(c) isLOWER_LC(c) + isprint(c) isPRINT(c) isPRINT_L1(c) isPRINT_LC(c) + ispunct(c) isPUNCT(c) isPUNCT_L1(c) isPUNCT_LC(c) + isspace(c) isSPACE(c) isSPACE_L1(c) isSPACE_LC(c) + isupper(c) isUPPER(c) isUPPER_L1(c) isUPPER_LC(c) + isxdigit(c) isXDIGIT(c) isXDIGIT_L1(c) isXDIGIT_LC(c) - tolower(c) toLOWER(c) toLOWER_L1(c) - toupper(c) toUPPER(c) +
tolower(c) toLOWER(c) toLOWER_L1(c) + toupper(c) toUPPER(c) + +For the corresponding functions like C, I, use +C for non-locale, or C for locale. +And use C instead of C, I. There are +no direct equivalents for locale; it's best to put the string into an SV. + +Don't use any of the functions like C. Those are +non-portable, and interfere with Perl's internal handling. To emphasize that you are operating only on ASCII characters, you can append C<_A> to each of the macros in the ASCII column: C, @@ -198,60 +271,925 @@ latter name is clearer. There is no entry in the Latin1 column for C because the result can be non-Latin1. You have to use C, as described in L.) +Note that the libc caseless comparisons are limited; Unicode +provides a richer set, using the concept of folding. If you need +more than equality/non-equality, it's probably best to store your +strings in an SV and use SV functions to do the comparison. Similarly +for collation. + =head2 F functions - Instead Of: Use: + Instead Of: Use: - atof(s) Atof(s) - atoi(s) grok_atoUV(s, &uv, &e) - atol(s) grok_atoUV(s, &uv, &e) - strtod(s, &p) Strtod(s, &p) - strtol(s, &p, n) Strtol(s, &p, b) - strtoul(s, &p, n) Strtoul(s, &p, b) + atof(s) my_atof(s) or Atof(s) + atoi(s) grok_atoUV(s, &uv, &e) + atol(s) grok_atoUV(s, &uv, &e) + strtod(s, &p) Strtod(s, &p) + strtol(s, &p, b) Strtol(s, &p, b) + strtoul(s, &p, b) Strtoul(s, &p, b) + +But note that these are subject to the locale; see L. Typical use is to do range checks on C before casting: - int i; UV uv; - char* end_ptr = input_end; - if (grok_atoUV(input, &uv, &end_ptr) - && uv <= INT_MAX) - i = (int)uv; - ... /* continue parsing from end_ptr */ - } else { - ... /* parse error: not a decimal integer in range 0 .. MAX_IV */ - } + int i; UV uv; + const char* end_ptr = input_end; + if (grok_atoUV(input, &uv, &end_ptr) + && uv <= INT_MAX) { + i = (int)uv; + ... /* continue parsing from end_ptr */ + } else { + ... /* parse error: not a decimal integer in range 0 .. 
INT_MAX */ + } Notice also the C, C, and C functions in F for converting strings representing numbers in the respective bases into Cs. Note that grok_atoUV() doesn't handle negative inputs, or leading whitespace (being purposefully strict). -Note that strtol() and strtoul() may be disguised as Strtol(), Strtoul(), -Atol(), Atoul(). Avoid those, too. - -In theory C and C may not be defined if the machine perl is -built on doesn't actually have strtol and strtoul. But as those 2 -functions are part of the 1989 ANSI C spec we suspect you'll find them -everywhere by now. - - int rand() double Drand01() - srand(n) { seedDrand01((Rand_seed_t)n); - PL_srand_called = TRUE; } - - exit(n) my_exit(n) - system(s) Don't. Look at pp_system or use my_popen. - - getenv(s) PerlEnv_getenv(s) - setenv(s, val) my_setenv(s, val) - =head2 Miscellaneous functions You should not even B to use F functions, but if you think you do, use the C stack in F instead. -For C/C, use C. + ~asctime() Perl_sv_strftime_tm() + ~asctime_r() Perl_sv_strftime_tm() + chsize() my_chsize() + ~ctime() Perl_sv_strftime_tm() + ~ctime_r() Perl_sv_strftime_tm() + ~cuserid() DO NOT USE; see its man page + dirfd() my_dirfd() + duplocale() Perl_setlocale() + ~ecvt() my_snprintf() + ~endgrent_r() endgrent() + ~endhostent_r() endhostent() + ~endnetent_r() endnetent() + ~endprotoent_r() endprotoent() + ~endpwent_r() endpwent() + ~endservent_r() endservent() + ~endutent() endutxent() + exit(n) my_exit(n) + ~fcvt() my_snprintf() + freelocale() Perl_setlocale() + ~ftw() nftw() + getenv(s) PerlEnv_getenv(s) + ~gethostbyaddr() getnameinfo() + ~gethostbyname() getaddrinfo() + ~getpass() DO NOT USE; see its man page + ~getpw() getpwuid() + ~getutent() getutxent() + ~getutid() getutxid() + ~getutline() getutxline() + ~gsignal() DO NOT USE; see its man page + localeconv() Perl_localeconv() + mblen() mbrlen() + mbtowc() mbrtowc() + newlocale() Perl_setlocale() + pclose() my_pclose() + popen() my_popen() + ~pututline() pututxline()
+ ~qecvt() my_snprintf() + ~qfcvt() my_snprintf() + querylocale() Perl_setlocale() + int rand() double Drand01() + srand(n) { seedDrand01((Rand_seed_t)n); + PL_srand_called = TRUE; } + ~readdir_r() readdir() + realloc() saferealloc(), Renew() or Renewc() + ~re_comp() regcomp() + ~re_exec() regexec() + ~rexec() rcmd() + ~rexec_af() rcmd() + setenv(s, val) my_setenv(s, val) + ~setgrent_r() setgrent() + ~sethostent_r() sethostent() + setlocale() Perl_setlocale() + setlocale_r() Perl_setlocale() + ~setnetent_r() setnetent() + ~setprotoent_r() setprotoent() + ~setpwent_r() setpwent() + ~setservent_r() setservent() + ~setutent() setutxent() + sigaction() rsignal(signo, handler) + ~siginterrupt() rsignal() with the SA_RESTART flag instead + signal(signo, handler) rsignal(signo, handler) + ~ssignal() DO NOT USE; see its man page + strcasecmp() a Perl foldEQ-family function + strerror() sv_string_from_errnum() + strerror_l() sv_string_from_errnum() + strerror_r() sv_string_from_errnum() + strftime() Perl_sv_strftime_tm() + strtod() my_strtod() or Strtod() + system(s) Don't. Look at pp_system or use my_popen. + ~tempnam() mkstemp() or tmpfile() + ~tmpnam() mkstemp() or tmpfile() + tmpnam_r() mkstemp() or tmpfile() + uselocale() Perl_setlocale() + vsnprintf() my_vsnprintf() + wctob() wcrtomb() + wctomb() wcrtomb() + wsetlocale() Perl_setlocale() + +The Perl-furnished alternatives are documented in L, which you +should peruse anyway to see what is available to you. + +The lists are incomplete. Before using an unlisted function, consider +whether it is likely to interfere with Perl. + +=head1 Dealing with locales + +Like it or not, your code will be executed in the context of a locale, +as are all C language programs. See L. Most libc calls are
Most libc calls are +not affected by the locale, but a surprising number are: + + addmntent() getspent_r() sethostent() + alphasort() getspnam() sethostent_r() + asctime() getspnam_r() setnetent() + asctime_r() getwc() setnetent_r() + asprintf() getwchar() setnetgrent() + atof() glob() setprotoent() + atoi() gmtime() setprotoent_r() + atol() gmtime_r() setpwent() + atoll() grantpt() setpwent_r() + btowc() iconv_open() setrpcent() + catopen() inet_addr() setservent() + ctime() inet_aton() setservent_r() + ctime_r() inet_network() setspent() + cuserid() inet_ntoa() sgetspent_r() + daylight inet_ntop() shm_open() + dirname() inet_pton() shm_unlink() + dprintf() initgroups() snprintf() + endaliasent() innetgr() sprintf() + endgrent() iruserok() sscanf() + endgrent_r() iruserok_af() strcasecmp() + endhostent() isalnum() strcasestr() + endhostent_r() isalnum_l() strcoll() + endnetent() isalpha() strerror() + endnetent_r() isalpha_l() strerror_l() + endprotoent() isascii() strerror_r() + endprotoent_r() isascii_l() strfmon() + endpwent() isblank() strfmon_l() + endpwent_r() isblank_l() strfromd() + endrpcent() iscntrl() strfromf() + endservent() iscntrl_l() strfroml() + endservent_r() isdigit() strftime() + endspent() isdigit_l() strftime_l() + err() isgraph() strncasecmp() + error() isgraph_l() strptime() + error_at_line() islower() strsignal() + errx() islower_l() strtod() + fgetwc() isprint() strtof() + fgetwc_unlocked() isprint_l() strtoimax() + fgetws() ispunct() strtol() + fgetws_unlocked() ispunct_l() strtold() + fnmatch() isspace() strtoll() + forkpty() isspace_l() strtoq() + fprintf() isupper() strtoul() + fputwc() isupper_l() strtoull() + fputwc_unlocked() iswalnum() strtoumax() + fputws() iswalnum_l() strtouq() + fputws_unlocked() iswalpha() strverscmp() + fscanf() iswalpha_l() strxfrm() + fwprintf() iswblank() swprintf() + fwscanf() iswblank_l() swscanf() + getaddrinfo() iswcntrl() syslog() + getaliasbyname_r() iswcntrl_l() timegm() + getaliasent_r() 
iswdigit() timelocal() + getdate() iswdigit_l() timezone + getdate_r() iswgraph() tolower() + getfsent() iswgraph_l() tolower_l() + getfsfile() iswlower() toupper() + getfsspec() iswlower_l() toupper_l() + getgrent() iswprint() towctrans() + getgrent_r() iswprint_l() towlower() + getgrgid() iswpunct() towlower_l() + getgrgid_r() iswpunct_l() towupper() + getgrnam() iswspace() towupper_l() + getgrnam_r() iswspace_l() tzname + getgrouplist() iswupper() tzset() + gethostbyaddr() iswupper_l() ungetwc() + gethostbyaddr_r() iswxdigit() vasprintf() + gethostbyname() iswxdigit_l() vdprintf() + gethostbyname2() isxdigit() verr() + gethostbyname2_r() isxdigit_l() verrx() + gethostbyname_r() localeconv() versionsort() + gethostent() localtime() vfprintf() + gethostent_r() localtime_r() vfscanf() + gethostid() MB_CUR_MAX vfwprintf() + getlogin() mblen() vprintf() + getlogin_r() mbrlen() vscanf() + getmntent() mbrtowc() vsnprintf() + getmntent_r() mbsinit() vsprintf() + getnameinfo() mbsnrtowcs() vsscanf() + getnetbyaddr() mbsrtowcs() vswprintf() + getnetbyaddr_r() mbstowcs() vsyslog() + getnetbyname() mbtowc() vwarn() + getnetbyname_r() mktime() vwarnx() + getnetent() nan() vwprintf() + getnetent_r() nanf() warn() + getnetgrent() nanl() warnx() + getnetgrent_r() nl_langinfo() wcrtomb() + getprotobyname() openpty() wcscasecmp() + getprotobyname_r() printf() wcschr() + getprotobynumber() psiginfo() wcscoll() + getprotobynumber_r() psignal() wcsftime() + getprotoent() putpwent() wcsncasecmp() + getprotoent_r() putspent() wcsnrtombs() + getpw() putwc() wcsrchr() + getpwent() putwchar() wcsrtombs() + getpwent_r() regcomp() wcstod() + getpwnam() regexec() wcstof() + getpwnam_r() res_nclose() wcstoimax() + getpwuid() res_ninit() wcstold() + getpwuid_r() res_nquery() wcstombs() + getrpcbyname_r() res_nquerydomain() wcstoumax() + getrpcbynumber_r() res_nsearch() wcswidth() + getrpcent_r() res_nsend() wcsxfrm() + getrpcport() rpmatch() wctob() + getservbyname() ruserok() wctomb() + 
getservbyname_r() ruserok_af() wctrans() + getservbyport() scandir() wctype() + getservbyport_r() scanf() wcwidth() + getservent() setaliasent() wordexp() + getservent_r() setgrent() wprintf() + getspent() setgrent_r() wscanf() + +(The list doesn't include functions that manipulate the locale, such as +C.) + +If any of these functions are called directly or indirectly from your +code, you are affected by the current locale. + +The first thing to know about this list is that there are better +alternatives to many of these functions, and it's highly likely that you +should be using those instead. See L above. +This includes using Perl IO L. + +The second thing to know is that Perl is documented to not pay attention +to the current locale except for code executed within the scope of a +S> statement. If you violate that, you may be creating +bugs, depending on the application. + +The next thing to know is that many of these functions depend on the +locale only with regard to numeric values. Your code is likely to have +been written expecting that the decimal point (radix) character is a dot +(U+002E: FULL STOP), and that the digits of integers are not separated +into groups (1,000,000 in an American locale means a million; your code +is likely not expecting the commas). The good news is that +normally (as of Perl v5.22), your code will get called with the locale +set so those expectations are met. Explicit action has to be taken to +change this (described a little further below). This is accomplished by +Perl not actually switching into a locale that doesn't conform to these +expectations, except when explicitly told to do so. The Perl +input/output and formatting routines do this switching for you +automatically, if appropriate, and then switch back. If, for some +reason, you need to do it yourself, the easiest way from C and XS code +is to use the macro L>. You +can wrap this macro around an entire block of code that you want to be +executed in the correct environment.
The bottom line is that your code +is likely to work as expected in this regard without you having to take +any action. + +This leaves the remaining functions. Your code will get called with all +but the numeric locale portions set to the underlying locale. Often, +the locale is of little importance to your code, and you also won't have +to take any action; things will just work out. But you should examine +the man pages of the ones you use to verify this. Perl also often has +better ways of providing the same functionality. Consider using SVs and +their access routines rather than calling the low-level functions that, +for example, find how many bytes are in a UTF-8 encoded character. + +You can determine if you have been called from within the scope of a +S> by using the boolean macro L>. + +If you need to not be in the underlying locale, you can call +L> to change it temporarily to the one you +need (likely the "C" locale), and then change it back before returning. +This can be B problematic on threaded perls on some platforms. See +L. + +A problem with changing the locale of a single category is that mojibake +can arise on some platforms if the C category and the changed one +are not the same. On platforms where that isn't an issue, the +preprocessor symbol C will be defined. +Otherwise, you may have to change more than one category to correctly +accomplish your task. And there will be many locale combinations where +the mojibake likely won't happen, so you won't be confronted with this +until the code gets executed in the field by someone who doesn't speak +your language very well. + +Earlier we mentioned that explicit action is required to have your code +get called with the numeric portions of the locale not meeting the +typical expectations of having a dot for the radix character and no +punctuation separating groups of digits. That action is to call the +function L>.
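A minimal plain-C sketch of the "change it temporarily, then change it back before returning" pattern, assuming a standalone program rather than XS code (inside perl, call Perl_setlocale() instead of setlocale(), as the replacement list above directs). The detail it illustrates is that the string setlocale() returns may be overwritten by a later call, so it has to be copied before switching. The helper name format_with_c_numeric() is hypothetical, and because it changes the global locale it is not safe under threads.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: format a double with a '.' radix by switching
 * LC_NUMERIC to "C" for the duration of the call, then restoring it.
 * NOT thread-safe: setlocale() changes the process-global locale. */
int format_with_c_numeric(char *buf, size_t len, double value)
{
    const char *cur = setlocale(LC_NUMERIC, NULL);   /* query only */
    if (cur == NULL)
        return -1;

    /* Copy it: the returned string may be overwritten by later calls. */
    char *saved = strdup(cur);
    if (saved == NULL)
        return -1;

    int rc = -1;
    if (setlocale(LC_NUMERIC, "C") != NULL) {
        rc = snprintf(buf, len, "%.2f", value);
        setlocale(LC_NUMERIC, saved);    /* change back before returning */
    }
    free(saved);
    return rc;
}
```

Calling format_with_c_numeric(buf, sizeof buf, 3.5) should produce "3.50" whatever LC_NUMERIC locale is in effect, and leaves that locale as it found it.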
+ +C was written initially to cope with the +C library, but is general enough for other similar situations. C +changes the global locale to match its expectations (later versions of +it allow this to be turned off). This conflicts with Perl's assumption +that it also controls the locale. Calling this function tells Perl to +yield control. Calling L> tells Perl to take +control again, accepting whatever the locale has been changed to in the +interim. If your code is called during that interim, all portions of +the locale will be the raw underlying values. Should you need to +manipulate numbers, you are on your own with regard to the radix +character and grouping. If you find yourself in this situation, it is +generally best to make the interval between the calls to these two +functions as short as possible, and avoid calculations until after perl +has control again. + +It is important for perl to know about all the possible locale +categories on the platform, even if they aren't obviously used in your +program. Perl knows about all of the Linux ones. If your platform has +others, you can submit an issue at +L to have them included in the next +release. In the meantime, it is possible to edit the Perl source to +teach it about the category, and then recompile. Search for instances +of, say, C in the source, and use that as a template to add +the omitted one. + +There are further complications under multi-threaded operation. Keep on +reading. + +=head1 Dealing with embedded perls and threads + +It is possible to embed a Perl interpreter within a larger program. See +L. + +MULTIPLICITY is the way this is accomplished internally; it is described in +L. +Multiple Perl interpreters may be embedded. + +It is also possible to compile perl to support threading. See +L. Perl's implementation of threading requires +MULTIPLICITY, but not the other way around.
+ +MULTIPLICITY without threading means that only one thing runs at a time, +so there are no concurrency issues, but each component or instance can +affect the global state, potentially interfering with the execution of +other instances. This can happen if one instance: + +=over + +=item * + +changes the current working directory + +=item * + +changes the process's environment + +=item * + +changes the global locale the process is operating under + +=item * + +writes to shared memory or to a shared file + +=item * + +uses a shared file descriptor (including a database iterator) + +=item * + +raises a signal that functions in other instances are sensitive to + +=back + +If your code does none of these things, and doesn't depend on any of +their values, then congratulations, you don't have to worry about +MULTIPLICITY or threading. But wait, a surprising number of libc functions do +depend on data global to the process in some way that may not be +immediately obvious. For example, calling C> changes the +global state of a process, and thus needs special attention.
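strtok() is a common example of such hidden global state: it remembers its position in a static variable, so two instances (or two threads) tokenizing different strings can corrupt each other's progress. The following is a minimal sketch, assuming a POSIX system, that uses strtok_r() instead; it keeps that position in a caller-owned pointer. The function count_fields() is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>

/* Hypothetical helper: count the comma-separated fields in a writable
 * string. strtok_r() stores its progress in 'saveptr', which is local
 * to this call, so interleaved or concurrent callers do not interfere
 * with each other the way they would with strtok(). */
int count_fields(char *line)
{
    char *saveptr = NULL;
    int n = 0;

    for (char *tok = strtok_r(line, ",", &saveptr);
         tok != NULL;
         tok = strtok_r(NULL, ",", &saveptr))
        n++;

    return n;
}
```

Like strtok(), strtok_r() writes NUL bytes into its argument (so it must be writable) and skips empty fields, so "a,,b" counts as two fields.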
+ +The section 3 libc uses that we know about that have MULTIPLICITY and/or +multi-thread issues are: + + addmntent() getrpcent_r() re_exec() + alphasort() getrpcport() regcomp() + asctime() getservbyname() regerror() + asctime_r() getservbyname_r() regexec() + asprintf() getservbyport() res_nclose() + atof() getservbyport_r() res_ninit() + atoi() getservent() res_nquery() + atol() getservent_r() res_nquerydomain() + atoll() getspent() res_nsearch() + basename() getspent_r() res_nsend() + btowc() getspnam() rexec() + catgets() getspnam_r() rexec_af() + catopen() getttyent() rpmatch() + clearenv() getttynam() ruserok() + clearerr_unlocked() getusershell() ruserok_af() + crypt() getutent() scandir() + crypt_gensalt() getutid() scanf() + crypt_r() getutline() secure_getenv() + ctermid() getutxent() seed48() + ctermid_r() getutxid() seed48_r() + ctime() getutxline() setaliasent() + ctime_r() getwc() setcontext() + cuserid() getwchar() setenv() + daylight getwchar_unlocked() setfsent() + dbm_clearerr() getwc_unlocked() setgrent() + dbm_close() glob() setgrent_r() + dbm_delete() gmtime() sethostent() + dbm_error() gmtime_r() sethostent_r() + dbm_fetch() grantpt() sethostid() + dbm_firstkey() hcreate() setkey() + dbm_nextkey() hcreate_r() setlocale() + dbm_open() hdestroy() setlocale_r() + dbm_store() hdestroy_r() setlogmask() + dirname() hsearch() setnetent() + dlerror() hsearch_r() setnetent_r() + dprintf() iconv() setnetgrent() + drand48() iconv_open() setprotoent() + drand48_r() inet_addr() setprotoent_r() + ecvt() inet_aton() setpwent() + encrypt() inet_network() setpwent_r() + endaliasent() inet_ntoa() setrpcent() + endfsent() inet_ntop() setservent() + endgrent() inet_pton() setservent_r() + endgrent_r() initgroups() setspent() + endhostent() initstate_r() setstate_r() + endhostent_r() innetgr() setttyent() + endnetent() iruserok() setusershell() + endnetent_r() iruserok_af() setutent() + endnetgrent() isalnum() setutxent() + endprotoent() isalnum_l() sgetspent() + 
endprotoent_r() isalpha() sgetspent_r() + endpwent() isalpha_l() shm_open() + endpwent_r() isascii() shm_unlink() + endrpcent() isascii_l() siginterrupt() + endservent() isblank() sleep() + endservent_r() isblank_l() snprintf() + endspent() iscntrl() sprintf() + endttyent() iscntrl_l() srand48() + endusershell() isdigit() srand48_r() + endutent() isdigit_l() srandom_r() + endutxent() isgraph() sscanf() + erand48() isgraph_l() ssignal() + erand48_r() islower() strcasecmp() + err() islower_l() strcasestr() + error() isprint() strcoll() + error_at_line() isprint_l() strerror() + errx() ispunct() strerror_l() + ether_aton() ispunct_l() strerror_r() + ether_ntoa() isspace() strfmon() + execlp() isspace_l() strfmon_l() + execvp() isupper() strfromd() + execvpe() isupper_l() strfromf() + exit() iswalnum() strfroml() + __fbufsize() iswalnum_l() strftime() + fcloseall() iswalpha() strftime_l() + fcvt() iswalpha_l() strncasecmp() + fflush_unlocked() iswblank() strptime() + fgetc_unlocked() iswblank_l() strsignal() + fgetgrent() iswcntrl() strtod() + fgetpwent() iswcntrl_l() strtof() + fgetspent() iswdigit() strtoimax() + fgets_unlocked() iswdigit_l() strtok() + fgetwc() iswgraph() strtol() + fgetwc_unlocked() iswgraph_l() strtold() + fgetws() iswlower() strtoll() + fgetws_unlocked() iswlower_l() strtoq() + fnmatch() iswprint() strtoul() + forkpty() iswprint_l() strtoull() + __fpending() iswpunct() strtoumax() + fprintf() iswpunct_l() strtouq() + __fpurge() iswspace() strverscmp() + fputc_unlocked() iswspace_l() strxfrm() + fputs_unlocked() iswupper() swapcontext() + fputwc() iswupper_l() swprintf() + fputwc_unlocked() iswxdigit() swscanf() + fputws() iswxdigit_l() sysconf() + fputws_unlocked() isxdigit() syslog() + fread_unlocked() isxdigit_l() system() + fscanf() jrand48() tdelete() + __fsetlocking() jrand48_r() tempnam() + fts_children() l64a() tfind() + fts_read() lcong48() timegm() + ftw() lcong48_r() timelocal() + fwprintf() lgamma() timezone + fwrite_unlocked() 
lgammaf() tmpnam() + fwscanf() lgammal() tmpnam_r() + gamma() localeconv() tolower() + gammaf() localtime() tolower_l() + gammal() localtime_r() toupper() + getaddrinfo() login() toupper_l() + getaliasbyname() login_tty() towctrans() + getaliasbyname_r() logout() towlower() + getaliasent() logwtmp() towlower_l() + getaliasent_r() lrand48() towupper() + getchar_unlocked() lrand48_r() towupper_l() + getcontext() makecontext() tsearch() + getc_unlocked() mallinfo() ttyname() + get_current_dir_name() MB_CUR_MAX ttyname_r() + getdate() mblen() ttyslot() + getdate_r() mbrlen() twalk() + getenv() mbrtowc() twalk_r() + getfsent() mbsinit() tzname + getfsfile() mbsnrtowcs() tzset() + getfsspec() mbsrtowcs() ungetwc() + getgrent() mbstowcs() unsetenv() + getgrent_r() mbtowc() updwtmp() + getgrgid() mcheck() utmpname() + getgrgid_r() mcheck_check_all() va_arg() + getgrnam() mcheck_pedantic() valloc() + getgrnam_r() mktime() vasprintf() + getgrouplist() mprobe() vdprintf() + gethostbyaddr() mrand48() verr() + gethostbyaddr_r() mrand48_r() verrx() + gethostbyname() mtrace() versionsort() + gethostbyname2() muntrace() vfprintf() + gethostbyname2_r() nan() vfscanf() + gethostbyname_r() nanf() vfwprintf() + gethostent() nanl() vprintf() + gethostent_r() newlocale() vscanf() + gethostid() nftw() vsnprintf() + getlogin() nl_langinfo() vsprintf() + getlogin_r() nrand48() vsscanf() + getmntent() nrand48_r() vswprintf() + getmntent_r() openpty() vsyslog() + getnameinfo() perror() vwarn() + getnetbyaddr() posix_fallocate() vwarnx() + getnetbyaddr_r() printf() vwprintf() + getnetbyname() profil() warn() + getnetbyname_r() psiginfo() warnx() + getnetent() psignal() wcrtomb() + getnetent_r() ptsname() wcscasecmp() + getnetgrent() putchar_unlocked() wcschr() + getnetgrent_r() putc_unlocked() wcscoll() + getopt() putenv() wcsftime() + getopt_long() putpwent() wcsncasecmp() + getopt_long_only() putspent() wcsnrtombs() + getpass() pututline() wcsrchr() + getprotobyname() pututxline() 
wcsrtombs() + getprotobyname_r() putwc() wcstod() + getprotobynumber() putwchar() wcstof() + getprotobynumber_r() putwchar_unlocked() wcstoimax() + getprotoent() putwc_unlocked() wcstold() + getprotoent_r() pvalloc() wcstombs() + getpw() qecvt() wcstoumax() + getpwent() qfcvt() wcswidth() + getpwent_r() querylocale() wcsxfrm() + getpwnam() rand() wctob() + getpwnam_r() random_r() wctomb() + getpwuid() rcmd() wctrans() + getpwuid_r() rcmd_af() wctype() + getrpcbyname() readdir() wcwidth() + getrpcbyname_r() readdir64() wordexp() + getrpcbynumber() readdir64_r() wprintf() + getrpcbynumber_r() readdir_r() wscanf() + getrpcent() re_comp() wsetlocale() + +(If you know of additional functions that are unsafe on some platform or +another, notify us by filing a bug report at +L.) + +Some of these are safe under MULTIPLICITY, and problematic only under +threading. If a use doesn't appear in the above list, we believe it is +safe under both MULTIPLICITY and threading on all platforms. + +All the uses listed above are function calls, except for these: + + daylight MB_CUR_MAX timezone tzname + +There are three main approaches to coping with issues involving these +constructs, each suitable for different circumstances: + +=over + +=item * + +Don't use them. Some of them have preferred alternatives. Use the list +above in L to replace your uses with ones +that are thread-friendly. For example, I/O should be done via +L. + +If you must use them, many, but not all, of them will be OK as long as +their use is confined to a single thread that has no interaction with +conflicting uses in other threads. You will need to closely examine +their man pages for this, and be aware that vendor documentation is +often imprecise. + +=item * + +Do all your business before any other code can change things. If you +make changes, change back before returning. + +=item * + +Save the result of a query of global information to a per-instance area +before allowing another instance to execute.
Then you can work on it at +your leisure. This might be an automatic C variable for non-pointers, +or something as described above in +C>. + +=back + +Without threading, you don't have to worry about being interrupted by +the system giving control to another thread. With threading, you will +have to use mutexes, and be concerned with the possibility of deadlock. + +=head2 Functions always unsuitable for use under multi-threads + +A few functions are considered totally unsuited for use in a multi-thread +environment. These must be called only during single-thread operation. + + endusershell() @getaliasent() muntrace() rexec() + ether_aton() @getrpcbyname() profil() rexec_af() + ether_ntoa() @getrpcbynumber() rcmd() setusershell() + fts_children() @getrpcent() rcmd_af() ttyslot() + fts_read() getusershell() re_comp() + @getaliasbyname() mtrace() re_exec() + +C<@> above marks the functions for which there are preferred alternatives +available on some platforms, and those alternatives may be suitable for +multi-thread use. + +=head2 Functions which must be called at least once before starting threads + +Some functions perform initialization on their first call that must be done +while still in a single-thread environment, but subsequent calls are +thread-safe when executed in a critical section. +Therefore, they must be called at least once before switching to +multi-threads: + + getutent() getutline() getutxid() mallinfo() valloc() + getutid() getutxent() getutxline() pvalloc() + +=head2 Functions that are thread-safe when called with appropriate arguments + +Some of the functions are thread-safe if called with arguments that +comply with certain (easily met) restrictions. These are: + + ctermid() mbrlen() mbsrtowcs() wcrtomb() + cuserid() mbrtowc() tmpnam() wcsnrtombs() + error_at_line() mbsnrtowcs() va_arg() wcsrtombs() + +See the man pages of each for details.
(For completeness, the list +includes functions that you shouldn't be using anyway for other +reasons.) + +=head2 Functions vulnerable to signals + +Some functions are vulnerable to asynchronous signals. These are: + + getlogin() getutid() getutxid() login() pututline() updwtmp() + getlogin_r() getutline() getutxline() logout() pututxline() wordexp() + getutent() getutxent() glob() logwtmp() sleep() + +Some libc implementations make C<system()> thread-safe; in others, it +also has signal issues. + +=head2 General issues with thread-safety + +Some libc functions use and/or modify a global state, such as a database. +The libc functions presume that there is only one instance at a time +operating on that database. Unpredictable results occur if more than one +does, even if the database is not changed. For example, typically there is +a global iterator for such a database and that iterator is maintained by +libc, so that each new read from any instance advances it, meaning that an +instance may not see all the entries. The only way to make these thread-safe +is to have an exclusive lock on a mutex from the open call through the +close. You are advised to not use such databases from more than one +instance at a time. + +Other examples of functions that use a global state include pseudo-random +number generators. Some libc implementations of C<rand()>, for example, may +share the data across threads; and others may have per-thread data. The +shared ones will have unreproducible results, as the threads will vary in +their timings and interactions. This may be what you want; or it may not +be. (This particular function is a candidate to be removed from the POSIX +Standard because of these issues.) + +Functions that output to a stream are also considered thread-unsafe when +locking is not done. But the typical consequences are just that the data +is output in an unpredictable order; that outcome may be totally +acceptable to you.
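As an illustration of holding an exclusive lock from the open call through the close, here is a sketch for a standalone threaded C program that walks the password database. The mutex pw_db_mutex and the function count_passwd_entries() are hypothetical. The lock only protects against other code in the same program that also takes it; it does not make getpwent() safe against unrelated callers (perl itself included), so inside XS the advice above to avoid such databases from multiple instances still applies.

```c
#define _XOPEN_SOURCE 700
#include <pthread.h>
#include <pwd.h>

/* Hypothetical lock serializing all use of the password database
 * within this program. It does not coordinate with code that never
 * takes it. */
static pthread_mutex_t pw_db_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Count the entries in the password database. The lock is held from
 * setpwent() through endpwent(), so no other user of this lock can
 * advance the shared iterator while this walk is in progress. */
long count_passwd_entries(void)
{
    long n = 0;

    pthread_mutex_lock(&pw_db_mutex);
    setpwent();                     /* rewind the shared iterator */
    while (getpwent() != NULL)
        n++;
    endpwent();                     /* close the database */
    pthread_mutex_unlock(&pw_db_mutex);

    return n;
}
```

Releasing the lock only after endpwent() is the design choice that matters: unlocking between getpwent() calls would let another thread rewind or advance the iterator mid-walk.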
+ +Since the current working directory is global to a process, all +instances depend on it. One instance doing a chdir(2) affects all the +other instances. In a multi-threaded environment, any libc call that +expects the directory to not change for the duration of its execution +will have undefined results if another thread interrupts it at just the +wrong time and changes the directory. The man pages only list one such +call, nftw(). But there may be other issues lurking. + +=head2 Reentrant equivalent functions + +Some functions that are problematic with regard to MULTIPLICITY have +reentrant versions (on some or all platforms) that are better suited, +with fewer (perhaps no) races when run under threads. + +Some of these reentrant functions that are available on all platforms +should always be used anyway; they are in the lists directly under +L. + +Others may not be available on some platforms, or have issues that make +them undesirable to use even when they are available. Or it may just be +more complicated and tedious to use the reentrant version. For these, +perl has a mechanism for automatically substituting that reentrant +version when available and desirable, while hiding the complications +from your code. This feature is enabled by default for code in the Perl +core and its extensions. To enable it in other XS modules, add this +before including F<perl.h>: + + #define PERL_REENTRANT + +It is simpler for you to use the unpreferred version in your code, and +rely on this feature to do the better thing, in part because no +substitution is done if the alternative is not available or desirable on +the platform, nor if threads aren't enabled. You just write as if there +weren't threads, and you get the better behavior without having to think +about it. + +On some platforms the safer library functions may fail if the result +buffer is too small (for example, the user group databases may be rather +large, and the reentrant functions may have to carry around a full +snapshot of those databases).
Perl will start with a small buffer, but +keep retrying and growing the result buffer until the result fits. If +this limitless growing sounds bad for security or memory consumption +reasons, you can recompile Perl with C #defined +to the maximum number of bytes you will allow. + +Below is a list of the non-reentrant functions and their reentrant +alternatives. This substitution is done even on functions that you +shouldn't be using in the first place. These are marked by a C<*>. You +should instead use the alternative given in the lists directly under +L. + +Even so, some of the preferred alternatives are considered obsolete or +otherwise unwise to use on some platforms. These are marked with a C<?>. +Also, some alternatives aren't Perl-defined functions and aren't in +the POSIX Standard, so may not be widely available. These are marked with +C<~>. (Remember that the automatic substitution only happens when they +are available and desirable, so you can just use the unpreferred +alternative.)
+ + *asctime() ?asctime_r() + crypt() ~crypt_r() + ctermid() ~ctermid_r() + *ctime() ?ctime_r() + endgrent() ?~endgrent_r() + endhostent() ?~endhostent_r() + endnetent() ?~endnetent_r() + endprotoent() ?~endprotoent_r() + endpwent() ?~endpwent_r() + endservent() ?~endservent_r() + getgrent() ~getgrent_r() + getgrgid() getgrgid_r() + getgrnam() getgrnam_r() + gethostbyaddr() ~gethostbyaddr_r() + gethostbyname() ~gethostbyname_r() + gethostent() ~gethostent_r() + getlogin() getlogin_r() + getnetbyaddr() ~getnetbyaddr_r() + getnetbyname() ~getnetbyname_r() + getnetent() ~getnetent_r() + getprotobyname() ~getprotobyname_r() + getprotobynumber() ~getprotobynumber_r() + getprotoent() ~getprotoent_r() + getpwent() ~getpwent_r() + getpwnam() getpwnam_r() + getpwuid() getpwuid_r() + getservbyname() ~getservbyname_r() + getservbyport() ~getservbyport_r() + getservent() ~getservent_r() + getspnam() ~getspnam_r() + gmtime() gmtime_r() + localtime() localtime_r() + readdir() ?readdir_r() + readdir64() ~readdir64_r() + setgrent() ?~setgrent_r() + sethostent() ?~sethostent_r() + *setlocale() ?~setlocale_r() + setnetent() ?~setnetent_r() + setprotoent() ?~setprotoent_r() + setpwent() ?~setpwent_r() + setservent() ?~setservent_r() + *strerror() strerror_r() + *tmpnam() ~tmpnam_r() + ttyname() ttyname_r() + +The Perl-furnished items are documented in perlapi. + +The bottom line is: + +=over + +=item For items marked C<*> + +Replace all uses of these with the preferred alternative given in the +lists directly under L. + +=item For the remaining items + +If you really need to use these functions, you have two choices: + +=over + +=item If you #define PERL_REENTRANT + +Use the function in the first column as-is, and let perl do the work of +substituting the function in the right column if available on the +platform, and it is deemed suitable for use. + +You should look at the man pages for both versions to find any other +gotchas. 
+ +=item If you don't enable automatic substitution + +You should examine the application's code to determine if the column 1 +function presents a real problem under threads given the circumstances +it is used in. You can go directly to the column 2 replacement, but +beware of the ones that are marked. Some of those may be nonexistent or +flaky on some platforms. + +=back + +=back + +=head2 Functions that need the environment to be constant + +Since the environment is global to a process, all instances depend on +it. One instance changing the environment affects all the other +instances. Under threads, any libc call that expects the environment to +not change for the duration of its execution will have undefined results +if another thread interrupts it at just the wrong time and changes it. +These are the functions that the man pages list as being sensitive to +that. + + catopen() gethostbyname2() newlocale() + ctime() gethostbyname2_r() regerror() + ctime_r() gethostbyname_r() secure_getenv() + endhostent() gethostent() sethostent() + endhostent_r() gethostent_r() sethostent_r() + endnetent() gethostid() setlocale() + endnetent_r() getnameinfo() setlocale_r() + execlp() getnetbyname() setnetent() + execvp() getnetent() setnetent_r() + execvpe() getopt() strftime() + fnmatch() getopt_long() strptime() + getaddrinfo() getopt_long_only() sysconf() + get_current_dir_name() getrpcport() syslog() + getdate() glob() tempnam() + getdate_r() gmtime() timegm() + getenv() gmtime_r() timelocal() + gethostbyaddr() localtime() tzset() + gethostbyaddr_r() localtime_r() vsyslog() + gethostbyname() mktime() + +Many of these functions are problematic under threads for other reasons +as well. See the man pages for any you use. + +Perl defines mutexes C and C with which +to wrap calls to these functions. You need to consider the possibility +of deadlock. It is expected that a different mechanism will be in place +and preferred for Perl v5.42. 
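A sketch of the two techniques above, wrapping environment access in a mutex and saving the query result into per-instance storage while the lock is held, for a standalone threaded C program. The mutex env_mutex and the functions getenv_copy() and setenv_locked() are hypothetical; inside XS code you would instead use the Perl-provided mutexes mentioned above and my_setenv(), as the replacement list directs, since a private mutex like this one does not coordinate with perl or any other library.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical program-wide lock guarding the environment. It only
 * helps if every reader and writer of the environment takes it. */
static pthread_mutex_t env_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Return a malloc'ed copy of environment variable 'name', or NULL if
 * it is unset (or on allocation failure). The copy is made while the
 * lock is held, because the pointer getenv() returns can be invalidated
 * by a later setenv()/unsetenv()/putenv() in another thread. */
char *getenv_copy(const char *name)
{
    char *copy = NULL;

    pthread_mutex_lock(&env_mutex);
    const char *val = getenv(name);
    if (val != NULL)
        copy = strdup(val);
    pthread_mutex_unlock(&env_mutex);

    return copy;                    /* caller must free() */
}

/* Matching writer: change the environment only under the same lock. */
int setenv_locked(const char *name, const char *value)
{
    pthread_mutex_lock(&env_mutex);
    int rc = setenv(name, value, 1);
    pthread_mutex_unlock(&env_mutex);
    return rc;
}
```

Returning a private copy means callers can use the value at their leisure after the lock is released, which is the "save the result to a per-instance area" approach described earlier.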
+ +=head2 Locale-specific issues + +C language programs originally had a single locale global to the entire +process. This was later found to be inadequate for many purposes, so +extensions were added to change that, first in Windows, and then in POSIX 2008. In +Windows, you can change any thread at any time to operate either with a +per-thread locale, or with the global one, using a special libc +function, C<_configthreadlocale()>. In POSIX, the original API operates only on the global +locale, but there is an entirely new API to manipulate either per-thread +locales or the global one. As with Windows (but using the new API), a +thread can be switched at any time to operate on the global locale, or a +per-thread one. + +When one instance changes the global locale, all other instances using +the global locale are affected. Almost all the locale-related functions +in the list directly under L +have undefined behavior if another thread interrupts their execution and +changes the locale. Under threads, another thread could do exactly that. + +But on systems that have per-thread locales, starting with Perl v5.28, +perl uses them after initialization; the global locale is not used +unless XS code has called C. Doing so +affects only the thread that called it. If at most one instance is +using the global locale, no other instances are affected, the locale of +concurrently executing functions in other threads is not changed, and +this becomes a non-issue. The C preprocessor symbol +C will be defined if per-thread locales are +available and perl has been compiled to use them. The implementation of +per-thread locales on some platforms, like most *BSD-based ones, is so +buggy that the perl hints files for them deliberately turn off the +possibility of using them. + +The converse is that on systems with only a global locale, having +different threads using different locales is not likely to work well; +and changing the locale is dangerous, often leading to crashes.
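To show what the POSIX 2008 per-thread API looks like, here is a plain-C sketch, assuming a POSIX 2008 system and a standalone program. It is for illustration only: XS code running inside perl should not call uselocale() directly, but use Perl_setlocale(), as the replacement list above directs. The helper format_in_c_locale() is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: format 'value' while the calling thread is
 * switched to a private "C" locale object, then switch back.
 * uselocale() affects only the calling thread; the global locale and
 * other threads are untouched. Returns snprintf()'s result, or -1. */
int format_in_c_locale(char *buf, size_t len, double value)
{
    locale_t c_loc = newlocale(LC_ALL_MASK, "C", (locale_t) 0);
    if (c_loc == (locale_t) 0)
        return -1;

    /* uselocale() returns the previously active locale, which may be
     * LC_GLOBAL_LOCALE; (locale_t) 0 indicates an error. */
    locale_t prev = uselocale(c_loc);
    if (prev == (locale_t) 0) {
        freelocale(c_loc);
        return -1;
    }

    int rc = snprintf(buf, len, "%.1f", value);

    uselocale(prev);        /* restore whatever was in effect */
    freelocale(c_loc);      /* safe: no thread is using it any more */
    return rc;
}
```

Unlike the setlocale() approach, nothing here touches the global locale, which is why per-thread locales make the cross-thread interference described above a non-issue.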
+ +Perl has extensive code to work as well as possible on both types of +systems. You should always use C to change and query +the locale, as it portably works across the range of possibilities. =head1 SEE ALSO -L, L, L +L, L, L, L diff --git a/pod/perldelta.pod b/pod/perldelta.pod index 8899ca6dd0..83f012468c 100644 --- a/pod/perldelta.pod +++ b/pod/perldelta.pod @@ -736,6 +736,18 @@ C =back +=head3 L + +=over 4 + +=item * + +Extensive guidance has been added for interfacing with the standard C +library, including many more functions to avoid, and how to cope with +locales and threads. + +=back + =head3 L =over 4 diff --git a/pod/perlembed.pod b/pod/perlembed.pod index fe5ac14b20..520da51fab 100644 --- a/pod/perlembed.pod +++ b/pod/perlembed.pod @@ -12,7 +12,7 @@ Do you want to: =item B -Read L, L, L, L, and L. +Read L, L, L, L, L, and L. =item B @@ -938,7 +938,7 @@ C<-Dusemultiplicity> option otherwise some interpreter variables may not be initialized correctly between consecutive runs and your application may crash. -See also L. +See also L. Using C<-Dusethreads -Duseithreads> rather than C<-Dusemultiplicity> is more appropriate if you intend to run multiple interpreters @@ -1091,7 +1091,7 @@ B can also automate writing the I glue code. % cc -c interp.c `perl -MExtUtils::Embed -e ccopts` % cc -o interp perlxsi.o interp.o `perl -MExtUtils::Embed -e ldopts` -Consult L, L, and L for more details. +Consult L, L, L, and L for more details. =head2 Using embedded Perl with POSIX locales diff --git a/pod/perlguts.pod b/pod/perlguts.pod index f924bc6d8a..bcb7e8c948 100644 --- a/pod/perlguts.pod +++ b/pod/perlguts.pod @@ -2909,6 +2909,8 @@ to be a no-op. =head2 How do I use all this in extensions? +See also L. + When Perl is built with MULTIPLICITY, extensions that call any functions in the Perl API will need to pass the initial context argument somehow. 
The kicker is that you will need to write it in diff --git a/pod/perllocale.pod b/pod/perllocale.pod index 543455e842..498cc4f2c9 100644 --- a/pod/perllocale.pod +++ b/pod/perllocale.pod @@ -251,7 +251,8 @@ This applies as well to L. XS modules for all categories but C get the underlying locale, and hence any C library functions they call will use that -underlying locale. For more discussion, see L. +underlying locale. For more discussion, see +L. =back @@ -577,7 +578,7 @@ automatically use their thread's locale. This should be completely transparent to any applications written entirely in Perl (minus a few rarely encountered caveats given in the L section). Information for XS module writers is given -in L. +in L. =head2 Finding locales @@ -1747,7 +1748,7 @@ You should not change the locale after startup on a platform where C<${^SAFE_LOCALES}> is 0. It will always be 1 on an unthreaded platform. -XS writers should refer to L. +XS writers should refer to L. =head2 Broken systems diff --git a/pod/perlthrtut.pod b/pod/perlthrtut.pod index c34f6760e9..70ef769f5d 100644 --- a/pod/perlthrtut.pod +++ b/pod/perlthrtut.pod @@ -1046,27 +1046,10 @@ threads. See L for more details.) =head1 Thread-Safety of System Libraries -Whether various library calls are thread-safe is outside the control -of Perl. Calls often suffering from not being thread-safe include: -C, C, functions fetching user, group and -network information (such as C, C, -C and so on), C, C, and C. In -general, calls that depend on some global external state. - -If the system Perl is compiled in has thread-safe variants of such -calls, they will be used. Beyond that, Perl is at the mercy of -the thread-safety or -unsafety of the calls. Please consult your -C library call documentation. 
- -On some platforms the thread-safe library interfaces may fail if the -result buffer is too small (for example the user group databases may -be rather large, and the reentrant interfaces may have to carry around -a full snapshot of those databases). Perl will start with a small -buffer, but keep retrying and growing the result buffer -until the result fits. If this limitless growing sounds bad for -security or memory consumption reasons you can recompile Perl with -C defined to the maximum number of bytes you will -allow. +Whether C library calls are thread-safe is outside the control of Perl. +Undefined behavior will happen if unsafe ones are used during +multi-thread operation. See +L. =head1 Conclusion diff --git a/t/porting/known_pod_issues.dat b/t/porting/known_pod_issues.dat index d25473f01f..891720f661 100644 --- a/t/porting/known_pod_issues.dat +++ b/t/porting/known_pod_issues.dat @@ -288,6 +288,7 @@ printf(3) provide ptar(1) ptargrep(1) +pthreads(7) pwd_mkdb(8) querylocale(3) RDF::Trine @@ -343,6 +344,7 @@ String::Base String::Scanf String::Util strstr(3) +strtok(3) strtol(3) Switch tar(1) @@ -410,7 +412,7 @@ ext/pod-html/corpus/perlvar-copy.pod Verbatim line length including indents exce ext/vms-filespec/lib/vms/filespec.pm Verbatim line length including indents exceeds 78 by 1 install ? Should you be using F<...> or maybe L<...> instead of 1 install Verbatim line length including indents exceeds 78 by 2 -pod/perl.pod Verbatim line length including indents exceeds 78 by 6 +pod/perl.pod Verbatim line length including indents exceeds 78 by 5 pod/perlandroid.pod Verbatim line length including indents exceeds 78 by 3 pod/perldebguts.pod Verbatim line length including indents exceeds 78 by -1 pod/perldebtut.pod Verbatim line length including indents exceeds 78 by 2