doc: Expand section about preparing strings.

* gettext-tools/doc/gettext.texi (Triggering): Mention a few more Gnulib
modules.
(Preparing Strings): Turn subheadings into subsections.
(No string concatenation): Mention string concatenation operators and
strings with embedded expressions in various programming languages.
* NEWS: Mention it.
Bruno Haible 2024-09-19 23:26:28 +02:00
parent 14aa472111
commit 4df2df7213
2 changed files with 252 additions and 45 deletions

4
NEWS

@ -37,6 +37,10 @@ Version 0.23 - September 2024
now return the msgid untranslated. This is relevant for GNU systems,
Linux with musl libc, FreeBSD, NetBSD, OpenBSD, Cygwin, and Android.
* Documentation:
- The section "Preparing Strings" now gives more advice on how to deal with
string concatenation and strings with embedded expressions.
* xgettext:
- Most of the diagnostics emitted by xgettext are now labelled as
"warning" or "error".


@ -2021,9 +2021,14 @@ declared in the @code{<string.h>} and @code{<stdlib.h>} standard headers.
If this is not
desirable in your application (for example in a compiler's parser),
you can use a set of substitute functions which hardwire the C locale,
such as found in the modules @samp{c-ctype}, @samp{c-strcase},
@samp{c-strcasestr}, @samp{c-strtod}, @samp{c-strtold} in the GNU gnulib
source distribution.
such as found in the modules
@samp{c-ctype},
@samp{c-strcase},
@samp{c-strcasestr},
@samp{c-snprintf},
@samp{c-strtod}, @samp{c-strtold},
@samp{c-dtoastr}, @samp{c-ldtoastr}
in the GNU gnulib source distribution.
It is also possible to switch the locale back and forth between the
environment dependent locale and the C locale, but this approach is
@ -2069,7 +2074,18 @@ Avoid unusual markup and unusual control characters.
@noindent
Let's look at some examples of these guidelines.
@subheading Decent English style
@menu
* Decent English style::
* Entire sentences::
* Split at paragraphs::
* No string concatenation::
* No embedded URLs::
* No custom format directives::
* No unusual markup::
@end menu
@node Decent English style
@subsection Decent English style
@cindex style
Translatable strings should be in good English style. If slang language
@ -2098,7 +2114,8 @@ of the objects"?
In both cases, adding more words to the message will help both the
translator and the English speaking user.
@subheading Entire sentences
@node Entire sentences
@subsection Entire sentences
@cindex sentences
Translatable strings should be entire sentences. It is often not possible
@ -2167,9 +2184,10 @@ makes it easier for the translator to understand and translate both. On
the other hand, if one of the two messages is a stereotypic one, occurring
in other places as well, you will do a favour to the translator by not
merging the two. (Identical messages occurring in several places are
combined by xgettext, so the translator has to handle them once only.)
combined by @code{xgettext}, so the translator has to handle them once only.)
@subheading Split at paragraphs
@node Split at paragraphs
@subsection Split at paragraphs
@cindex paragraphs
Translatable strings should be limited to one paragraph; don't let a
@ -2189,7 +2207,8 @@ such as the input options, the output options, and the informative
output options. This will help every user to find the option he is
looking for.
@subheading No string concatenation
@node No string concatenation
@subsection No string concatenation
@cindex string concatenation
@cindex concatenation of strings
@ -2214,6 +2233,221 @@ to use a format string:
sprintf (s, "Replace %s with %s?", object1, object2);
@end example
@subheading String concatenation operator
In many programming languages,
a particular operator denotes string concatenation
at runtime (or possibly at compile time, if the compiler supports that).
@cindex Shell, string concatenation
@cindex Python, string concatenation
@cindex Smalltalk, string concatenation
@cindex Java, string concatenation
@cindex C#, string concatenation
@cindex awk, string concatenation
@cindex Perl, string concatenation
@cindex PHP, string concatenation
@cindex Ruby, string concatenation
@cindex Lua, string concatenation
@cindex JavaScript, string concatenation
@cindex Vala, string concatenation
@itemize @bullet
@item
In C++, string concatenation of @code{std::string} objects
is denoted by the @samp{+} operator.
@c Reference: https://en.cppreference.com/w/cpp/string/basic_string/operator%2B
@item
In Shell, string concatenation is denoted by mere juxtaposition of strings.
@c Reference: https://pubs.opengroup.org/onlinepubs/9799919799/utilities/V3_chap02.html
@item
In Python, string concatenation is denoted by the @samp{+} operator.
@c Reference: https://docs.python.org/3.12/reference/expressions.html#binary-arithmetic-operations
@item
In Smalltalk, string concatenation is denoted by the @samp{,} operator.
@c Reference: https://rmod-files.lille.inria.fr/FreeBooks/ByExample/14%20-%20Chapter%2012%20-%20Strings.pdf
@item
In Java, string concatenation is denoted by the @samp{+} operator.
@c Reference: https://docs.oracle.com/javase/specs/jls/se21/html/jls-15.html#jls-15.18.1
@item
In C#, string concatenation is denoted by the @samp{+} operator.
@c Reference: https://learn.microsoft.com/en-us/dotnet/csharp/how-to/concatenate-multiple-strings
@item
In awk, string concatenation is denoted by mere juxtaposition of strings.
@c Reference: https://www.gnu.org/software/gawk/manual/html_node/Concatenation.html
@item
In Perl, string concatenation is denoted by the @samp{.} operator.
@c Reference: https://perldoc.perl.org/perlop#Additive-Operators
@item
In PHP, string concatenation is denoted by the @samp{.} operator.
@c Reference: https://www.php.net/manual/en/language.operators.string.php
@item
In Ruby, string concatenation is denoted by the @samp{+} operator.
@c Reference: https://en.wikibooks.org/wiki/Ruby_Programming/Syntax/Operators
@c (Ignore ruby-doc.org! It is hopelessly outdated.)
@item
In Lua, string concatenation is denoted by the @samp{..} operator.
@c Reference: https://www.lua.org/pil/3.4.html
@item
In JavaScript, string concatenation is denoted by the @samp{+} operator.
@c Reference: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Addition
@item
In Vala, string concatenation is denoted by the @samp{+} operator.
@c Reference: https://docs.vala.dev/tutorials/programming-language/main/02-00-basics/02-05-operators.html
@end itemize
So, for example, in Java, you would change
@example
System.out.println("Replace "+object1+" with "+object2+"?");
@end example
@noindent
into a statement involving a format string:
@example
System.out.println(
MessageFormat.format("Replace @{0@} with @{1@}?",
new Object[] @{ object1, object2 @}));
@end example
@noindent
Similarly, in C#, you would change
@example
Console.WriteLine("Replace "+object1+" with "+object2+"?");
@end example
@noindent
into a statement involving a format string:
@example
Console.WriteLine(
String.Format("Replace @{0@} with @{1@}?", object1, object2));
@end example
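The same rewrite applies in Python. The following is a minimal sketch, not taken from the manual: the function names @code{prompt_concatenated} and @code{prompt_formatted} are hypothetical, and @code{_} is bound to a @code{gettext.NullTranslations} object, which returns every msgid unchanged, so that the example runs without a message catalog.

```python
import gettext

# Assumption for this sketch: no real catalog is loaded. NullTranslations
# returns each msgid unchanged, which is enough to compare the two styles.
_ = gettext.NullTranslations().gettext


def prompt_concatenated(object1, object2):
    # Problematic: three separate msgids ("Replace ", " with ", "?") reach
    # the translator, who can neither see the whole sentence nor reorder it.
    return _("Replace ") + object1 + _(" with ") + object2 + _("?")


def prompt_formatted(object1, object2):
    # Preferred: one msgid holding the entire sentence; the values are
    # inserted only after the (possibly translated) format string is fetched.
    return _("Replace %s with %s?") % (object1, object2)


if __name__ == "__main__":
    print(prompt_concatenated("foo.txt", "bar.txt"))
    print(prompt_formatted("foo.txt", "bar.txt"))
```

Both functions produce the same English output, but only the second one hands the translator a complete sentence.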
@subheading Strings with embedded expressions
In some programming languages,
it is possible to have strings with embedded expressions.
The expressions can refer to variables of the program.
The value of such an expression is converted to a string
and inserted in place of the expression;
but no formatting function is called.
@cindex Shell, strings with embedded expressions
@cindex Python, strings with embedded expressions
@cindex C#, strings with embedded expressions
@cindex Tcl, strings with embedded expressions
@cindex Perl, strings with embedded expressions
@cindex PHP, strings with embedded expressions
@cindex Ruby, strings with embedded expressions
@cindex JavaScript, strings with embedded expressions
@itemize @bullet
@item
In Shell language, double-quoted strings can contain
references to variables, along with default values and string operations,
such as @code{"Hello, $name!"} or @code{"Hello, $@{name@}!"}.
@c Reference: https://pubs.opengroup.org/onlinepubs/9799919799/utilities/V3_chap02.html#tag_19_02_03
@item
In Python, @emph{f-strings} can contain expressions,
such as @code{f"Hello, @{name@}!"}.
@c Reference: https://docs.python.org/3.12/reference/lexical_analysis.html#formatted-string-literals
@c @item
@c In Java, since Java 21, @emph{string templates} can contain expressions.
@c Such as @code{STR."Hello, \@{name\@}!"}.
@c Reference: https://openjdk.org/jeps/430 https://openjdk.org/jeps/459
@c Withdrawn: https://mail.openjdk.org/pipermail/amber-spec-experts/2024-April/004106.html
@item
In C#, since C# 6.0, @emph{interpolated strings} can contain expressions,
such as @code{$"Hello, @{name@}!"}.
@c Reference: https://learn.microsoft.com/en-us/dotnet/csharp/how-to/concatenate-multiple-strings
@item
In Tcl, strings are subject to @emph{variable substitution},
such as @code{"Hello, $name!"}.
@c Reference: https://wiki.tcl-lang.org/page/Dodekalogue
@c Reference: https://wiki.tcl-lang.org/page/Variable+Substitution
@item
In Perl, @emph{interpolated strings} can contain expressions,
such as @code{"Hello, $name!"}.
@c Reference: https://perldoc.perl.org/perlintro#Basic-syntax-overview
@item
In PHP, string literals are subject to @emph{variable parsing},
such as @code{"Hello, $name!"}.
@c Reference: https://www.php.net/manual/en/language.variables.basics.php
@item
In Ruby, @emph{interpolated strings} can contain expressions,
such as @code{"Hello, #@{name@}!"}.
@c Reference: https://en.wikibooks.org/wiki/Ruby_Programming/Syntax/Literals#Interpolation
@c (Ignore ruby-doc.org! It is hopelessly outdated.)
@item
In JavaScript, since ES6, @emph{template literals} can contain expressions,
such as @code{`Hello, $@{name@}!`}.
@c Reference: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals
@end itemize
These cases are effectively string concatenation as well,
just with a different syntax.
So, for example, in Python, you would change
@example
print (f'Replace @{object1.name@} with @{object2.name@}?')
@end example
@noindent
into a statement involving a format string:
@example
print ('Replace %(name1)s with %(name2)s?'
% @{ 'name1': object1.name, 'name2': object2.name @})
@end example
@noindent
or equivalently
@example
print ('Replace @{name1@} with @{name2@}?'
.format(name1 = object1.name, name2 = object2.name))
@end example
And in JavaScript, you would change
@example
print (`Replace $@{object1.name@} with $@{object2.name@}?`)
@end example
@noindent
into a statement involving a format string:
@example
print ('Replace %s with %s?'.format(object1.name, object2.name))
@end example
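To illustrate why a string with embedded expressions cannot be translated, here is a small Python sketch. It is an assumption-laden illustration, not part of the manual: @code{DemoCatalog} is a hypothetical in-memory stand-in for a compiled message catalog, the German text is an illustrative translation, and the function names are invented for the example.

```python
import gettext


class DemoCatalog(gettext.NullTranslations):
    """Hypothetical in-memory catalog standing in for a compiled .mo file."""

    MESSAGES = {
        "Replace {name1} with {name2}?": "{name1} durch {name2} ersetzen?",
    }

    def gettext(self, message):
        # Return the translation if the msgid is known, else the msgid itself.
        return self.MESSAGES.get(message, message)


_ = DemoCatalog().gettext


def ask_with_fstring(name1, name2):
    # The f-string is evaluated first, so gettext receives the finished
    # sentence, which is not a msgid in the catalog.
    return _(f"Replace {name1} with {name2}?")


def ask_with_format(name1, name2):
    # gettext receives the constant format string; values go in afterwards.
    return _("Replace {name1} with {name2}?").format(name1=name1, name2=name2)


if __name__ == "__main__":
    print(ask_with_fstring("a.txt", "b.txt"))
    print(ask_with_format("a.txt", "b.txt"))
```

In this sketch the f-string variant falls back to the untranslated English text, because the string that reaches the lookup changes with every pair of argument values.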
@subheading Format strings with embedded named references
Format strings with embedded named references are different:
They are suitable for internationalization, because it is possible
to insert a call to the @code{gettext} function (that will return a
translated format string) @emph{before} the argument values are
inserted in place of the placeholders.
The format string types that allow embedded named references are:
@itemize @bullet
@item
@ref{sh-format, Shell format strings}.
@item
In Python, those @ref{python-format, Python format strings}
that take a dictionary as argument,
and the @ref{python-format, Python brace format strings}.
@item
In Ruby, those @ref{ruby-format, Ruby format strings}
that take a hash table as argument.
@item
In Perl, the @ref{perl-format, Perl brace format strings}.
@end itemize
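As a sketch of the first Python case, the following example fetches a format string with named references and only then inserts the values. The class @code{ReorderingCatalog}, the function @code{confirm_replace}, the placeholder names @code{old} and @code{new}, and the German translation are all hypothetical and serve only to show that a translation may reorder the placeholders.

```python
import gettext


class ReorderingCatalog(gettext.NullTranslations):
    """Hypothetical catalog whose translation swaps the placeholder order."""

    MESSAGES = {
        "Replace %(old)s with %(new)s?":
            "Soll %(new)s an die Stelle von %(old)s treten?",
    }

    def gettext(self, message):
        # Return the translation if the msgid is known, else the msgid itself.
        return self.MESSAGES.get(message, message)


_ = ReorderingCatalog().gettext


def confirm_replace(old, new):
    # Step 1: look up the translated format string (a constant msgid).
    template = _("Replace %(old)s with %(new)s?")
    # Step 2: insert the values by name, whatever order the translator chose.
    return template % {"old": old, "new": new}


if __name__ == "__main__":
    print(confirm_replace("a.txt", "b.txt"))
```

Because the values are matched by name rather than by position, the translated sentence can mention the new object before the old one without any change to the program.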
@subheading The @code{<inttypes.h>} macros
@cindex @code{inttypes.h}
A similar case is compile time concatenation of strings. The ISO C 99
include file @code{<inttypes.h>} contains a macro @code{PRId64} that
@ -2257,41 +2491,8 @@ of 100 is safe, because all available hardware integer types are limited to
128 bits, and to print a 128 bit integer one needs at most 54 characters,
regardless of whether in decimal, octal or hexadecimal.
@cindex Java, string concatenation
@cindex C#, string concatenation
All this applies to other programming languages as well. For example, in
Java and C#, string concatenation is very frequently used, because it is a
compiler built-in operator. Like in C, in Java, you would change
@example
System.out.println("Replace "+object1+" with "+object2+"?");
@end example
@noindent
into a statement involving a format string:
@example
System.out.println(
MessageFormat.format("Replace @{0@} with @{1@}?",
new Object[] @{ object1, object2 @}));
@end example
@noindent
Similarly, in C#, you would change
@example
Console.WriteLine("Replace "+object1+" with "+object2+"?");
@end example
@noindent
into a statement involving a format string:
@example
Console.WriteLine(
String.Format("Replace @{0@} with @{1@}?", object1, object2));
@end example
@subheading No embedded URLs
@node No embedded URLs
@subsection No embedded URLs
It is good to not embed URLs in translatable strings, for several reasons:
@itemize @bullet
@ -2322,7 +2523,8 @@ fprintf (stream, _("GNU GPL version 3 <%s>\n"),
"https://gnu.org/licenses/gpl.html");
@end smallexample
@subheading No programmer-defined format string directives
@node No custom format directives
@subsection No programmer-defined format string directives
The GNU C Library's @code{<printf.h>} facility and the C++ standard library's @code{<format>} header file make it possible for the programmer to define their own format string directives. However, such format directives cannot be used in translatable strings, for two reasons:
@itemize @bullet
@ -2365,7 +2567,8 @@ string tmp = format ("@{:#$#@}", data);
cout << format (_("The contents is: @{@}"), tmp);
@end smallexample
@subheading No unusual markup
@node No unusual markup
@subsection No unusual markup
@cindex markup
@cindex control characters