mirror of https://https.git.savannah.gnu.org/git/gettext.git, synced 2026-01-26 15:39:11 +00:00
doc: Expand section about preparing strings.

* gettext-tools/doc/gettext.texi (Triggering): Mention a few more Gnulib
modules.
(Preparing Strings): Turn subheadings into subsections.
(No string concatenation): Mention string concatenation operators and
strings with embedded expressions in various programming languages.
* NEWS: Mention it.
parent 14aa472111
commit 4df2df7213
4
NEWS
@@ -37,6 +37,10 @@ Version 0.23 - September 2024
 now return the msgid untranslated. This is relevant for GNU systems,
 Linux with musl libc, FreeBSD, NetBSD, OpenBSD, Cygwin, and Android.

+* Documentation:
+- The section "Preparing Strings" now gives more advice how to deal with
+string concatenation and strings with embedded expressions.
+
 * xgettext:
 - Most of the diagnostics emitted by xgettext are now labelled as
 "warning" or "error".
gettext-tools/doc/gettext.texi
@@ -2021,9 +2021,14 @@ declared in the @code{<string.h>} and @code{<stdlib.h>} standard headers.
 If this is not
 desirable in your application (for example in a compiler's parser),
 you can use a set of substitute functions which hardwire the C locale,
-such as found in the modules @samp{c-ctype}, @samp{c-strcase},
-@samp{c-strcasestr}, @samp{c-strtod}, @samp{c-strtold} in the GNU gnulib
-source distribution.
+such as found in the modules
+@samp{c-ctype},
+@samp{c-strcase},
+@samp{c-strcasestr},
+@samp{c-snprintf},
+@samp{c-strtod}, @samp{c-strtold},
+@samp{c-dtoastr}, @samp{c-ldtoastr}
+in the GNU gnulib source distribution.

 It is also possible to switch the locale forth and back between the
 environment dependent locale and the C locale, but this approach is
@@ -2069,7 +2074,18 @@ Avoid unusual markup and unusual control characters.
 @noindent
 Let's look at some examples of these guidelines.

-@subheading Decent English style
+@menu
+* Decent English style::
+* Entire sentences::
+* Split at paragraphs::
+* No string concatenation::
+* No embedded URLs::
+* No custom format directives::
+* No unusual markup::
+@end menu
+
+@node Decent English style
+@subsection Decent English style

 @cindex style
 Translatable strings should be in good English style. If slang language
@@ -2098,7 +2114,8 @@ of the objects"?
 In both cases, adding more words to the message will help both the
 translator and the English speaking user.

-@subheading Entire sentences
+@node Entire sentences
+@subsection Entire sentences

 @cindex sentences
 Translatable strings should be entire sentences. It is often not possible
@@ -2167,9 +2184,10 @@ makes it easier for the translator to understand and translate both. On
 the other hand, if one of the two messages is a stereotypic one, occurring
 in other places as well, you will do a favour to the translator by not
 merging the two. (Identical messages occurring in several places are
-combined by xgettext, so the translator has to handle them once only.)
+combined by @code{xgettext}, so the translator has to handle them once only.)

-@subheading Split at paragraphs
+@node Split at paragraphs
+@subsection Split at paragraphs

 @cindex paragraphs
 Translatable strings should be limited to one paragraph; don't let a
@@ -2189,7 +2207,8 @@ such as the input options, the output options, and the informative
 output options. This will help every user to find the option he is
 looking for.

-@subheading No string concatenation
+@node No string concatenation
+@subsection No string concatenation

 @cindex string concatenation
 @cindex concatenation of strings
@@ -2214,6 +2233,221 @@ to use a format string:
 sprintf (s, "Replace %s with %s?", object1, object2);
 @end example

+@subheading String concatenation operator
+
+In many programming languages,
+a particular operator denotes string concatenation
+at runtime (or possibly at compile time, if the compiler supports that).
+
+@cindex Shell, string concatenation
+@cindex Python, string concatenation
+@cindex Smalltalk, string concatenation
+@cindex Java, string concatenation
+@cindex C#, string concatenation
+@cindex awk, string concatenation
+@cindex Perl, string concatenation
+@cindex PHP, string concatenation
+@cindex Ruby, string concatenation
+@cindex Lua, string concatenation
+@cindex JavaScript, string concatenation
+@cindex Vala, string concatenation
+@itemize @bullet
+@item
+In C++, string concatenation of @code{std::string} objects
+is denoted by the @samp{+} operator.
+@c Reference: https://en.cppreference.com/w/cpp/string/basic_string/operator%2B
+@item
+In Shell, string concatenation is denoted by mere juxtaposition of strings.
+@c Reference: https://pubs.opengroup.org/onlinepubs/9799919799/utilities/V3_chap02.html
+@item
+In Python, string concatenation is denoted by the @samp{+} operator.
+@c Reference: https://docs.python.org/3.12/reference/expressions.html#binary-arithmetic-operations
+@item
+In Smalltalk, string concatenation is denoted by the @samp{,} operator.
+@c Reference: https://rmod-files.lille.inria.fr/FreeBooks/ByExample/14%20-%20Chapter%2012%20-%20Strings.pdf
+@item
+In Java, string concatenation is denoted by the @samp{+} operator.
+@c Reference: https://docs.oracle.com/javase/specs/jls/se21/html/jls-15.html#jls-15.18.1
+@item
+In C#, string concatenation is denoted by the @samp{+} operator.
+@c Reference: https://learn.microsoft.com/en-us/dotnet/csharp/how-to/concatenate-multiple-strings
+@item
+In awk, string concatenation is denoted by mere juxtaposition of strings.
+@c Reference: https://www.gnu.org/software/gawk/manual/html_node/Concatenation.html
+@item
+In Perl, string concatenation is denoted by the @samp{.} operator.
+@c Reference: https://perldoc.perl.org/perlop#Additive-Operators
+@item
+In PHP, string concatenation is denoted by the @samp{.} operator.
+@c Reference: https://www.php.net/manual/en/language.operators.string.php
+@item
+In Ruby, string concatenation is denoted by the @samp{+} operator.
+@c Reference: https://en.wikibooks.org/wiki/Ruby_Programming/Syntax/Operators
+@c (Ignore ruby-doc.org! It is hopelessly outdated.)
+@item
+In Lua, string concatenation is denoted by the @samp{..} operator.
+@c Reference: https://www.lua.org/pil/3.4.html
+@item
+In JavaScript, string concatenation is denoted by the @samp{+} operator.
+@c Reference: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Addition
+@item
+In Vala, string concatenation is denoted by the @samp{+} operator.
+@c Reference: https://docs.vala.dev/tutorials/programming-language/main/02-00-basics/02-05-operators.html
+@end itemize
+
+So, for example, in Java, you would change
+
+@example
+System.out.println("Replace "+object1+" with "+object2+"?");
+@end example
+
+@noindent
+into a statement involving a format string:
+
+@example
+System.out.println(
+MessageFormat.format("Replace @{0@} with @{1@}?",
+new Object[] @{ object1, object2 @}));
+@end example
+
+@noindent
+Similarly, in C#, you would change
+
+@example
+Console.WriteLine("Replace "+object1+" with "+object2+"?");
+@end example
+
+@noindent
+into a statement involving a format string:
+
+@example
+Console.WriteLine(
+String.Format("Replace @{0@} with @{1@}?", object1, object2));
+@end example
+
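The difference between concatenation and a format string can be sketched in Python, one of the languages listed above. This is an illustration under stated assumptions, not part of the commit: `CATALOG`, `_`, `ask_concatenated`, and `ask_formatted` are hypothetical names, the one-entry dictionary merely stands in for a real message catalog, and the German text is only a plausible translation.

```python
# Hypothetical one-entry catalog standing in for a real message catalog.
CATALOG = {
    "Replace %s with %s?": "%s durch %s ersetzen?",
}


def _(msgid):
    """Toy stand-in for gettext: return the translation, or the msgid itself."""
    return CATALOG.get(msgid, msgid)


def ask_concatenated(old, new):
    # Three fragments are looked up separately. None is a whole sentence,
    # so a translator cannot move the verb or reorder the two names.
    return _("Replace ") + old + _(" with ") + new + _("?")


def ask_formatted(old, new):
    # One complete sentence is looked up; the values are inserted afterwards.
    return _("Replace %s with %s?") % (old, new)


print(ask_concatenated("foo", "bar"))  # Replace foo with bar?
print(ask_formatted("foo", "bar"))     # foo durch bar ersetzen?
```

Only the format-string variant gives the translator a full sentence to work with, which is why the Java and C# rewrites above replace the `+` operator with a formatting call.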
+@subheading Strings with embedded expressions
+
+In some programming languages,
+it is possible to have strings with embedded expressions.
+The expressions can refer to variables of the program.
+The value of such an expression is converted to a string
+and inserted in place of the expression;
+but no formatting function is called.
+
+@cindex Shell, strings with embedded expressions
+@cindex Python, strings with embedded expressions
+@cindex C#, strings with embedded expressions
+@cindex Tcl, strings with embedded expressions
+@cindex Perl, strings with embedded expressions
+@cindex PHP, strings with embedded expressions
+@cindex Ruby, strings with embedded expressions
+@cindex JavaScript, strings with embedded expressions
+@itemize @bullet
+@item
+In Shell language, double-quoted strings can contain
+references to variables, along with default values and string operations.
+Such as @code{"Hello, $name!"} or @code{"Hello, $@{name@}!"}.
+@c Reference: https://pubs.opengroup.org/onlinepubs/9799919799/utilities/V3_chap02.html#tag_19_02_03
+@item
+In Python, @emph{f-strings} can contain expressions.
+Such as @code{f"Hello, @{name@}!"}.
+@c Reference: https://docs.python.org/3.12/reference/lexical_analysis.html#formatted-string-literals
+@c @item
+@c In Java, since Java 21, @emph{string templates} can contain expressions.
+@c Such as @code{STR."Hello, \@{name\@}!"}.
+@c Reference: https://openjdk.org/jeps/430 https://openjdk.org/jeps/459
+@c Withdrawn: https://mail.openjdk.org/pipermail/amber-spec-experts/2024-April/004106.html
+@item
+In C#, since C# 6.0, @emph{interpolated strings} can contain expressions.
+Such as @code{$"Hello, @{name@}!"}.
+@c Reference: https://learn.microsoft.com/en-us/dotnet/csharp/how-to/concatenate-multiple-strings
+@item
+In Tcl, strings are subject to @emph{variable substitution}.
+Such as @code{"Hello, $name!"}.
+@c Reference: https://wiki.tcl-lang.org/page/Dodekalogue
+@c Reference: https://wiki.tcl-lang.org/page/Variable+Substitution
+@item
+In Perl, @emph{interpolated strings} can contain expressions.
+Such as @code{"Hello, $name!"}.
+@c Reference: https://perldoc.perl.org/perlintro#Basic-syntax-overview
+@item
+In PHP, string literals are subject to @emph{variable parsing}.
+Such as @code{"Hello, $name!"}.
+@c Reference: https://www.php.net/manual/en/language.variables.basics.php
+@item
+In Ruby, @emph{interpolated strings} can contain expressions.
+Such as @code{"Hello, #@{name@}!"}.
+@c Reference: https://en.wikibooks.org/wiki/Ruby_Programming/Syntax/Literals#Interpolation
+@c (Ignore ruby-doc.org! It is hopelessly outdated.)
+@item
+In JavaScript, since ES6, @emph{template literals} can contain expressions.
+Such as @code{`Hello, $@{name@}!`}.
+@c Reference: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals
+@end itemize
+
+These cases are effectively string concatenation as well,
+just with a different syntax.
+
+So, for example, in Python, you would change
+
+@example
+print (f'Replace @{object1.name@} with @{object2.name@}?')
+@end example
+
+@noindent
+into a statement involving a format string:
+
+@example
+print ('Replace %(name1)s with %(name2)s?'
+% @{ 'name1': object1.name, 'name2': object2.name @})
+@end example
+
+@noindent
+or equivalently
+@example
+print ('Replace @{name1@} with @{name2@}?'
+.format(name1 = object1.name, name2 = object2.name))
+@end example
+
+And in JavaScript, you would change
+
+@example
+print (`Replace $@{object1.name@} with $@{object2.name@}?`)
+@end example
+
+@noindent
+into a statement involving a format string:
+
+@example
+print ('Replace %s with %s?'.format(object1.name, object2.name))
+@end example
+
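Why an embedded expression behaves like concatenation comes down to evaluation order, which the following Python sketch tries to make visible. It rests on assumptions: `CATALOG`, `_`, `ask_embedded`, and `ask_deferred` are hypothetical names, and the dictionary lookup only imitates what a real `gettext` call would do.

```python
# Hypothetical catalog keyed by the constant format string.
CATALOG = {
    "Replace {name1} with {name2}?": "{name1} durch {name2} ersetzen?",
}


def _(msgid):
    """Toy stand-in for gettext: return the translation, or the msgid itself."""
    return CATALOG.get(msgid, msgid)


def ask_embedded(name1, name2):
    # The f-string is expanded before _() is called, so the lookup key
    # already contains the run-time values and matches no catalog entry.
    return _(f"Replace {name1} with {name2}?")


def ask_deferred(name1, name2):
    # The constant format string is looked up first; the values are
    # substituted into whatever string the lookup returned.
    return _("Replace {name1} with {name2}?").format(name1=name1, name2=name2)


print(ask_embedded("foo", "bar"))  # Replace foo with bar?
print(ask_deferred("foo", "bar"))  # foo durch bar ersetzen?
```

In the embedded variant the lookup falls through to the untranslated text, because the key it receives changes with every pair of values.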
+@subheading Format strings with embedded named references
+
+Format strings with embedded named references are different:
+They are suitable for internationalization, because it is possible
+to insert a call to the @code{gettext} function (that will return a
+translated format string) @emph{before} the argument values are
+inserted in place of the placeholders.
+
+The format string types that allow embedded named references are:
+
+@itemize @bullet
+@item
+@ref{sh-format, Shell format strings}.
+@item
+In Python, those @ref{python-format, Python format strings}
+that take a dictionary as argument,
+and the @ref{python-format, Python brace format strings}.
+@item
+In Ruby, those @ref{ruby-format, Ruby format strings}
+that take a hash table as argument.
+@item
+In Perl, the @ref{perl-format, Perl brace format strings}.
+@end itemize
+
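A short Python sketch of a format string that takes a dictionary as argument may help here. It is hypothetical throughout: `CATALOG`, `_`, and `ask` are illustrative names, and the German entry is an invented translation chosen only because it mentions the second name first.

```python
# Hypothetical catalog; the translation puts name2 before name1.
CATALOG = {
    "Replace %(name1)s with %(name2)s?": "%(name2)s statt %(name1)s verwenden?",
}


def _(msgid):
    """Toy stand-in for gettext: return the translation, or the msgid itself."""
    return CATALOG.get(msgid, msgid)


def ask(old, new):
    # The lookup happens before the values are inserted, and each value is
    # matched by name, so the translated string may use any order.
    return _("Replace %(name1)s with %(name2)s?") % {"name1": old, "name2": new}


print(ask("foo", "bar"))  # bar statt foo verwenden?
```

Because the placeholders are named rather than positional, the calling code stays the same no matter how a translation arranges them.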
+@subheading The @code{<inttypes.h>} macros
+
 @cindex @code{inttypes.h}
 A similar case is compile time concatenation of strings. The ISO C 99
 include file @code{<inttypes.h>} contains a macro @code{PRId64} that
@@ -2257,41 +2491,8 @@ of 100 is safe, because all available hardware integer types are limited to
 128 bits, and to print a 128 bit integer one needs at most 54 characters,
 regardless whether in decimal, octal or hexadecimal.

-@cindex Java, string concatenation
-@cindex C#, string concatenation
-All this applies to other programming languages as well. For example, in
-Java and C#, string concatenation is very frequently used, because it is a
-compiler built-in operator. Like in C, in Java, you would change
-
-@example
-System.out.println("Replace "+object1+" with "+object2+"?");
-@end example
-
-@noindent
-into a statement involving a format string:
-
-@example
-System.out.println(
-MessageFormat.format("Replace @{0@} with @{1@}?",
-new Object[] @{ object1, object2 @}));
-@end example
-
-@noindent
-Similarly, in C#, you would change
-
-@example
-Console.WriteLine("Replace "+object1+" with "+object2+"?");
-@end example
-
-@noindent
-into a statement involving a format string:
-
-@example
-Console.WriteLine(
-String.Format("Replace @{0@} with @{1@}?", object1, object2));
-@end example
-
-@subheading No embedded URLs
+@node No embedded URLs
+@subsection No embedded URLs

 It is good to not embed URLs in translatable strings, for several reasons:
 @itemize @bullet
@@ -2322,7 +2523,8 @@ fprintf (stream, _("GNU GPL version 3 <%s>\n"),
 "https://gnu.org/licenses/gpl.html");
 @end smallexample

-@subheading No programmer-defined format string directives
+@node No custom format directives
+@subsection No programmer-defined format string directives

 The GNU C Library's @code{<printf.h>} facility and the C++ standard library's @code{<format>} header file make it possible for the programmer to define their own format string directives. However, such format directives cannot be used in translatable strings, for two reasons:
 @itemize @bullet
@@ -2365,7 +2567,8 @@ string tmp = format ("@{:#$#@}", data);
 cout << format (_("The contents is: @{@}"), tmp);
 @end smallexample

-@subheading No unusual markup
+@node No unusual markup
+@subsection No unusual markup

 @cindex markup
 @cindex control characters