<HTML>
<HEAD>
<!-- This HTML file has been created by texi2html 1.52a
from gettext.texi on 5 November 2002 -->
<TITLE>GNU gettext utilities - 8 Producing Binary MO Files</TITLE>
</HEAD>
<BODY>
Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_7.html">previous</A>, <A HREF="gettext_9.html">next</A>, <A HREF="gettext_22.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>.
<P><HR><P>

<H1><A NAME="SEC119" HREF="gettext_toc.html#TOC119">8 Producing Binary MO Files</A></H1>

<H2><A NAME="SEC120" HREF="gettext_toc.html#TOC120">8.1 Invoking the <CODE>msgfmt</CODE> Program</A></H2>

<P>
<A NAME="IDX726"></A>
<A NAME="IDX727"></A>
<PRE>
msgfmt [<VAR>option</VAR>] <VAR>filename</VAR>.po ...
</PRE>

<P>
<A NAME="IDX728"></A>
The <CODE>msgfmt</CODE> program generates a binary message catalog from a textual
translation description.
</P>

<H3><A NAME="SEC121" HREF="gettext_toc.html#TOC121">8.1.1 Input file location</A></H3>

<DL COMPACT>

<DT><SAMP>`<VAR>filename</VAR>.po ...´</SAMP>
<DD>
Input files.

<DT><SAMP>`-D <VAR>directory</VAR>´</SAMP>
<DD>
<DT><SAMP>`--directory=<VAR>directory</VAR>´</SAMP>
<DD>
<A NAME="IDX729"></A>
<A NAME="IDX730"></A>
Add <VAR>directory</VAR> to the list of directories. Source files are
searched relative to this list of directories. The resulting binary
file will be written relative to the current directory, though.

</DL>

<P>
If an input file is <SAMP>`-´</SAMP>, standard input is read.
</P>

<H3><A NAME="SEC122" HREF="gettext_toc.html#TOC122">8.1.2 Operation mode</A></H3>

<DL COMPACT>

<DT><SAMP>`-j´</SAMP>
<DD>
<DT><SAMP>`--java´</SAMP>
<DD>
<A NAME="IDX731"></A>
<A NAME="IDX732"></A>
<A NAME="IDX733"></A>
Java mode: generate a Java <CODE>ResourceBundle</CODE> class.

<DT><SAMP>`--java2´</SAMP>
<DD>
<A NAME="IDX734"></A>
Like <SAMP>`--java´</SAMP>, and assume Java2 (JDK 1.2 or higher).

<DT><SAMP>`--tcl´</SAMP>
<DD>
<A NAME="IDX735"></A>
<A NAME="IDX736"></A>
Tcl mode: generate a tcl/msgcat <TT>`.msg´</TT> file.

</DL>

<H3><A NAME="SEC123" HREF="gettext_toc.html#TOC123">8.1.3 Output file location</A></H3>

<DL COMPACT>

<DT><SAMP>`-o <VAR>file</VAR>´</SAMP>
<DD>
<DT><SAMP>`--output-file=<VAR>file</VAR>´</SAMP>
<DD>
<A NAME="IDX737"></A>
<A NAME="IDX738"></A>
Write output to the specified file.

<DT><SAMP>`--strict´</SAMP>
<DD>
<A NAME="IDX739"></A>
Direct the program to work strictly following the Uniforum/Sun
implementation. Currently this only affects the naming of the output
file. If this option is not given, the name of the output file is the
same as the domain name. If the strict Uniforum mode is enabled, the
suffix <TT>`.mo´</TT> is added to the file name if it is not already
present.

<P>
We find this behaviour of Sun's implementation rather silly, and so by
default this mode is <EM>not</EM> selected.

</DL>

<P>
If the output <VAR>file</VAR> is <SAMP>`-´</SAMP>, output is written to standard output.
</P>
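
<P>
The naming rule for <SAMP>`--strict´</SAMP> can be restated as a small C sketch. The function <CODE>output_file_name</CODE> is a hypothetical illustration, not code taken from <CODE>msgfmt</CODE>: it only encodes the rule above, namely that the output file is named after the domain, and that strict mode appends <TT>`.mo´</TT> unless the name already ends in it.
</P>

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not msgfmt's source): write the output file name
   for DOMAIN into BUF, following the --strict rule described above.
   Returns 0 on success, -1 if BUF is too small. */
static int output_file_name(const char *domain, int strict,
                            char *buf, size_t bufsize)
{
    size_t len;
    int has_suffix, n;

    assert(domain != NULL && buf != NULL && bufsize > 0);
    len = strlen(domain);
    has_suffix = len >= 3 && strcmp(domain + len - 3, ".mo") == 0;

    if (strict && !has_suffix)
        n = snprintf(buf, bufsize, "%s.mo", domain);  /* strict Uniforum mode */
    else
        n = snprintf(buf, bufsize, "%s", domain);     /* default: domain name */

    /* snprintf returns the length it needed; treat truncation as failure. */
    return (n >= 0 && (size_t) n < bufsize) ? 0 : -1;
}
```

<P>
For a domain <CODE>hello</CODE>, this yields <TT>`hello´</TT> by default and <TT>`hello.mo´</TT> in strict mode; a domain already named <CODE>hello.mo</CODE> is left unchanged either way.
</P>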

<H3><A NAME="SEC124" HREF="gettext_toc.html#TOC124">8.1.4 Output file location in Java mode</A></H3>

<DL COMPACT>

<DT><SAMP>`-r <VAR>resource</VAR>´</SAMP>
<DD>
<DT><SAMP>`--resource=<VAR>resource</VAR>´</SAMP>
<DD>
<A NAME="IDX740"></A>
<A NAME="IDX741"></A>
Specify the resource name.

<DT><SAMP>`-l <VAR>locale</VAR>´</SAMP>
<DD>
<DT><SAMP>`--locale=<VAR>locale</VAR>´</SAMP>
<DD>
<A NAME="IDX742"></A>
<A NAME="IDX743"></A>
Specify the locale name, either a language specification of the form <VAR>ll</VAR>
or a combined language and country specification of the form <VAR>ll_CC</VAR>.

<DT><SAMP>`-d <VAR>directory</VAR>´</SAMP>
<DD>
<A NAME="IDX744"></A>
Specify the base directory of the classes directory hierarchy.

</DL>

<P>
The class name is determined by appending the locale name to the resource name,
separated by an underscore. The <SAMP>`-d´</SAMP> option is mandatory. The class
is written under the specified directory.
</P>
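
<P>
As an illustration of this naming rule, the following C sketch builds the class name from the resource and locale names and places it under the <SAMP>`-d´</SAMP> directory. The function name is hypothetical. Mapping the dots of a package-qualified resource name to subdirectories is an assumption based on the usual Java class file layout (the one <CODE>javac -d</CODE> produces), not something this section states.
</P>

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write "<directory>/<resource>_<locale>.class" into
   BUF.  Dots inside the class name (a package-qualified resource such as
   "org.example.Messages") become '/', as in Java's usual directory layout.
   Returns 0 on success, -1 if BUF is too small. */
static int java_class_path(const char *directory, const char *resource,
                           const char *locale, char *buf, size_t bufsize)
{
    size_t start, class_len, i;
    int n;

    assert(directory != NULL && resource != NULL && locale != NULL);
    n = snprintf(buf, bufsize, "%s/%s_%s.class", directory, resource, locale);
    if (n < 0 || (size_t) n >= bufsize)
        return -1;

    /* The class name is resource + '_' + locale; it starts after "dir/".
       Only that part is rewritten, so ".class" and the directory keep
       their dots. */
    start = strlen(directory) + 1;
    class_len = strlen(resource) + 1 + strlen(locale);
    for (i = start; i < start + class_len; i++)
        if (buf[i] == '.')
            buf[i] = '/';
    return 0;
}
```

<P>
For example, resource <CODE>hello</CODE> with locale <CODE>de</CODE> gives the class <CODE>hello_de</CODE>, stored here as <TT>`classes/hello_de.class´</TT> when <SAMP>`-d classes´</SAMP> is used.
</P>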

<H3><A NAME="SEC125" HREF="gettext_toc.html#TOC125">8.1.5 Output file location in Tcl mode</A></H3>

<DL COMPACT>

<DT><SAMP>`-l <VAR>locale</VAR>´</SAMP>
<DD>
<DT><SAMP>`--locale=<VAR>locale</VAR>´</SAMP>
<DD>
<A NAME="IDX745"></A>
<A NAME="IDX746"></A>
Specify the locale name, either a language specification of the form <VAR>ll</VAR>
or a combined language and country specification of the form <VAR>ll_CC</VAR>.

<DT><SAMP>`-d <VAR>directory</VAR>´</SAMP>
<DD>
<A NAME="IDX747"></A>
Specify the base directory of <TT>`.msg´</TT> message catalogs.

</DL>

<P>
The <SAMP>`-l´</SAMP> and <SAMP>`-d´</SAMP> options are mandatory. The <TT>`.msg´</TT> file is
written in the specified directory.
</P>

<H3><A NAME="SEC126" HREF="gettext_toc.html#TOC126">8.1.6 Input file interpretation</A></H3>

<DL COMPACT>

<DT><SAMP>`-c´</SAMP>
<DD>
<DT><SAMP>`--check´</SAMP>
<DD>
<A NAME="IDX748"></A>
<A NAME="IDX749"></A>
Perform all the checks implied by <CODE>--check-format</CODE>, <CODE>--check-header</CODE>,
and <CODE>--check-domain</CODE>.

<DT><SAMP>`--check-format´</SAMP>
<DD>
<A NAME="IDX750"></A>
<A NAME="IDX751"></A>
Check language-dependent format strings.

<P>
If the string represents a format string used in a
<CODE>printf</CODE>-like function, both strings should have the same number of
<SAMP>`%´</SAMP> format specifiers, with matching types. If the flag
<CODE>c-format</CODE> or <CODE>possible-c-format</CODE> appears in the special
comment <KBD>#,</KBD> for this entry, a check is performed. For example, the
check will diagnose using <SAMP>`%.*s´</SAMP> against <SAMP>`%s´</SAMP>, or <SAMP>`%d´</SAMP>
against <SAMP>`%s´</SAMP>, or <SAMP>`%d´</SAMP> against <SAMP>`%x´</SAMP>. It can even handle
positional parameters.

<P>
Normally the <CODE>xgettext</CODE> program automatically decides whether a
string is a format string or not. This algorithm is not perfect,
though. It might regard a string as a format string even though it is not
used in a <CODE>printf</CODE>-like function, and so <CODE>msgfmt</CODE> might report
errors where there are none.

<P>
To solve this problem, the programmer can dictate the decision to the
<CODE>xgettext</CODE> program (see section <A HREF="gettext_13.html#SEC204">13.3.1 C Format Strings</A>). The translator should not
consider removing the flag from the <KBD>#,</KBD> line. This "fix" would be
reversed as soon as <CODE>msgmerge</CODE> is called the next time.

<DT><SAMP>`--check-header´</SAMP>
<DD>
<A NAME="IDX752"></A>
Verify the presence and contents of the header entry. See section <A HREF="gettext_5.html#SEC36">5.2 Filling in the Header Entry</A>
for a description of the various fields in the header entry.

<DT><SAMP>`--check-domain´</SAMP>
<DD>
<A NAME="IDX753"></A>
Check for conflicts between domain directives and the <CODE>--output-file</CODE>
option.

<DT><SAMP>`-C´</SAMP>
<DD>
<DT><SAMP>`--check-compatibility´</SAMP>
<DD>
<A NAME="IDX754"></A>
<A NAME="IDX755"></A>
<A NAME="IDX756"></A>
Check that GNU msgfmt behaves like X/Open msgfmt. This will give an error
when attempting to use the GNU extensions.

<DT><SAMP>`--check-accelerators[=<VAR>char</VAR>]´</SAMP>
<DD>
<A NAME="IDX757"></A>
<A NAME="IDX758"></A>
<A NAME="IDX759"></A>
<A NAME="IDX760"></A>
Check the presence of keyboard accelerators for menu items. This is based on
the convention used in some GUIs that a keyboard accelerator in a menu
item string is designated by an immediately preceding <SAMP>`&´</SAMP> character.
A keyboard accelerator is sometimes also called a "keyboard mnemonic".
This check verifies that if the untranslated string has exactly one
<SAMP>`&´</SAMP> character, the translated string has exactly one <SAMP>`&´</SAMP> as well.
If this option is given with a <VAR>char</VAR> argument, this <VAR>char</VAR> should
be a non-alphanumeric character and is used as the keyboard accelerator mark
instead of <SAMP>`&´</SAMP>.

<DT><SAMP>`-f´</SAMP>
<DD>
<DT><SAMP>`--use-fuzzy´</SAMP>
<DD>
<A NAME="IDX761"></A>
<A NAME="IDX762"></A>
<A NAME="IDX763"></A>
Use fuzzy entries in output. Note that using this option is usually wrong,
because fuzzy messages are exactly those which have not been validated by
a human translator.

</DL>
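
<P>
The rule checked by <SAMP>`--check-accelerators´</SAMP> can be sketched in C as below. This is a hypothetical illustration of the rule as described above (count the mark in both strings), not <CODE>msgfmt</CODE>'s implementation; in particular, it gives no special treatment to any escaped or doubled form of the mark.
</P>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Count how often the accelerator mark occurs in S. */
static size_t count_mark(const char *s, char mark)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
        if (*s == mark)
            n++;
    return n;
}

/* Sketch of the described check: if MSGID contains the mark exactly once,
   MSGSTR must contain it exactly once as well.  Returns true if the pair
   passes. */
static bool accelerators_ok(const char *msgid, const char *msgstr, char mark)
{
    assert(msgid != NULL && msgstr != NULL);
    if (count_mark(msgid, mark) != 1)
        return true;   /* the rule only applies to exactly one mark */
    return count_mark(msgstr, mark) == 1;
}
```

<P>
Passing <CODE>'_'</CODE> instead of <CODE>'&amp;'</CODE> corresponds to giving a <VAR>char</VAR> argument to the option.
</P>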

<H3><A NAME="SEC127" HREF="gettext_toc.html#TOC127">8.1.7 Output details</A></H3>

<DL COMPACT>

<DT><SAMP>`-a <VAR>number</VAR>´</SAMP>
<DD>
<DT><SAMP>`--alignment=<VAR>number</VAR>´</SAMP>
<DD>
<A NAME="IDX764"></A>
<A NAME="IDX765"></A>
Align strings to <VAR>number</VAR> bytes (default: 1).

<DT><SAMP>`--no-hash´</SAMP>
<DD>
<A NAME="IDX766"></A>
Don't include a hash table in the binary file. Lookup will be more expensive
at run time (binary search instead of hash table lookup).

</DL>

<H3><A NAME="SEC128" HREF="gettext_toc.html#TOC128">8.1.8 Informative output</A></H3>

<DL COMPACT>

<DT><SAMP>`-h´</SAMP>
<DD>
<DT><SAMP>`--help´</SAMP>
<DD>
<A NAME="IDX767"></A>
<A NAME="IDX768"></A>
Display this help and exit.

<DT><SAMP>`-V´</SAMP>
<DD>
<DT><SAMP>`--version´</SAMP>
<DD>
<A NAME="IDX769"></A>
<A NAME="IDX770"></A>
Output version information and exit.

<DT><SAMP>`--statistics´</SAMP>
<DD>
<A NAME="IDX771"></A>
Print statistics about translations.

<DT><SAMP>`-v´</SAMP>
<DD>
<DT><SAMP>`--verbose´</SAMP>
<DD>
<A NAME="IDX772"></A>
<A NAME="IDX773"></A>
Increase verbosity level.

</DL>

<H2><A NAME="SEC129" HREF="gettext_toc.html#TOC129">8.2 Invoking the <CODE>msgunfmt</CODE> Program</A></H2>

<P>
<A NAME="IDX774"></A>
<A NAME="IDX775"></A>
<PRE>
msgunfmt [<VAR>option</VAR>] [<VAR>file</VAR>]...
</PRE>

<P>
<A NAME="IDX776"></A>
The <CODE>msgunfmt</CODE> program converts a binary message catalog to a
Uniforum style <TT>`.po´</TT> file.
</P>

<H3><A NAME="SEC130" HREF="gettext_toc.html#TOC130">8.2.1 Operation mode</A></H3>

<DL COMPACT>

<DT><SAMP>`-j´</SAMP>
<DD>
<DT><SAMP>`--java´</SAMP>
<DD>
<A NAME="IDX777"></A>
<A NAME="IDX778"></A>
<A NAME="IDX779"></A>
Java mode: input is a Java <CODE>ResourceBundle</CODE> class.

<DT><SAMP>`--tcl´</SAMP>
<DD>
<A NAME="IDX780"></A>
<A NAME="IDX781"></A>
Tcl mode: input is a tcl/msgcat <TT>`.msg´</TT> file.

</DL>

<H3><A NAME="SEC131" HREF="gettext_toc.html#TOC131">8.2.2 Input file location</A></H3>

<DL COMPACT>

<DT><SAMP>`<VAR>file</VAR> ...´</SAMP>
<DD>
Input <TT>`.mo´</TT> files.

</DL>

<P>
If no input <VAR>file</VAR> is given or if it is <SAMP>`-´</SAMP>, standard input is read.
</P>

<H3><A NAME="SEC132" HREF="gettext_toc.html#TOC132">8.2.3 Input file location in Java mode</A></H3>

<DL COMPACT>

<DT><SAMP>`-r <VAR>resource</VAR>´</SAMP>
<DD>
<DT><SAMP>`--resource=<VAR>resource</VAR>´</SAMP>
<DD>
<A NAME="IDX782"></A>
<A NAME="IDX783"></A>
Specify the resource name.

<DT><SAMP>`-l <VAR>locale</VAR>´</SAMP>
<DD>
<DT><SAMP>`--locale=<VAR>locale</VAR>´</SAMP>
<DD>
<A NAME="IDX784"></A>
<A NAME="IDX785"></A>
Specify the locale name, either a language specification of the form <VAR>ll</VAR>
or a combined language and country specification of the form <VAR>ll_CC</VAR>.

</DL>

<P>
The class name is determined by appending the locale name to the resource name,
separated by an underscore. The class is located using the <CODE>CLASSPATH</CODE>.
</P>

<H3><A NAME="SEC133" HREF="gettext_toc.html#TOC133">8.2.4 Input file location in Tcl mode</A></H3>

<DL COMPACT>

<DT><SAMP>`-l <VAR>locale</VAR>´</SAMP>
<DD>
<DT><SAMP>`--locale=<VAR>locale</VAR>´</SAMP>
<DD>
<A NAME="IDX786"></A>
<A NAME="IDX787"></A>
Specify the locale name, either a language specification of the form <VAR>ll</VAR>
or a combined language and country specification of the form <VAR>ll_CC</VAR>.

<DT><SAMP>`-d <VAR>directory</VAR>´</SAMP>
<DD>
<A NAME="IDX788"></A>
Specify the base directory of <TT>`.msg´</TT> message catalogs.

</DL>

<P>
The <SAMP>`-l´</SAMP> and <SAMP>`-d´</SAMP> options are mandatory. The <TT>`.msg´</TT> file is
located in the specified directory.
</P>

<H3><A NAME="SEC134" HREF="gettext_toc.html#TOC134">8.2.5 Output file location</A></H3>

<DL COMPACT>

<DT><SAMP>`-o <VAR>file</VAR>´</SAMP>
<DD>
<DT><SAMP>`--output-file=<VAR>file</VAR>´</SAMP>
<DD>
<A NAME="IDX789"></A>
<A NAME="IDX790"></A>
Write output to the specified file.

</DL>

<P>
The results are written to standard output if no output file is specified
or if it is <SAMP>`-´</SAMP>.
</P>

<H3><A NAME="SEC135" HREF="gettext_toc.html#TOC135">8.2.6 Output details</A></H3>

<DL COMPACT>

<DT><SAMP>`--force-po´</SAMP>
<DD>
<A NAME="IDX791"></A>
Always write an output file even if it contains no messages.

<DT><SAMP>`-i´</SAMP>
<DD>
<DT><SAMP>`--indent´</SAMP>
<DD>
<A NAME="IDX792"></A>
<A NAME="IDX793"></A>
Write the <TT>`.po´</TT> file using indented style.

<DT><SAMP>`--strict´</SAMP>
<DD>
<A NAME="IDX794"></A>
Write out a strict Uniforum-conforming PO file. Note that this
Uniforum format should be avoided because it doesn't support the
GNU extensions.

<DT><SAMP>`-w <VAR>number</VAR>´</SAMP>
<DD>
<DT><SAMP>`--width=<VAR>number</VAR>´</SAMP>
<DD>
<A NAME="IDX795"></A>
<A NAME="IDX796"></A>
Set the output page width. Long strings in the output files will be
split across multiple lines in order to ensure that each line's width
(= number of screen columns) is less than or equal to the given <VAR>number</VAR>.

<DT><SAMP>`--no-wrap´</SAMP>
<DD>
<A NAME="IDX797"></A>
Do not break long message lines. Message lines whose width exceeds the
output page width will not be split into several lines. Only file reference
lines which are wider than the output page width will be split.

<DT><SAMP>`-s´</SAMP>
<DD>
<DT><SAMP>`--sort-output´</SAMP>
<DD>
<A NAME="IDX798"></A>
<A NAME="IDX799"></A>
<A NAME="IDX800"></A>
Generate sorted output. Note that using this option makes it much harder
for the translator to understand each message's context.

</DL>

<H3><A NAME="SEC136" HREF="gettext_toc.html#TOC136">8.2.7 Informative output</A></H3>

<DL COMPACT>

<DT><SAMP>`-h´</SAMP>
<DD>
<DT><SAMP>`--help´</SAMP>
<DD>
<A NAME="IDX801"></A>
<A NAME="IDX802"></A>
Display this help and exit.

<DT><SAMP>`-V´</SAMP>
<DD>
<DT><SAMP>`--version´</SAMP>
<DD>
<A NAME="IDX803"></A>
<A NAME="IDX804"></A>
Output version information and exit.

<DT><SAMP>`-v´</SAMP>
<DD>
<DT><SAMP>`--verbose´</SAMP>
<DD>
<A NAME="IDX805"></A>
<A NAME="IDX806"></A>
Increase verbosity level.

</DL>

<H2><A NAME="SEC137" HREF="gettext_toc.html#TOC137">8.3 The Format of GNU MO Files</A></H2>

<P>
<A NAME="IDX807"></A>
<A NAME="IDX808"></A>
</P>

<P>
The format of the generated MO files is best described by a picture,
which appears below.
</P>

<P>
<A NAME="IDX809"></A>
The first two words serve to identify the file. The magic
number will always signal GNU MO files. The number is stored in the
byte order of the generating machine, so the magic number really is
two numbers: <CODE>0x950412de</CODE> and <CODE>0xde120495</CODE>. The second
word describes the current revision of the file format. For now the
revision is 0. This might change in future versions; the revision word
ensures that readers of MO files can distinguish new formats from old
ones, so that both can be handled correctly. The version is kept
separate from the magic number, instead of using different magic
numbers for different formats, mainly because <TT>`/etc/magic´</TT> is
not updated often. It might be better to have magic separated from
internal format version identification.
</P>

<P>
A number of pointers to later tables in the file follow, allowing
the prefix part of MO files to be extended without having to
recompile programs reading them. This might become useful for later
inserting a few flag bits, an indication of the charset used, new
tables, or other things.
</P>
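
<P>
Because the magic number is written in the byte order of the generating machine, a reader can detect that order by decoding the first word one way and checking which of the two values it got. The following C sketch is an illustration, not GNU <CODE>gettext</CODE>'s own reader: the structure and function names are hypothetical, the field offsets (4 to 24) are taken from the picture at the end of this section, and only revision 0 is accepted because that is the only revision described here.
</P>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory view of the first seven words of an MO file. */
struct mo_header {
    int big_endian;        /* byte order in which the file was written */
    uint32_t revision;     /* offset 4 */
    uint32_t nstrings;     /* offset 8:  N */
    uint32_t orig_tab;     /* offset 12: O */
    uint32_t trans_tab;    /* offset 16: T */
    uint32_t hash_size;    /* offset 20: S */
    uint32_t hash_tab;     /* offset 24: H */
};

/* Decode a 32-bit word stored in the given byte order. */
static uint32_t read_u32(const unsigned char *p, int big_endian)
{
    if (big_endian)
        return (uint32_t) p[0] << 24 | (uint32_t) p[1] << 16
               | (uint32_t) p[2] << 8 | (uint32_t) p[3];
    return (uint32_t) p[3] << 24 | (uint32_t) p[2] << 16
           | (uint32_t) p[1] << 8 | (uint32_t) p[0];
}

/* Parse the header of the LEN bytes at BUF.  Returns 0 on success, -1 if
   the data is too short, has the wrong magic number, or has a revision
   other than 0. */
static int mo_read_header(const unsigned char *buf, size_t len,
                          struct mo_header *h)
{
    uint32_t magic;

    assert(buf != NULL && h != NULL);
    if (len < 28)
        return -1;
    /* Decode as little-endian: a little-endian file yields 0x950412de,
       a big-endian one yields the byte-swapped value 0xde120495. */
    magic = read_u32(buf, 0);
    if (magic == 0x950412deU)
        h->big_endian = 0;
    else if (magic == 0xde120495U)
        h->big_endian = 1;
    else
        return -1;

    h->revision = read_u32(buf + 4, h->big_endian);
    if (h->revision != 0)
        return -1;
    h->nstrings  = read_u32(buf + 8, h->big_endian);
    h->orig_tab  = read_u32(buf + 12, h->big_endian);
    h->trans_tab = read_u32(buf + 16, h->big_endian);
    h->hash_size = read_u32(buf + 20, h->big_endian);
    h->hash_tab  = read_u32(buf + 24, h->big_endian);
    return 0;
}
```

<P>
Decoding every later word with the detected byte order, rather than the host's, keeps the sketch independent of the machine it runs on.
</P>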

<P>
Then, at offset <VAR>O</VAR> and offset <VAR>T</VAR> in the picture, two tables
of string descriptors can be found. In both tables, each string
descriptor uses two 32-bit integers, one for the string length,
another for the offset of the string in the MO file, counting in bytes
from the start of the file. The first table contains descriptors
for the original strings, and is sorted so the original strings
are in increasing lexicographical order. The second table contains
descriptors for the translated strings, and is parallel to the first
table: to find the corresponding translation, one has to access the
array slot in the second array with the same index.
</P>
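
<P>
Following this description, descriptor <VAR>i</VAR> of a table starting at offset <VAR>O</VAR> (or <VAR>T</VAR>) occupies the 8 bytes starting at <VAR>O</VAR> + 8<VAR>i</VAR>: first the length word, then the offset word. The C sketch below fetches one string that way from a file image held in memory. It is illustrative only; the function name is hypothetical, and <CODE>big_endian</CODE> is assumed to come from a magic-number check like the one described above.
</P>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a 32-bit word stored in the given byte order. */
static uint32_t read_u32(const unsigned char *p, int big_endian)
{
    if (big_endian)
        return (uint32_t) p[0] << 24 | (uint32_t) p[1] << 16
               | (uint32_t) p[2] << 8 | (uint32_t) p[3];
    return (uint32_t) p[3] << 24 | (uint32_t) p[2] << 16
           | (uint32_t) p[1] << 8 | (uint32_t) p[0];
}

/* Look up string I through the descriptor table at byte offset TABLE_OFF
   of the LEN-byte file image BUF.  On success, *STR points at the
   NUL-terminated string inside BUF and *SLEN is its length, which does
   not count the terminating NUL.  Returns -1 if the descriptor or the
   string lies outside the buffer, or if the NUL is missing. */
static int mo_get_string(const unsigned char *buf, size_t len, int big_endian,
                         uint32_t table_off, uint32_t i,
                         const char **str, uint32_t *slen)
{
    size_t desc = (size_t) table_off + (size_t) i * 8;
    uint32_t length, offset;

    assert(buf != NULL && str != NULL && slen != NULL);
    if (desc > len || len - desc < 8)
        return -1;
    length = read_u32(buf + desc, big_endian);      /* first word */
    offset = read_u32(buf + desc + 4, big_endian);  /* second word */
    if (offset >= len || length >= len - offset
        || buf[(size_t) offset + length] != '\0')
        return -1;
    *str = (const char *) (buf + offset);
    *slen = length;
    return 0;
}
```

<P>
The translation of original string <VAR>i</VAR> is then simply <CODE>mo_get_string</CODE> applied to the table at <VAR>T</VAR> with the same index.
</P>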

<P>
Having the original strings sorted enables the use of simple binary
search when the MO file does not contain a hash table, or when it is
not practical to use the hash table provided in the MO file. This also
has another advantage: in GNU <CODE>gettext</CODE> PO files, the empty
string is usually <EM>translated</EM> into some system information
attached to that particular MO file, and since the empty string sorts
before every other string, it necessarily becomes the first entry in
both the original and translated tables, making the system information
very easy to find.
</P>
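
<P>
A lookup by binary search over the sorted original strings can be sketched in C as follows. This is a hypothetical illustration rather than GNU <CODE>gettext</CODE>'s implementation: it ignores the hash table, assumes the header fields <VAR>N</VAR>, <VAR>O</VAR> and <VAR>T</VAR> have already been decoded, and assumes that <CODE>strcmp</CODE> order matches the order in which the original strings were sorted. Because <CODE>strcmp</CODE> stops at the first <KBD>NUL</KBD>, the empty header entry is found at index 0 like any other string.
</P>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode a 32-bit word stored in the given byte order. */
static uint32_t read_u32(const unsigned char *p, int big_endian)
{
    if (big_endian)
        return (uint32_t) p[0] << 24 | (uint32_t) p[1] << 16
               | (uint32_t) p[2] << 8 | (uint32_t) p[3];
    return (uint32_t) p[3] << 24 | (uint32_t) p[2] << 16
           | (uint32_t) p[1] << 8 | (uint32_t) p[0];
}

/* String I of the descriptor table at TABLE_OFF, or NULL if the
   descriptor points outside the LEN-byte image or lacks its NUL. */
static const char *mo_string(const unsigned char *buf, size_t len,
                             int big_endian, uint32_t table_off, uint32_t i)
{
    size_t desc = (size_t) table_off + (size_t) i * 8;
    uint32_t length, offset;

    if (desc > len || len - desc < 8)
        return NULL;
    length = read_u32(buf + desc, big_endian);
    offset = read_u32(buf + desc + 4, big_endian);
    if (offset >= len || length >= len - offset
        || buf[(size_t) offset + length] != '\0')
        return NULL;
    return (const char *) (buf + offset);
}

/* Binary search for MSGID among the N sorted original strings (table at
   ORIG_TAB) and return the translation with the same index from the
   table at TRANS_TAB, or NULL if MSGID is not present. */
static const char *mo_lookup(const unsigned char *buf, size_t len,
                             int big_endian, uint32_t n, uint32_t orig_tab,
                             uint32_t trans_tab, const char *msgid)
{
    uint32_t lo = 0, hi = n;   /* search the half-open range [lo, hi) */

    assert(buf != NULL && msgid != NULL);
    while (lo < hi) {
        uint32_t mid = lo + (hi - lo) / 2;
        const char *orig = mo_string(buf, len, big_endian, orig_tab, mid);
        int cmp;

        if (orig == NULL)
            return NULL;       /* damaged file: give up */
        cmp = strcmp(msgid, orig);
        if (cmp == 0)
            return mo_string(buf, len, big_endian, trans_tab, mid);
        if (cmp < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return NULL;
}
```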

<P>
<A NAME="IDX810"></A>
The size <VAR>S</VAR> of the hash table can be zero. In this case, the
hash table itself is not contained in the MO file. Some people might
prefer this because a precomputed hash table takes disk space, and
does not gain <EM>that</EM> much speed. The hash table contains indices
to the sorted array of strings in the MO file. Conflict resolution is
done by double hashing. The precise hashing algorithm used is fairly
dependent on GNU <CODE>gettext</CODE> code, and is not documented here.
</P>

<P>
As for the strings themselves, they follow the hash table, and each
is terminated with a <KBD>NUL</KBD>; this <KBD>NUL</KBD> is not counted in
the length which appears in the string descriptor. The <CODE>msgfmt</CODE>
program has an option (<SAMP>`--alignment´</SAMP>) selecting the alignment for MO file strings.
With this option, each string is separately aligned so it starts at
an offset which is a multiple of the alignment value. On some RISC
machines, a correct alignment will speed things up.
</P>
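
<P>
Aligning each string so that it starts at a multiple of the alignment value amounts to rounding its offset up. A minimal C sketch of that arithmetic, with hypothetical function names (it says nothing about which byte values a writer uses as filler):
</P>

```c
#include <assert.h>
#include <stddef.h>

/* Round OFFSET up to the next multiple of ALIGNMENT (which must be >= 1). */
static size_t align_up(size_t offset, size_t alignment)
{
    assert(alignment >= 1);
    return (offset + alignment - 1) / alignment * alignment;
}

/* Number of filler bytes needed before a string that would otherwise
   start at OFFSET. */
static size_t padding_before(size_t offset, size_t alignment)
{
    return align_up(offset, alignment) - offset;
}
```

<P>
With the default alignment of 1, <CODE>align_up</CODE> returns its argument unchanged, so no filler bytes are needed.
</P>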

<P>
<A NAME="IDX811"></A>
Plural forms are stored by letting the plural of the original string
follow the singular of the original string, separated by a
<KBD>NUL</KBD> byte. The length which appears in the string descriptor
includes both. However, only the singular of the original string
takes part in the hash table lookup. The plural variants of the
translation are all stored consecutively, separated by a
<KBD>NUL</KBD> byte. Here also, the length in the string descriptor
includes all of them.
</P>
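
<P>
Given the length from a string descriptor, the variants stored this way can be counted and located by walking the <KBD>NUL</KBD> separators. The C sketch below is illustrative; the function names are hypothetical, and deciding <EM>which</EM> variant applies to a given number is outside the scope of this format description.
</P>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count the NUL-separated variants in the LEN bytes at S.  LEN is the
   length from the string descriptor: it covers all variants and the
   NULs between them, but not the final terminating NUL. */
static size_t variant_count(const char *s, size_t len)
{
    size_t count = 1;
    size_t i;

    assert(s != NULL);
    for (i = 0; i < len; i++)
        if (s[i] == '\0')
            count++;
    return count;
}

/* Return variant INDEX (0 = first) of the LEN bytes at S, or NULL if
   there are not that many variants. */
static const char *variant_at(const char *s, size_t len, size_t index)
{
    const char *end = s + len;   /* position of the final NUL */

    assert(s != NULL);
    while (index > 0) {
        s += strlen(s) + 1;      /* skip this variant and its NUL */
        if (s > end)
            return NULL;
        index--;
    }
    return s;
}
```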

<P>
Nothing prevents an MO file from having embedded <KBD>NUL</KBD>s in strings.
However, the program interface currently used already presumes
that strings are <KBD>NUL</KBD> terminated, so embedded <KBD>NUL</KBD>s are
somewhat useless. But the MO file format is general enough that other
interfaces would be possible later if, for example, we ever want to
implement wide characters right in MO files, where <KBD>NUL</KBD> bytes may
accidentally appear. (No, we don't want to have wide characters in MO
files. They would make the file unnecessarily large, and since the
<SAMP>`wchar_t´</SAMP> type is platform dependent, MO files would be
platform dependent as well.)
</P>

<P>
This particular issue has been strongly debated in the GNU
<CODE>gettext</CODE> development forum, and it is to be expected that the MO file
format will evolve or change over time. It is even possible that several
formats may later be supported concurrently. But we have to
start somewhere, and the MO file format described here is a good start.
Nothing is set in stone, and the format may later evolve fairly
easily, so we should feel comfortable with the current approach.
</P>

<PRE>
        byte
             +------------------------------------------+
          0  | magic number = 0x950412de                |
             |                                          |
          4  | file format revision = 0                 |
             |                                          |
          8  | number of strings                        |  == N
             |                                          |
         12  | offset of table with original strings    |  == O
             |                                          |
         16  | offset of table with translation strings |  == T
             |                                          |
         20  | size of hashing table                    |  == S
             |                                          |
         24  | offset of hashing table                  |  == H
             |                                          |
             .                                          .
             .      (possibly more entries later)       .
             .                                          .
             |                                          |
          O  | length & offset 0th string  ----------------.
      O + 8  | length & offset 1st string  ------------------.
              ...                                    ...   | |
O + ((N-1)*8)| length & offset (N-1)th string           |  | |
             |                                          |  | |
          T  | length & offset 0th translation  ---------------.
      T + 8  | length & offset 1st translation  -----------------.
              ...                                    ...   | | | |
T + ((N-1)*8)| length & offset (N-1)th translation      |  | | | |
             |                                          |  | | | |
          H  | start hash table                         |  | | | |
              ...                                    ...   | | | |
  H + S * 4  | end hash table                           |  | | | |
             |                                          |  | | | |
             | NUL terminated 0th string  <----------------' | | |
             |                                          |    | | |
             | NUL terminated 1st string  <------------------' | |
             |                                          |      | |
              ...                                    ...       | |
             |                                          |      | |
             | NUL terminated 0th translation  <---------------' |
             |                                          |        |
             | NUL terminated 1st translation  <-----------------'
             |                                          |
              ...                                    ...
             |                                          |
             +------------------------------------------+
</PRE>
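
<P>
Read top to bottom, the picture implies particular offsets when the parts are laid out in the order drawn, with no extra header entries and no padding: the seven header words take 28 bytes, each table of <VAR>N</VAR> descriptors takes 8<VAR>N</VAR> bytes, and the hash table takes 4<VAR>S</VAR> bytes. The C sketch below computes these offsets for a hypothetical writer. A reader should always use the offsets stored in the header instead, since the format explicitly allows more header entries to be added later.
</P>

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: where each part of the file would start. */
struct mo_layout {
    uint32_t orig_tab;    /* O */
    uint32_t trans_tab;   /* T */
    uint32_t hash_tab;    /* H */
    uint32_t strings;     /* first string byte, right after the hash table */
};

/* Offsets for N strings and a hash table of S entries, laid out in the
   order of the picture with a 28-byte header and no padding. */
static struct mo_layout mo_compute_layout(uint32_t n, uint32_t s)
{
    struct mo_layout l;
    /* Compute the end in 64 bits so overflow of the 32-bit offsets is caught. */
    uint64_t end = 28 + 16 * (uint64_t) n + 4 * (uint64_t) s;

    assert(end <= UINT32_MAX);
    l.orig_tab = 28;                   /* seven 4-byte header words */
    l.trans_tab = l.orig_tab + 8 * n;  /* N descriptors of two words each */
    l.hash_tab = l.trans_tab + 8 * n;
    l.strings = l.hash_tab + 4 * s;    /* S entries of one word each */
    return l;
}
```

<P>
With <VAR>S</VAR> = 0, the hash table offset and the start of the strings coincide, matching the case where no hash table is contained in the file.
</P>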
</BODY>
</HTML>