mirror of
https://https.git.savannah.gnu.org/git/groff.git
synced 2026-01-26 07:37:53 +00:00
...and fix goof in test script. * src/roff/nroff/nroff.sh: Replace proxy `unset`-based test with an actual measurement of parameter expansion. * src/roff/nroff/tests/verbose_option_works.sh: Fix spurious failure of test for grep(1) supporting `-q` and `-x` options. Continues commit e05cf6d3b3, 4 December. Every time I have to deal with Solaris, it's like gazing into the Abyss. A test failure of "src/roff/nroff/tests/verbose_option_works.sh" is expected on Solaris 10, because 2 of the checks in that script verify that option clustering works, and with Solaris 10 /bin/sh, it doesn't. The test failure is therefore not spurious. One can edit the installed nroff script in place to use "/usr/xpg4/bin" in its shebang line.
339 lines
13 KiB
Plaintext
339 lines
13 KiB
Plaintext
Copyright 2022-2024 Free Software Foundation, Inc.
|
|
|
|
Copying and distribution of this file, with or without
|
|
modification, are permitted in any medium without royalty provided
|
|
the copyright notice and this notice are preserved.
|
|
|
|
This file contains advice on developing and contributing to groff. It
|
|
assumes that developers will install the 'git' revision control
|
|
system and build groff using the instructions in 'INSTALL.REPO'.
|
|
Familiarize yourself with the structure of the source tree by studying
|
|
its 'MANIFEST' file at the top level.
|
|
|
|
Implementation languages
|
|
------------------------
|
|
|
|
Beyond what is said under "Dependencies" in 'INSTALL.extra',
|
|
contributors should note that due to the age of the code base, much of
|
|
the C++ dialect employed by groff components, while standard, is older
|
|
than C++98--closer to Annotated Reference Manual C++ (Ellis, Stroustrup;
|
|
Addison-Wesley, 1990). groff implements its own string class and the
|
|
Standard Template Library is little used. A modest effort is underway
|
|
to update the code to more idiomatic C++98. Where a C++11 feature
|
|
promises to be advantageous, it may be annotated in a code comment.
|
|
|
|
Portability notes:
|
|
|
|
* `std::size` is not available in C++98. Use the `array_length`
|
|
template function in src/include/lib.h instead of `sizeof` and
|
|
dividing.
|
|
|
|
* C++98 lacks value initialization for array types.
|
|
|
|
https://cplusplus.github.io/CWG/issues/178.html
|
|
|
|
Use `memset()` after allocating an array from the stack or the heap
|
|
unless you are sure that every path through subsequent logic
|
|
determines the contents of every array element.
|
|
|
|
|
|
Automake
|
|
--------
|
|
|
|
A document explaining the basics of GNU Automake and its usage in groff
|
|
is available in 'doc/automake.mom'; peruse a PDF rendering in
|
|
'doc/automake.pdf' in your build tree.
|
|
|
|
|
|
Testing
|
|
-------
|
|
|
|
Running the test suite with 'make check' after building any substantive
|
|
change to groff logic is encouraged. You should certainly do so, and
|
|
confirm that the tests pass, before submitting patches to the groff
|
|
mailing list <groff@gnu.org> or Savannah issue tracker.
|
|
|
|
If you find a defect in a test script, that can be reported via Savannah
|
|
like any other bug.
|
|
|
|
|
|
Documenting changes
|
|
-------------------
|
|
|
|
The groff project has a long history and a large, varied audience.
|
|
Changes may need to be documented in up to three places depending on
|
|
their impact.
|
|
|
|
1. Changes should of course be documented in the Git commit message.
|
|
If a change alters only comments or formatting of source code, or
|
|
makes editorial changes to documentation, and does not resolve a
|
|
Savannah ticket, you can stop at that.
|
|
|
|
2. The 'ChangeLog' file follows the format and practices documented in
|
|
the GNU Coding Standards.
|
|
https://www.gnu.org/prep/standards/html_node/Change-Logs.html
|
|
|
|
The sub-projects in the 'contrib' directory each have their own
|
|
dedicated ChangeLog files. The file specifications documented there
|
|
are relative to the sub-project, not the root of the groff source
|
|
tree. When converted to a commit message, add 'contrib/$SUBPROJECT'
|
|
to the entries.
|
|
|
|
Apart from 'contrib', groff uses a single (current) 'ChangeLog' file
|
|
for the rest of its source tree.
|
|
|
|
It is convenient to write the ChangeLog entry or entries first, then
|
|
construct a commit message from it (or them).
|
|
|
|
3. The 'NEWS' file documents changes to groff that a user, not just a
|
|
developer, would notice, not including the resolution of defects.
|
|
|
|
As a hypothetical example, correcting a rendering error in tbl(1)
|
|
such that any table with more than 20 rows no longer had the text
|
|
"FOOBAR" spuriously added to some entries would not be a 'NEWS'
|
|
item, because the appearance of such text in the first place is a
|
|
surprising deviation from tbl's ideal and historical behavior. In
|
|
contrast, adding a command-line option to tbl, or changing the
|
|
meaning of its "expand" region option such that it no longer
|
|
horizontally compresses tables as well, _would_ be 'NEWS'-worthy.
|
|
|
|
|
|
Updating Copyright Notices
|
|
--------------------------
|
|
|
|
* The overall copyright notice for groff as a work of software is
|
|
updated at release time. See the 'FOR-RELEASE' file in the Git
|
|
repository.
|
|
|
|
* Update a _file_'s copyright notice in a year when committing a change
|
|
to it that is "original expression" and would thus merit copyright
|
|
protection. This is a subjective and arguable matter, so it's not
|
|
necessarily offensive to apply an expansive interpretation, but
|
|
"bumping" the copyright notice when _no_ change has been made, or when
|
|
the alterations are trivial by another standard (code style changes
|
|
that don't require regression testing; editorial changes to text that
|
|
are _invisible_ to the lay reader without technological
|
|
assistance--like trailing tab/space removal) abuses the principle.
|
|
|
|
* If you forget the foregoing step, or contributions to a file seem to
|
|
accrete original status over time or a series of commits, it's fine to
|
|
later update the notice to include the relevant (hopefully current)
|
|
year in a stand-alone commit. Use "git log --oneline" on a file to
|
|
gather commit IDs that justify the update and put them in the commit
|
|
message so that other people understand the basis of your claim.
|
|
|
|
* It's okay to simply report a range of years in the copyright notice
|
|
instead of a comma-separated list. As far as GBR knows there is no
|
|
hard rule that such ranges are interpreted exhaustively, and unless
|
|
someone has a chronological record of changes to the file, a broken
|
|
sequence of copyright coverage years makes little difference.
|
|
Copyright protection extends to those portions of the work fixed in a
|
|
tangible medium in the years declared in the copyright notice, except
|
|
for those portions whose copyright durations have elapsed. But these
|
|
are so lengthy that, in the United States as of 2024, no work of
|
|
computer software or documentation has ever yet even _partially_ aged
|
|
into the public domain. (Some has been placed in the public domain
|
|
deliberately, and some never enjoyed copyright protection at all.)
|
|
|
|
|
|
Writing Tests
|
|
-------------
|
|
|
|
Here are some portability notes on writing automated tests.
|
|
|
|
* Write to the POSIX standard for the shell and utilities where
|
|
possible. Issue 4 from 1994 is old enough that no contemporary system
|
|
has a good reason for not conforming. A copy of the standard is
|
|
available at the Open Group's web site.
|
|
https://pubs.opengroup.org/onlinepubs/009656399/toc.pdf
|
|
|
|
* The GNU coreutils "seq" command is handy but not standardized by
|
|
POSIX. Replace it with a while loop.
|
|
|
|
# emulate "seq 53"
|
|
n=1; while [ $n -le 53 ]; do echo $n; n=$(( n + 1 )); done; unset n
|
|
|
|
* The "wc" command on macOS can prefix the numeric count in its output
|
|
with spaces, which can be undesirable when storing that output to
|
|
variable that is later expanded within double quotes in the shell.
|
|
|
|
Here is a workaround.
|
|
|
|
res=$(whatever | wc -l)
|
|
res=$(( res + 0 )) || exit 99
|
|
|
|
If for some reason we get unacceptable non-integer garbage from "wc",
|
|
we exit the test script with the code reserved for "hard errors".
|
|
Shell arithmetic is unfortunately one of the many POSIX shell features
|
|
that Solaris 10's /bin/sh does not implement; see the "PROBLEMS" file.
|
|
|
|
* The "od" command on macOS can put extra space characters (i.e., spaces
|
|
that don't correspond to the input) at the ends of lines when using
|
|
the "-t c" format; GNU od does not.
|
|
|
|
So a regex like this that works with GNU od:
|
|
grep -Eqx '0000000 +A +\\b +B +\\b +C D +\\n'
|
|
might need to be weakened to the following to work on macOS.
|
|
grep -Eqx '0000000 +A +\\b +B +\\b +C D +\\n *'
|
|
|
|
* The "od" command on macOS puts extra space characters between the
|
|
hexadecimal values when using the "-t x1" format; GNU od does not.
|
|
|
|
So a regex like this that works with GNU od:
|
|
grep -q '81 30 55 81 30 56 81 6c e2'
|
|
might need to be weakened to the following to work on macOS.
|
|
grep -q '81 *30 *55 *81 *30 *56 *81 *6c *e2'
|
|
|
|
* The "od" command on macOS does not respect the environment variable
|
|
assignment "LC_ALL=C" when processing byte values 127<x<256 decimal
|
|
and using the "character" output format (option "-t c"). An
|
|
alternative output must be used, like bytewise octal (option "-t o1").
|
|
(macOS od may be non-conforming here, despite the claim of its man
|
|
page. POSIX Issue 4 od's description says "The type specifier
|
|
character c specifies that bytes will be interpreted as characters
|
|
specified by the current setting of the LC_CTYPE locale category. ...
|
|
Other non-printable characters will be written as one three-digit
|
|
octal number for each byte in the character." (p. 538) The language
|
|
in Issue 7 (2018) appears unchanged.
|
|
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/od.html )
|
|
|
|
* Prior to POSIX.1-2024, the meaning of the sequence `\]` in a basic or
|
|
extended regular expression is undefined. Spell it as `]` instead.
|
|
|
|
* macOS sed requires semicolons after commands even if they are followed
|
|
immediately by a closing brace.
|
|
|
|
Rewrite
|
|
sed -n '/Foo\./{n;s/^$/FAILURE/;p}'
|
|
as follows.
|
|
sed -n '/Foo\./{n;s/^$/FAILURE/;p;}'
|
|
|
|
But see below regarding the opening braces.
|
|
|
|
* POSIX doesn't say that sed has to accept semicolons as command
|
|
separators after label (':') and test ('t') commands, or after brace
|
|
commands, so macOS sed doesn't. GNU sed does.
|
|
|
|
So rewrite tidy, compact sed scripts like this:
|
|
sed -n '/Foo\./{n;s/^$/FAILURE/;tA;s/.*/SUCCESS/;:A;p}'
|
|
as this more cumbersome alternative.
|
|
sed -n \
|
|
-e '/Foo\./{n;s/^$/FAILURE/;tA;' \
|
|
-e 's/.*/SUCCESS/;:A;' \
|
|
-e 'p;}')
|
|
|
|
But see below regarding the opening braces.
|
|
|
|
Similarly, a brace sequence like that in this partial sed script:
|
|
/f1/p}}}}}}
|
|
must be rewritten as follows (or with '-e' expressions).
|
|
/f1/p;}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
|
|
* macOS and GNU sed don't require newlines (or '-e' expression endings)
|
|
after _opening_ braces, but Solaris 11 sed does.
|
|
|
|
So the sed script
|
|
/i/{N;/Table of Contents/{N;/Foo[. ][. ]*1/p;};}
|
|
must be rewritten as follows (or with '-e' expressions).
|
|
/i/{
|
|
N;/Table of Contents/{
|
|
N;/Foo[. ][. ]*1/p;
|
|
};
|
|
}
|
|
|
|
* Solaris 10's /usr/bin/cksum output is non-conforming with XPG4. It
|
|
uses tabs as field delimiters instead of spaces.
|
|
|
|
* Solaris 10's /usr/bin/grep is non-conforming with XPG4; it lacks
|
|
support for the `-E`, `-F`, `-q`, and `-x` options.
|
|
|
|
* Solaris 10's /bin/sh is non-conforming with XPG4; it does not support
|
|
POSIX parameter expansion syntax.
|
|
|
|
* Solaris 10's /usr/bin/tr exits with an error if you try to use a POSIX
|
|
character class (such as "[:cntrl:]") in any locale but "C".
|
|
|
|
* Solaris 10's /usr/xpg4/bin/sh is non-conforming with XPG4.
|
|
(Good job, guys!)
|
|
|
|
Its "unset" builtin is buggy. (The /usr/bin/sh in Solaris 11 does not
|
|
have this problem.)
|
|
|
|
We sometimes must use the "unset" shell builtin command to prevent
|
|
environment variables from confounding test results.
|
|
|
|
POSIX says "[u]nsetting a variable ... that was not previously set is
|
|
not considered an error and will not cause the shell to abort."
|
|
|
|
Nevertheless this builtin returns an error exit status in this
|
|
circumstance.
|
|
|
|
$ /usr/xpg4/bin/sh -c 'unset _NON_EXISTENT_XYZ; echo $?'
|
|
1
|
|
|
|
You may want to check for this misbehavior and skip the test if
|
|
running under an afflicted shell.
|
|
|
|
if ! unset VARIABLE_OF_INTEREST
|
|
then
|
|
echo "unable to clear environment; skipping" >&2
|
|
exit 77
|
|
fi
|
|
|
|
Updating gnulib
|
|
---------------
|
|
|
|
Here's how to update the submodule, using that project's "stable-202407"
|
|
branch as an example. From the root of your checkout:
|
|
|
|
$ cd gnulib
|
|
$ git checkout -b stable-202407 --track origin/stable-202407
|
|
$ git pull
|
|
$ cd ..
|
|
$ editor ChangeLog # log it
|
|
$ git add ChangeLog
|
|
$ git commit
|
|
|
|
It's likely a good idea to update the "bootstrap" script at the same
|
|
time (not necessarily in the same commit, however).
|
|
|
|
$ ./bootstrap --bootstrap-sync
|
|
$ editor ChangeLog # log it
|
|
$ git add ChangeLog
|
|
$ git commit
|
|
|
|
|
|
Theory of operation
|
|
-------------------
|
|
|
|
groff language parser
|
|
.....................
|
|
|
|
The "troff" program in "src/roff/troff" parses the groff input language.
|
|
There, "input.cpp" implements the main loop and tokenizes input. Input
|
|
tokens are transformed into nodes (a GNU troff internal data structure)
|
|
by "env.cpp" and "node.cpp". Routines in the latter file generate the
|
|
page description language from lists of nodes.
|
|
|
|
|
|
page description language parser
|
|
................................
|
|
|
|
The parser for the page description language produced by troff is
|
|
implemented in "src/libs/libdriver/input.cpp". This is used by all
|
|
groff output drivers written in C++. ("gropdf", written in Perl,
|
|
performs its own parsing.)
|
|
|
|
|
|
##### Editor settings
|
|
Local Variables:
|
|
fill-column: 72
|
|
mode: text
|
|
End:
|
|
vim: set autoindent textwidth=72:
|