Similar to what gcc can do, make it possible for m4 to output Makefile
fragments that track the files that were included during processing,
in order to automatically rebuild files in the correct dependency
chains later on.
* NEWS: Document the feature.
* THANKS: Update.
* checks/get-them: Add support for declaring a test's auxfile.
* checks/check-them: Add code for handling auxiliary files, to make
testing the feature possible.
* doc/m4.texi (auxresult): New macro.
(Make dependency generation): New chapter.
* src/m4.h (makedep_gen_missing, REF_CMD_LINE, REF_INCLUDE)
(REF_SINCLUDE, REF_ALL, REF_NONE): Prepare for new options.
(record_dependency, generate_make_dependencies): New prototypes.
* src/m4.c (makedep_path, makedep_target, makedep_gen_missing)
(makedep_phony): Track new options.
(usage): Document new options.
(process_file): Track dependencies.
(main): Parse new options.
* src/builtin.c (include, m4_include, m4_sinclude): Track include
source.
* src/path.c (struct dependency): New struct.
(dependency_list, dependency_list_end): New variables.
(record_dependency, generate_make_dependencies): Output dependencies.
Co-developed-by: Lorenzo Di Gregorio <lorenzo.digregorio@gmail.com>
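The feature can be pictured with a minimal C sketch. The changelog names struct dependency, record_dependency and generate_make_dependencies, but the signatures, list layout and output format below are assumptions for illustration, loosely modeled on the rules gcc emits with -MD and -MP. They are not m4's actual code or option semantics.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch, not m4's code: dependencies are kept in
   first-seen order in a singly linked list, with a tail pointer for
   O(1) append.  */
struct dependency
{
  struct dependency *next;
  char *path;
};

static struct dependency *dependency_list;
static struct dependency **dependency_list_end = &dependency_list;

/* Remember PATH once (a file included twice is listed once).
   Returns false only on allocation failure.  */
static bool
record_dependency (const char *path)
{
  for (struct dependency *d = dependency_list; d; d = d->next)
    if (strcmp (d->path, path) == 0)
      return true;

  struct dependency *d = malloc (sizeof *d);
  size_t len = strlen (path) + 1;
  char *copy = malloc (len);
  if (!d || !copy)
    {
      free (d);
      free (copy);
      return false;
    }
  memcpy (copy, path, len);
  d->path = copy;
  d->next = NULL;
  *dependency_list_end = d;
  dependency_list_end = &d->next;
  return true;
}

/* Write "TARGET: dep1 dep2 ..." to OUT.  If PHONY, also write an empty
   rule per dependency so that make does not fail after one of the
   included files is deleted.  */
static void
generate_make_dependencies (FILE *out, const char *target, bool phony)
{
  fprintf (out, "%s:", target);
  for (struct dependency *d = dependency_list; d; d = d->next)
    fprintf (out, " %s", d->path);
  fputc ('\n', out);
  if (phony)
    for (struct dependency *d = dependency_list; d; d = d->next)
      fprintf (out, "\n%s:\n", d->path);
}
```

The sketch does not escape spaces or `$` in file names, which a real implementation would need to handle before feeding the output to make.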
Even when @kbd{} is not going to be rendered, it is worth spelling it
without a bogus space.
* doc/m4.texi (Improved foreach, Improved copy): Fix spacing.
* checks/get-them: Tighten regex for kbd lines.
This reverts commit 16e712b9dbc. It turns out that having comments as
a higher priority than macros makes it possible to parse files that
use either REM or REMARK as a comment, as well as to parse files that
contain mismatched () as a single comment long enough to then translit
those into something safe for further m4 handling. POSIX may say it
is undefined, but since I actually encountered a case with the Advent
of Code challenges (adventofcode.com) where being able to abuse GNU m4
comment semantics let me solve a problem in m4, I see no reason to
break it after all.
* NEWS: Remove mention of reverted change.
* doc/m4.texi (Changecom): Update text to describe older rules, but
with more tests.
* src/input.c (next_token): Parse comments with highest priority.
Classic lambda calculus defines the Curry function to only apply one
additional argument; but it is just as easy in m4 to curry an
arbitrary number of fixed arguments coupled with an arbitrary number
of extra arguments. It is also worth documenting how to provide a
name to a curried function.
* examples/curry.m4: Accept multiple extra arguments, and improve
documentation.
* doc/m4.texi (Composition): Reflect it into the manual.
Reported-by: Nikolaos Chatzikonstantinou
https://lists.gnu.org/archive/html/m4-discuss/2025-05/msg00056.html
Borrow an idea from Java: even though eval always operates on signed
inputs, the ability to do an unsigned right shift is valuable when
operating on bitmasks.
* src/eval.c (eval_token): Add URSHIFT.
(eval_lex): Tokenize it.
(parse_expr): Evaluate it.
* doc/m4.texi (Eval): Document this.
* NEWS: Likewise.
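As an illustration, the difference between the two shifts can be sketched in C on a 32-bit value. The helper name urshift is hypothetical. Masking the shift count to 5 bits follows Java's rule for `>>>`, and whether m4's URSHIFT masks the same way is an assumption.

```c
#include <stdint.h>

/* Hypothetical helper: logical (unsigned) right shift of a signed
   32-bit value.  As in Java, the count is masked to 0..31; whether
   m4 masks the same way is an assumption.  */
static int32_t
urshift (int32_t value, int32_t count)
{
  uint32_t u = (uint32_t) value;   /* well-defined: reduces modulo 2^32 */
  u >>= (count & 31);              /* zeros, not sign bits, shift in */
  /* For count >= 1 the result always fits in int32_t.  */
  return (int32_t) u;
}
```

With a signed `>>`, most compilers shift in copies of the sign bit (C leaves this implementation-defined for negative values), so `-1 >> 28` stays -1. The unsigned shift yields 15, which is what is usually wanted when extracting a field from a bitmask.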
Instead of O(n) in the value of the exponent, we can compute exponents
in O(log n), with exponentiation by squaring. With this patch, "time
echo 'eval(3**2000000000)' | m4" drops from 2 seconds to under 10
milliseconds.
* src/eval.c (parse_expr): Use exponentiation by squaring.
* doc/m4.texi (Eval): Test it.
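Exponentiation by squaring consumes the exponent one bit at a time, so 3**2000000000 needs only 31 loop iterations (one per bit of the exponent) instead of roughly two billion multiplications. A minimal C sketch, assuming results wrap modulo 2^32. The function name ipow is hypothetical, and this is not the code in src/eval.c.

```c
#include <stdint.h>

/* Sketch: compute BASE**EXP modulo 2^32 in O(log EXP) multiplications.
   Unsigned arithmetic wraps in C, which avoids signed-overflow
   undefined behavior.  Negative exponents are not handled here.  */
static uint32_t
ipow (uint32_t base, uint32_t exp)
{
  uint32_t result = 1;
  while (exp)
    {
      if (exp & 1)
        result *= base;   /* fold in base^(2^k) for each set bit k */
      base *= base;       /* base^(2^k) becomes base^(2^(k+1)) */
      exp >>= 1;
    }
  return result;
}
```

For example, ipow (3, 5) takes three iterations (exponent bits 101) and multiplies 3 by 81 to get 243.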
The eval parser gives up at the first bad operator, but if that
operator occurs at a place where the parser was expecting a different
operator (')' or ':'), the error message was confusing, especially if
that other operator DOES appear later in the line. Thus, it's better
to reprioritize the errors to match.
* src/eval.c (primary): Favor bad op over missing ")".
(parse_expr): Favor bad op over missing ":".
* doc/m4.texi (Eval): Test it.
* THANKS: Update.
Reported-by: Nikolaos Chatzikonstantinou <nchatz314@gmail.com>
I'm hoping to add a debugmode(+r) to toggle whether to warn on
suspicious regex constructs. But I couldn't do it in m4 1.4.x, in
part because debugmode() warns on unrecognized flags. It will be
easier to add new flags in the future if it is also possible to query
the set of flags known by m4, as well as the current state of those
flags. And the easiest way to do that was to have debugmode(?) output
a string containing both the set and cleared flags, which means the
parser has to accept that as well. For back-compat with 1.4.x,
debugmode(-) is once again synonymous with clearing the default flags
but leaving others in place. And the earlier change to make 'm4 -d'
behave like 'm4 -d+adeq' but 'debugmode()' behave like
'debugmode(aeqt)' is too complex to maintain; it's simpler to have the
empty argument set the default rather than add the default.
* doc/m4.texi (Debugging options): Tweak example to match code.
(Debugmode): Mention qindir, missed in previous patch. Document new
ability to support + and - in same string, and new ? for query.
* src/builtin.c (m4_debugmode): Handle new "?" case.
* src/m4.h (debug_dump): New declaration.
* src/debug.c (debug_dump): New function.
(debug_set): New helper, factored out of...
(debug_decode): ...here. Restore 1.4.x compatibility with parsing
"", "-", and "+". Support more than one "-" and "+" in a string.
* NEWS: Document this.
Git sets the mtime of files to the time the working directory was
checked out, rather than the time the file's content was last
committed. But Automake likes to populate the
generated doc/version.texi based on the mtime of doc/m4.texi. Any
setup that uses a git checkout of m4 to build a new tarball will thus
get different contents in the manual if checked out on a different
date, breaking reproducible builds unless we take measures to
guarantee that the mtime matches the time of the last commit.
* configure.ac (st_touch): New code.
* THANKS: Update.
Suggested by Simon Josefsson, after a report by Santiago Vila:
https://lists.gnu.org/archive/html/bug-m4/2025-04/msg00052.html
(cherry picked from commit bdd97ee7cc2f7f645d75a29db92ddc85dea91f88)
Pick up several fixes that were identified from a recent scratch
release.
* gnulib: Bump to latest.
* gl-lib/bootstrap: Likewise.
* bootstrap: Regenerate.
(cherry picked from commit 129c078adb72b0fbcc492858a421958f6aef1b75)
Spotted while considering whether to add another '@comment xerr:
ignore' in the manual. The bug was introduced in commit f63f456a
(v1.4.11, 2008), but did not break any tests.
* checks/get-them: Reset correct variable on an ignored example.
(cherry picked from commit 4730bc8cd3c56976f1853a2cca10a09eb696531e)
Fix a regression introduced in e3c4d07c: when the left half of an
expression was syntactically valid but computationally undefined, the
parser was overwriting that status with a successful parse of the
right half, when the second operator has lower precedence than the
operator that caused the problem in the left half. The simplest test
case is "eval(1/0+1)"; also vulnerable was "eval(1/0||1/0)".
* src/eval.c (evaluate): Adjust signature, to avoid losing error
status of left half.
(primary, evaluate): Update callers.
* doc/m4.texi (Eval): Test it.
(cherry picked from commit d850009d869802e0b14f22a3de19559ce2fd23e2)
* src/Makefile.am (LDADD): Update library names according to Gnulib
NEWS 2023-01-07, and add missing entries.
* THANKS: Update.
Reported by Collin Funk in
https://lists.gnu.org/archive/html/bug-m4/2025-04/msg00030.html
(cherry picked from commit 6ebfc546b600625db70bf3c0a6a128997a3c04be)
As evidenced by recent gnulib traffic [1], GNU Emacs regexp syntax has
diverged over time, to the point that RE_SYNTAX_EMACS==0 is no longer
accurate: modern Emacs has since enabled a\{2\} interval repetition,
as well as [[:alpha:]] char classes, neither of which is supported in
current m4. However, for back-compat reasons, we cannot blindly
change m4 1.4.x away from syntax 0 even if it is no longer Emacs
syntax. Worse, at least Autoconf 2.72 has instances of regex where
both "{" and "\{" are intended to match a literal "{". Enabling
intervals could cause regex that compile now to fail to compile and
cause a warning, which is a change that can only be done on a major
version bump to 1.6. So for now, just document the limitations.
[1] https://lists.gnu.org/archive/html/bug-gnulib/2025-04/msg00064.html
* doc/m4.texi (regexp): Document highlights for users who don't want
to chase the link, and call out intentional lack of newer features
in contrast to what Emacs now supports.
(patsubst): Refer to regexp, rather than Emacs.
(cherry picked from commit c8a6346c2ba648b6300eba93c96baaad3857f986)
Gnulib documents that it now requires automake 1.14 or later. It also
mentions Autoconf 2.64 or later, but I found it easier to require 2.69
(released in 2012).
* configure.ac (AC_PREREQ, AM_INIT_AUTOMAKE): Require newer baselines.
* HACKING: Document this.
Signed-off-by: Eric Blake <eblake@redhat.com>
(cherry picked from commit 06000735730f82a8935abeedd0e1e5835686a53f)
* HACKING: Make a few updates to match the latest code base.
* cfg.mk (local-checks-to-skip): No longer exclude sc_bindtextdomain.
* THANKS: Add some recent (and not-so-recent) credit.
* NEWS: Capture a few more items of change.
(cherry picked from commit a22c9802dd7e724eaefb21dc21d84ac2d3a49c89)
A file describing unread mail from 30 years ago is not useful now;
furthermore, version control can still get at this if someone cares.
* BACKLOG: Delete.
* README: Drop mention of it.
(cherry picked from commit 4925b36a1ef7fde26a633454a7dea246aa4acd48)
No impact on a normal build, but this will help in profiling to decide
which bottlenecks are worth addressing.
* src/symtab.c (struct profile) [DEBUG_SYM]: Collect more statistics.
(profile_strcmp, lookup_symbol): Track more things.
(show_profile): Adjust output when probing stats.
(cherry picked from commit f2b516d4480eb8542c801f7e65bc2cc733a829a6)
* doc/m4.texi (Changequote): Document how to output mismatched
quotes, with a test added to the testsuite.
Suggested by Barry Davidson in
https://lists.gnu.org/archive/html/bug-m4/2023-08/msg00005.html
(cherry picked from commit f2fc6e3b0c5e3a527bbf66a56e7658479b6214c1)
While a recursive descent parser is easy to write, it involves a LOT
of function calls and boilerplate. Merely parsing "eval(1)" requires
descending through ALL 11 levels of operator precedence, only for each
layer to discover there is no operator. Better is the Pratt style of
LR(1) parsing [1], which can handle any grammar where no two
consecutive non-terminals or epsilon appear in the right side of any
rule [2]. Now, parsing is done with just two mutually recursive
functions; "eval(1)" works with just two function calls (primary()
determines the value, and parse_expr() determines no operators are
present), while more complicated expressions still produce the correct
results but with less recursion.
While at it, I noticed that "eval(1||(1/0))" used to produce a cryptic
message:
m4:stdin:1: bad expression in eval (excess input): 1||(1/0)
despite the similar "eval(1||1/0)" suppressing that as part of
short-circuiting. It turns out that my initial implementation of
short-circuiting in 1.4.8b (back in 2007!) was never fully tested on
more complex situations.
To test that the new implementation is indeed faster, I wrote an m4
solution [3] to an Advent of Code challenge [4] that required
computing 2000 iterations of a 24-bit linear feedback shift register
over 2000 input values (--trace shows nearly 20 million eval calls).
On my machine, an unoptimized pre-patch m4 took 78 seconds to run;
post-patch, it completes in 66 seconds.
[1] https://journal.stuffwithstuff.com/2011/03/19/pratt-parsers-expression-parsing-made-easy/
[2] https://en.wikipedia.org/wiki/Operator-precedence_parser
[3] https://repo.or.cz/aoc_eblake.git/blob/1b122791d4:/2024/day22.m4
[4] https://adventofcode.com/2024/day/22
* NEWS: Document the bug fix. Also document recent compilation fixes.
* cfg.mk (indent_args): Teach indent not to mangle int casts.
* doc/m4.texi (Eval): Add coverage for the bug fix. Adjust one
error output that is now more precise.
* src/eval.c (logical_or_term, logical_and_term, or_term, xor_term)
(and_term, equality_term, cmp_term, shift_term, add_term, mult_term)
(exp_term, unary_term, simple_term): Delete, replaced by...
(primary, parse_expr): ...new functions.
(evaluate): Adjust caller.
(cherry picked from commit e3c4d07c56c0f85ef998067b59191c17c2188ab8)
Ever since 1.4.8b in 2007, we have been giving deprecation warnings
to anyone using the non-POSIX spelling of `=' for an eval equality
comparison (so that an eventual m4 2.0 can use it for assignment
instead). It's been long enough to pull the plug on this crutch.
* src/eval.c (enum eval_token): Reorder to be in precedence order,
with values assigned in groups of 10. Drop ASSIGN.
(equality_term): Drop support for ASSIGN.
* NEWS: Document this.
* doc/m4.texi (Eval): Likewise.
When the length is already known and likely to be short, avoiding the
function call to strlen can be a slight optimization. Two obvious
places: when ntoa() builds a number and already knows where the \0 is,
and when dumping arguments when the separator is always a single byte.
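The following C sketch shows how an end pointer removes the strlen call. The ntoa (value, radix, end) signature, the static buffer and the 2..36 radix range are assumptions for illustration and are not copied from src/builtin.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: convert VALUE to text in RADIX (assumed 2..36),
   building digits right to left in a static buffer.  If END is
   non-NULL, store a pointer to the terminating NUL so that callers can
   compute the length as *END - result instead of calling strlen.  */
static const char *
ntoa (int32_t value, int radix, const char **end)
{
  static char buf[34];   /* 32 binary digits, a sign, and a NUL */
  char *s = buf + sizeof buf - 1;
  /* Work in unsigned so that INT32_MIN does not overflow.  */
  uint32_t u = value < 0 ? -(uint32_t) value : (uint32_t) value;

  *s = '\0';
  if (end)
    *end = s;
  do
    {
      *--s = "0123456789abcdefghijklmnopqrstuvwxyz"[u % radix];
      u /= radix;
    }
  while (u);
  if (value < 0)
    *--s = '-';
  return s;
}
```

Because the digits are written right to left, the position of the NUL is known before the first digit is stored. The static buffer means each call overwrites the previous result.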
* src/builtin.c (dump_args): Change type of sep.
(m4_shift, m4_errprint, m4_m4wrap, expand_user_macro): Adjust all
callers.
(ntoa): Add optional end parameter.
(shipout_int, m4_eval, m4_maketemp): Adjust all callers.
* src/debug.c (trace_format): Likewise.
* src/output.c (shipout_text): Likewise.
* src/m4.h (ntoa): Adjust prototype.
(cherry picked from commit 51fa5f04f44d83a879c8275c3f9c71132390217d)
* gnulib: Update to latest, which now runs codespell during
'make syntax-check'.
* BACKLOG: Typo fix.
* HACKING: Likewise.
* ChangeLog-2014: Swap to UTF-8 spelling of past maintainer names.
* NEWS: Likewise.
* cfg.mk (old_NEWS_hash): Run 'make update-NEWS-hash'.
(exclude_file_name_regexp--sc_codespell, codespell_ignore_words_list):
New variables to silence codespell false positives.
* doc/m4.texi (Changeword): Swap example to use 'abc' instead of 'foo'
so that codespell doesn't complain about 'fo'.
(cherry picked from commit 2f445aef11cbfe8c7ea6e631fc1eb62e9f5c8e49)
Changeword is seldom-tested and bug-prone, and slows m4 down (distros
don't enable it). No one has asked on the mailing lists for it to
work better, or submitted patches. M4 2.0 might be able to offer
changesyntax as an alternative that is not as slow, and users are
still free to stick with 1.4.x if they don't want to upgrade to 1.6.
But since we've been warning about it being experimental for years
now, it's time to rip it out rather than trying to keep patching it.
* NEWS: Mention the removal.
* README: Drop mention of changeword.
* configure.ac: No longer support --enable-changeword.
* doc/m4.texi (Top, Operation modes, Quoting Arguments, Changequote)
(Changecom, Using frozen files): Drop mention of the feature.
(Changeword): Delete this section.
(Frozen file format): Mention how changeword from 1.4.x is handled.
* examples/null.m4: Drop use of changeword.
* examples/null.out: Regenerate.
* examples/null.err: Likewise.
* src/m4.h [ENABLE_CHANGEWORD]: Delete all code related to changeword.
* src/builtin.c: Likewise.
* src/input.c: Likewise.
* src/m4.c: Likewise.
Frozen files must not undergo newline munging. To support this, teach
m4_path_search whether a file is okay in text mode (normal m4 input)
or must be binary (a frozen file).
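The distinction can be sketched in C. The helper open_input and its binary flag are hypothetical stand-ins for the updated m4_path_search and m4_fopen, whose real signatures are not shown here. On POSIX systems "r" and "rb" behave identically. On platforms with text-mode translation, such as DJGPP or Windows, "r" turns CR-LF into LF, which can throw off the string lengths recorded in a frozen file.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical sketch: open NAME for reading in binary mode when
   BINARY (e.g. a frozen file), or in text mode for normal m4 input.  */
static FILE *
open_input (const char *name, bool binary)
{
  return fopen (name, binary ? "rb" : "r");
}
```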
* src/m4.h (m4_path_search): Update prototype.
* src/path.c (m4_fopen, m4_path_search): Honor binary mode.
* src/builtin.c (m4_undivert, include): Update callers.
* src/freeze.c (reload_frozen_state): Likewise.
* src/m4.c (process_file): Likewise.
Reported by Juan Manuel Guerrero in
https://lists.gnu.org/archive/html/bug-m4/2023-01/msg00006.html
(cherry picked from commit 6005ef9b4156b7c7f580c30b10d4dfc2eb9b983f)
* src/path.c (include_env_init): When path_end becomes NULL, terminate
the loop without computing path_end + 1.
* src/macro.c (expand_macro): Pass a signed negative value to
obstack_blank_fast. This avoids a pointer overflow.
(cherry picked from commit 7ed117e2425b061567bb77ea8d39a5b53a7c14f6)
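The following minimal C sketch shows a loop shape that stops before computing NULL + 1, assuming a colon-separated list such as the M4PATH environment variable. The function split_path and its output arrays are hypothetical and are not the actual include_env_init code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: record up to MAX (start, length) pairs, one per
   entry in the colon-separated list PATH, and return the number of
   entries found.  The loop ends as soon as strchr finds no further
   ':', so it never computes NULL + 1 (undefined behavior in C even if
   the result is never dereferenced).  */
static size_t
split_path (const char *path, const char **starts, size_t *lens, size_t max)
{
  size_t n = 0;
  const char *start = path;
  for (;;)
    {
      const char *sep = strchr (start, ':');
      size_t len = sep ? (size_t) (sep - start) : strlen (start);
      if (n < max)
        {
          starts[n] = start;
          lens[n] = len;
        }
      n++;
      if (!sep)
        break;          /* last entry: do not advance past NULL */
      start = sep + 1;
    }
  return n;
}
```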
Regression from 2021-11-19. Fix proposed by Eric Blake.
* src/builtin.c (m4_syscmd): On Unix, prepend a space to the command before
executing it.
(m4_esyscmd): Likewise.
(cherry picked from commit 2479da0e7edcc6704968abe375853242ea40feb8)
Clang also understands '#pragma GCC diagnostic ignored
"-Wformat-nonliteral"', and refuses to build format.c with -Werror
without it.
(Note - even with this patch, a clang build still fails the gnulib
portion of 'make check' due to a link error in tests/test-gettimeofday;
that will be fixed in a later patch)
* src/m4.h: Prefer _GL_GNUC_PREREQ over bare __GNUC__ probes.
* src/format.c (expand_format): Likewise, and widen scope to
also appease clang.
Reported by David Arnstein in:
https://lists.gnu.org/archive/html/bug-m4/2024-12/msg00002.html
(cherry picked from commit 4ef9b6f89cfbaa11795cbd0cd2bae5cefbd3921b)
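The following minimal C sketch uses a hand-written version test in place of Gnulib's _GL_GNUC_PREREQ, which a self-contained example cannot use. The macro SUPPRESS_FORMAT_NONLITERAL and the helper format_int are hypothetical.

```c
#include <stdio.h>

/* Clang defines __GNUC__ as 4 and __GNUC_MINOR__ as 2, so a check for
   GCC 4.6 or later (the release that added diagnostic push/pop) alone
   would skip clang, even though clang honors these pragmas.  */
#if (defined __GNUC__ \
     && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6))) \
    || defined __clang__
# define SUPPRESS_FORMAT_NONLITERAL 1
#else
# define SUPPRESS_FORMAT_NONLITERAL 0
#endif

#if SUPPRESS_FORMAT_NONLITERAL
# pragma GCC diagnostic push
# pragma GCC diagnostic ignored "-Wformat-nonliteral"
#endif
/* Hypothetical helper: FMT is chosen at run time (as with a
   user-supplied format string), so the compiler cannot check it.  */
static int
format_int (char *buf, size_t size, const char *fmt, int value)
{
  return snprintf (buf, size, fmt, value);
}
#if SUPPRESS_FORMAT_NONLITERAL
# pragma GCC diagnostic pop
#endif
```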
* cfg.mk (indent_args): Inform indent of our no-TAB policy.
* src/*: Run 'make indent'.
* src/builtin.c (define_user_macro): Touch up odd split in _()
by reducing a layer of indentation.
* src/symtab.c (struct profile): Reformat comment to avoid
long line from indent.
* src/input.c (next_char_1): Likewise.
* src/m4.c (long_options): Likewise.
(includes): Prefer <error.h> over "error.h".
* src/m4.h: Likewise for <assert.h>.
(m4_error, m4_placeholder): Work around indent's inability to
grok ATTRIBUTE_COLD.
* Makefile.am (DISTCHECK_CONFIGURE_FLAGS): Rename to
AM_DISTCHECK_CONFIGURE_FLAGS.
(cherry picked from commit 2ce8dfa6d0e6c0779669e45cea247bf93ad2a2f8)
texi2any 7.0 added a new feature [1] to allow the elision of the space
between the macro name and its argument list in a @deffn. Since m4
must not have a space there, we want to use it.
[1] https://lists.gnu.org/archive/html/bug-texinfo/2022-07/msg00086.html
* doc/m4.texi: Elide space in rendering of macro definitions.
* bootstrap.conf (buildreq): Require new-enough makeinfo to support it.
(cherry picked from commit d8000cb41178f63c2cba8daa92443f1a6a2f785a)
* m4/gnulib-cache.m4: Adjust to match current tool output.
* src/m4.h: Don't include verror.h.
(m4_error, m4_error_at_line, m4_placeholder): Now ATTRIBUTE_COLD,
to pacify gcc with new Gnulib.
(cherry picked from commit b357b798b04053df437b2df2f4f42dca69fb764c)
Even though all libc implementations handle it sanely (because size 0 says there is
nothing to copy), NULL is not a valid source pointer per a strict
reading of C, so UBSAN flags it:
+output.c:511:9: runtime error: null pointer passed as argument 2, which is declared to never be null
* src/output.c (make_room_for): Skip no-op memcpy.
Fixes: https://savannah.gnu.org/support/index.php?110809
Reported-by: Sam James
(cherry picked from commit a886ea40a29a08954ff80772e267828a1d440cc9)
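The following minimal C sketch shows the guard. The helper name copy_bytes is hypothetical, and the real make_room_for may be structured differently.

```c
#include <stddef.h>
#include <string.h>

/* Sketch: C declares memcpy's pointer arguments nonnull even when the
   size is 0, so skip the call entirely for an empty copy.  */
static void
copy_bytes (char *dst, const char *src, size_t len)
{
  if (len)              /* SRC may be NULL when LEN == 0 */
    memcpy (dst, src, len);
}
```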
A minor release is not the time for format(`%.1f', `4.0') to complain
about 4.0 not being a number followed by outputting "4,0" in locales
where the decimal point is a comma. Such a change belongs better in a
major release where more thought is put into locale-awareness across
the board.
* src/m4.c (main): Force LC_NUMERIC to C.
Reported-by: Bruno Haible in
https://lists.gnu.org/r/bug-m4/2021-06/msg00021.html
(cherry picked from commit dd434e3e958b3cebde954d66010919a40665812e)
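The following minimal C sketch assumes the usual setlocale (LC_ALL, "") call at startup. The name init_locale is hypothetical. Pinning LC_NUMERIC to "C" after adopting the user's locale makes both number parsing (strtod) and printing (%f) use '.' as the decimal point, while messages and character classes still follow the user's locale.

```c
#include <locale.h>

/* Sketch: adopt the user's locale for everything, then pin
   LC_NUMERIC to "C" so the decimal point is always '.'.  */
static void
init_locale (void)
{
  setlocale (LC_ALL, "");
  setlocale (LC_NUMERIC, "C");
}
```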