Go back to a single mcel module, instead of trying to break it up
into ucore and mcel pieces, as breaking it up hurt performance.
Use gnulib-tool’s --local-dir to create diffutils-specific modules
for mcel; the idea is that this will eventually migrate into Gnulib.
* bootstrap.conf (avoided_gnulib_modules): Add mbuiterf.
(gnulib_modules): Add mbscasecmp, mcel-prefer.
(gnulib_tool_option_extras): Add --local-dir=gl to pick up new files.
* cfg.mk (exclude_file_name_regexp--sc_prohibit_doubled_word):
Do not exclude now-removed files lib/ucore.c, lib/ucore.h.
* lib/Makefile.am: Adjust to use of modules.
(noinst_HEADERS): Remove mcel.h, ucore.h.
(libdiffutils_a_SOURCES): Remove mcel.c, mcel-casecmp.c, ucore.c.
* lib/mcel-casecmp.c, lib/ucore.c, lib/ucore.h: Remove.
* lib/mcel.h: Switch to LGPLv2.1+. Do not include ucore.h.
All uses of ucore_t changed back to using char32_t.
Do what ucore.h used to do: include verify.h, limits.h, stddef.h,
uchar.h; require config.h, define _GL_LIKELY, _GL_UNLIKELY.
(MCEL_CHAR_MAX, MCEL_ERR_MIN, MCEL_ERR_MAX): New constants.
(mcel_t): Switch from single ucore_t c to a char32_t ch and
unsigned char err. This has significantly better performance on
Fedora 38 x86-64. All uses changed. Check that unsigned char
promotes to int.
(mcel_ch, mcel_err, mcel_cmp, mcel_tocmp): New functions.
(MCEL_ERR_SHIFT): Rename from MCEL_ENCODING_ERROR_SHIFT.
All uses changed.
(mcel_isbasic): Add a _GL_LIKELY to help compilers. All uses changed.
(mcel_scan, mcel_scant): Simplify by using mcel_ch, mcel_err.
(mcel_casecmp): Remove decl. Callers changed to use mbscasecmp.
* gl/lib/mcel.c, gl/lib/mcel.h: Rename from lib/mcel.c, lib/mcel.h.
* gl/lib/mbscasecmp.c: New file.
* gl/modules/mcel, gl/modules/mcel-prefer, gl/modules/mcel-tests:
* gl/tests/test-mcel.c:
New files.
* src/io.c: Revert use of ucore API. Use plain c32isspace etc.
instead of ucore_is. Use .err instead of ucore_iserr.
(same_ch_err): Bring back, and use it instead of ucore_cmp.
* src/side.c (print_half_line): Use .err instead of ucore_iserr.
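The switch from a single ucore_t member to separate ch and err members can be illustrated with a small self-contained sketch. The names sketch_mcel_t and sketch_mcel_cmp are hypothetical, and the layout only approximates what the entry above describes; it is not the Gnulib mcel.h code. It assumes that err is zero for a valid character and otherwise holds the offending byte, and that encoding errors sort after all characters.

```c
#include <assert.h>
#include <limits.h>
#include <uchar.h>

/* Hypothetical analogue of mcel_t: a character CH when ERR is zero,
   otherwise an encoding error whose offending byte is ERR.
   LEN is the number of bytes scanned.  */
typedef struct
{
  char32_t ch;
  unsigned char err;
  unsigned char len;
} sketch_mcel_t;

/* The comparison below subtracts ERR values after integer promotion,
   which is safe only if unsigned char promotes to int.  */
static_assert (UCHAR_MAX <= INT_MAX, "unsigned char promotes to int");

/* Return <0, 0, >0.  Characters (ERR == 0) sort before encoding errors;
   characters compare by code point, errors by byte value.  */
static int
sketch_mcel_cmp (sketch_mcel_t c1, sketch_mcel_t c2)
{
  if (c1.err != c2.err)
    return c1.err - c2.err;
  return (c1.ch > c2.ch) - (c1.ch < c2.ch);
}
```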
* lib/Makefile.am: Adjust to file renamings and additions.
* lib/mbcel.c, lib/mbcel.h: Split into two APIs, replacing with ...
* lib/mcel.c, lib/mcel.h, lib/ucore.c, lib/ucore.h: ... these new files.
* lib/mcel.h: Simplify by assuming ucore.h is included.
Check that bytes have 8 bits.
(MCEL_LEN_MAX, mcel_t, MCEL_INLINE, MCEL_ENCODING_ERROR_SHIFT)
(mcel_scan, mcel_scant, mcel_scanz, mcel_casecmp):
Rename from MBCEL_LEN_MAX, mbcel_t, MBCEL_INLINE,
MBCEL_ENCODING_ERROR_SHIFT, mbcel_scan, mbcel_scant, mbcel_scanz,
mbcel_casecmp.
(mcel_t): New member c, replacing old members ch and err.
All uses changed.
(MBCEL_UCHAR_FITS, MBCEL_UCHAR_EASILY_FITS): Remove.
All uses removed. No longer needed now 8-bit bytes are assumed.
(MCEL_ENCODING_ERROR_SHIFT): Check that it matches UCORE_ERR_MIN.
(mcel_isbasic): New function. Use it where appropriate.
(mbcel_cmp, mbcel_casecmp): Remove; replaced by ucore_cmp,
ucore_tocmp. All uses changed.
* lib/mcel-casecmp.c: Rename from lib/mbcel-strcasecmp.c.
Include mcel.h instead of mbcel.h.
(mcel_casecmp): Rename from mbcel_strcasecmp. All uses changed.
Assert that UCHAR_MAX <= INT_MAX, as POSIX requires,
and simplify code accordingly. Use mcel rather than mbcel.
* lib/ucore.h: Include verify.h.
(ucore_t): New type.
(UCORE_CHAR_MAX, UCORE_ERR_MIN, UCORE_ERR_MAX, UCORE_C32_SAFE):
New constants. Check that information is not lost by encoding
errors as integers; this is a weaker test than CHAR_BIT == 8.
(ucore_iserr, ucore_is, ucore_to): New functions.
(ucore_cmp, ucore_tocmp): New functions, replacing the old
mbcel_cmp, mbcel_casecmp. All uses changed.
* src/dir.c, src/io.c, src/side.c: Use mcel rather than mbcel.
* src/io.c (same_ch_err): Remove. All uses replaced by ucore_cmp.
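The ucore_t design packed characters and encoding errors into one integer. A sketch of one such encoding follows; it assumes, hypothetically, that an error byte B is stored as B shifted left past the largest Unicode code point. The names and the shift value 21 are illustrative and are not necessarily what diffutils used. The second static assertion is the kind of "no information is lost" check the entry mentions: it requires only that UCHAR_MAX << 21 fit in 32 bits, which is weaker than requiring CHAR_BIT == 8.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical single-integer encoding: 0 .. SKETCH_CHAR_MAX are code
   points; an encoding error for byte B (never 0) is B << SKETCH_ERR_SHIFT.  */
typedef uint_least32_t sketch_ucore_t;
#define SKETCH_CHAR_MAX 0x10FFFF
#define SKETCH_ERR_SHIFT 21
#define SKETCH_ERR_MIN ((sketch_ucore_t) 1 << SKETCH_ERR_SHIFT)

/* Every error value must exceed every character and fit in 32 bits.  */
static_assert (SKETCH_CHAR_MAX < SKETCH_ERR_MIN, "errors above characters");
static_assert ((uint_least64_t) UCHAR_MAX << SKETCH_ERR_SHIFT <= 0xFFFFFFFF,
               "no error byte is lost");

static sketch_ucore_t
sketch_make_err (unsigned char b)
{
  return (sketch_ucore_t) b << SKETCH_ERR_SHIFT;
}

static bool
sketch_iserr (sketch_ucore_t c)
{
  return SKETCH_CHAR_MAX < c;
}

static unsigned char
sketch_err_byte (sketch_ucore_t c)
{
  return c >> SKETCH_ERR_SHIFT;
}
```

With this layout a plain integer comparison orders every character before every error.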
* bootstrap.conf (gnulib_modules): Add builtin-expect.
* lib/mbcel-strcasecmp.c: New file.
* lib/Makefile.am (libdiffutils_a_SOURCES): Add it.
* lib/mbcel.h (MBCEL_LEN_MAX, MBCEL_ENCODING_ERROR_SHIFT)
(MBCEL_UCHAR_FITS, MBCEL_UCHAR_EASILY_FITS): New constants.
(_GL_LIKELY): New macro.
(mbcel_scan): Use it. Simplify NetBSD code.
(mbcel_scant, mbcel_scanz, mbcel_cmp, mbcel_casecmp): New functions.
* src/dir.c (strcasecoll): Move defn here from system.h,
since only dir.c needs it. Use mbcel_strcasecmp instead
of strcasecmp.
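A case-insensitive comparison that walks both strings one multibyte character at a time might look like the sketch below. It is hypothetical: it uses the standard mbrtowc and towlower as stand-ins for mbrtoc32 and Gnulib's c32tolower, and it treats each undecodable byte as an encoding error that sorts after every character. The static assertion mirrors the UCHAR_MAX <= INT_MAX check mentioned in this log.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Subtracting two unsigned char values below relies on this.  */
static_assert (UCHAR_MAX <= INT_MAX, "unsigned char promotes to int");

/* One scanned unit: character CH if ERR is 0, else error byte ERR.  */
struct sketch_el
{
  wint_t ch;
  unsigned char err;
  size_t len;
};

/* Scan one unit starting at P (P < LIM), updating conversion state *PS.  */
static struct sketch_el
sketch_scan (char const *p, char const *lim, mbstate_t *ps)
{
  wchar_t wc;
  size_t n = mbrtowc (&wc, p, lim - p, ps);
  if (n == (size_t) -1 || n == (size_t) -2)
    {
      /* Invalid or truncated sequence: consume one byte as an error
         and reset the conversion state.  */
      memset (ps, 0, sizeof *ps);
      return (struct sketch_el) { 0, (unsigned char) *p, 1 };
    }
  /* n == 0 means a null character, which still occupies one byte.  */
  return (struct sketch_el) { wc, 0, n ? n : 1 };
}

/* Compare null-terminated S1 and S2 ignoring case; return <0, 0, >0.  */
static int
sketch_mbscasecmp (char const *s1, char const *s2)
{
  char const *lim1 = s1 + strlen (s1), *lim2 = s2 + strlen (s2);
  mbstate_t st1, st2;
  memset (&st1, 0, sizeof st1);
  memset (&st2, 0, sizeof st2);
  while (s1 < lim1 && s2 < lim2)
    {
      struct sketch_el a = sketch_scan (s1, lim1, &st1);
      struct sketch_el b = sketch_scan (s2, lim2, &st2);
      /* Characters (err 0) sort before errors; errors compare by byte.  */
      if (a.err != b.err)
        return a.err - b.err;
      if (!a.err)
        {
          wint_t la = towlower (a.ch), lb = towlower (b.ch);
          if (la != lb)
            return la < lb ? -1 : 1;
        }
      s1 += a.len;
      s2 += b.len;
    }
  /* The string with units left over sorts after the other.  */
  return (s1 < lim1) - (s2 < lim2);
}
```

Because the sketch never calls setlocale, in the default C locale it behaves like an ASCII strcasecmp; a real program would call setlocale (LC_ALL, "") first.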
* lib/mbcel.h (mbcel_t):
Fix bug on NetBSD, as I had misread its code earlier.
Problem reported by Bruno Haible in:
https://lists.gnu.org/r/bug-gnulib/2023-07/msg00085.html
Mostly for documentation, use _GL_ATTRIBUTE_MAY_ALIAS to remind
compiler not to rely on strict C semantics for unions.
* lib/mbcel.h: Include limits.h, stddef.h.
Add static assertions that MB_LEN_MAX has a sane value,
as the code relies on this. Help GCC by advising
it that mbrtoc32 never returns a value between
MB_LEN_MAX + 1 and (size_t) -1 / 2 inclusive.
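The MB_LEN_MAX sanity check and the hint to GCC can be sketched as follows. SKETCH_ASSUME is a hypothetical stand-in for Gnulib's assume macro from verify.h, built on the GCC/Clang __builtin_unreachable. Like the entry above, it relies on mbrtoc32 never returning a value between MB_LEN_MAX + 1 and (size_t) -1 / 2 inclusive; if that assumption were ever false, the behavior would be undefined.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <uchar.h>

/* The scanning code relies on MB_LEN_MAX being small and positive.  */
static_assert (0 < MB_LEN_MAX && MB_LEN_MAX < (size_t) -1 / 2,
               "MB_LEN_MAX has a sane value");

/* Tell the compiler that R holds.  Only safe for facts that are true.  */
#if defined __GNUC__ || defined __clang__
# define SKETCH_ASSUME(r) ((r) ? (void) 0 : __builtin_unreachable ())
#else
# define SKETCH_ASSUME(r) ((void) (r))
#endif

/* Wrap mbrtoc32, advising the compiler that a result is either a small
   byte count or one of the (size_t) -1, -2, -3 codes, all of which
   exceed (size_t) -1 / 2.  */
static size_t
sketch_mbrtoc32 (char32_t *pc, char const *s, size_t n, mbstate_t *ps)
{
  size_t len = mbrtoc32 (pc, s, n, ps);
  SKETCH_ASSUME (len <= MB_LEN_MAX || (size_t) -1 / 2 < len);
  return len;
}
```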
* bootstrap.conf (gnulib_modules): Add c32isspace, c32tolower.
* lib/Makefile.am (noinst_HEADERS): Add mbcel.h.
(libdiffutils_a_SOURCES): Add mbcel.c.
* lib/mbcel.c, lib/mbcel.h: New files.
* src/io.c: Include mbcel.h, uchar.h.
(hash): 2nd arg is now hash_value, not merely unsigned char,
since the caller might pass a char32_t now.
(find_and_hash_each_line): Support multi-byte input.
* src/util.c: Include mbcel.h, uchar.h.
(lines_differ): New args S1LEN, S2LEN, needed for mbcel_scan.
Caller changed. Support multi-byte input.
* tests/ignore-case: New file.
* tests/Makefile.am (TESTS): Add it.
* tests/ignore-tab-expansion: Add UTF-8 test.
* tests/init.cfg (require_utf8_locale_): New function.
* tests/side-by-side: Use it. Add a column-counting test.
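Why hash's second argument had to widen can be shown with a hypothetical mixing step. The rotate-and-add formula and the choice of size_t for the hash type are assumptions for illustration; the formula in src/io.c may differ. With an unsigned char parameter, distinct characters such as U+00E9 and U+01E9 are truncated to the same byte and hash identically.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Assumed hash type; used here only for illustration.  */
typedef size_t sketch_hash_value;

/* Rotating H by 7 bits requires a type wider than 7 bits.  */
static_assert (7 < sizeof (sketch_hash_value) * CHAR_BIT, "wide enough");

/* Mix one character C into H: rotate H left by 7 bits and add C.  */
static sketch_hash_value
sketch_hash (sketch_hash_value h, sketch_hash_value c)
{
  int bits = sizeof h * CHAR_BIT;
  return c + ((h << 7) | (h >> (bits - 7)));
}

/* The old-style signature: a char32_t argument is silently truncated.  */
static sketch_hash_value
sketch_hash_narrow (sketch_hash_value h, unsigned char c)
{
  return sketch_hash (h, c);
}
```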
Prefer idx_t to size_t in lib/cmpbuf.c and related buffer-size code.
Because POSIX says blksize_t can be wider than idx_t,
check for overflow when copying the former to the latter.
* bootstrap.conf (gnulib_modules): Add idx.
* lib/cmpbuf.c (block_read, buffer_lcm):
Prefer idx_t to size_t. All uses changed.
* lib/cmpbuf.c (block_read): Return ptrdiff_t instead of size_t.
All uses changed.
(buffer_lcm): Help the compiler by checking for negative args,
even though they are not allowed.
* lib/cmpbuf.h: Include idx.h and stddef.h, for idx_t and ptrdiff_t,
so that this include file is self-contained.
* src/analyze.c (diff_2_files):
* src/cmp.c (main):
* src/diff.c, src/io.c: Do not include stdckdint.h here,
since system.h now does that.
* src/diff3.c (read_diff):
* src/io.c (sip):
Protect against negative STAT_BLOCKSIZE, or STAT_BLOCKSIZE
outside idx_t range.
* src/system.h: Include stdckdint.h, idx.h.
* bootstrap.conf (gnulib_modules): Add stdckdint.
* lib/cmpbuf.c: Use ckd_mul rather than INT_MULTIPLY_WRAPV.
Include stdckdint.h, not "intprops.h".
* src/diff.c: Similar, but for both ckd_add and ckd_mul.
* src/io.c: Likewise for ckd_add.
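Checked multiplication in the style of ckd_mul can be illustrated with a least-common-multiple helper modeled loosely on buffer_lcm. Because C23's <stdckdint.h> is not yet available with every compiler, this hypothetical sketch calls the GCC/Clang builtin __builtin_mul_overflow, which, like ckd_mul, stores the wrapped result and returns true on overflow (its argument order differs). The zero-size default, the LCM_MAX limit, and the fall-back-to-A rule are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef ptrdiff_t idx_t;  /* as in Gnulib's idx.h */

/* Return the least common multiple of buffer sizes A and B if it is at
   most LCM_MAX, else A.  Zero sizes yield a default.  A and B must be
   nonnegative.  */
static idx_t
sketch_buffer_lcm (idx_t a, idx_t b, idx_t lcm_max)
{
  assert (0 <= a && 0 <= b);
  if (a == 0)
    return b ? b : 8 * 1024;
  if (b == 0)
    return a;

  /* Euclid's algorithm: afterwards N == gcd (A, B).  */
  idx_t m = a, n = b, r;
  while ((r = m % n) != 0)
    {
      m = n;
      n = r;
    }

  /* A / N * B is the LCM; fall back to A if the product overflows.  */
  idx_t lcm;
  return (!__builtin_mul_overflow (a / n, b, &lcm) && lcm <= lcm_max
          ? lcm : a);
}
```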
Rely on more-modern Gnulib capabilities instead of doing
integer overflow checking by hand, in some cases.
* lib/cmpbuf.c (buffer_lcm):
* src/io.c (slurp, find_identical_ends):
Use INT_ADD_WRAPV and INT_MULTIPLY_WRAPV rather than checking
overflow by hand.
* src/diff3.c (process_diff):
* src/dir.c (dir_read):
* src/io.c (find_identical_ends, read_files):
Use xnmalloc rather than checking overflow by hand.
(read_files): Rely on xcalloc to do overflow checking.
* lib/Makefile.am (noinst_HEADERS): Remove prepargs.h.
(libdiffutils_a_SOURCES): Remove prepargs.c.
* lib/prepargs.c, lib/prepargs.h: Remove. Hasn’t been
needed for many years.
* src/diff.c: Do not include prepargs.h.
Run "make update-copyright" and then...
* gnulib: Update to latest with copyright year adjusted.
* tests/init.sh: Sync with gnulib to pick up copyright year.
* bootstrap: Likewise.
Run "make update-copyright" and then...
* gnulib: Update to latest with copyright year adjusted.
* tests/init.sh: Sync with gnulib to pick up copyright year.
* bootstrap: Likewise.
Using this file,
cat > leading-blank.exempt <<\EOF
(\.gitmodules|help2man|pre-commit)$
(?:^|\/)ChangeLog[^/]*$
(?:^|\/)(?:GNU)?[Mm]akefile[^/]*$
\.(?:am|mk)$
EOF
run the following command to convert all non-conforming leading white
space to be all spaces:
git ls-files \
| pcregrep -vf leading-blank.exempt \
| xargs pcregrep -l '^ *\t' \
| xargs perl -MText::Tabs -ni -le \
'$m=/^( *\t[ \t]*)(.*)/; print $m ? expand($1) . $2 : $_'
Since that changed old NEWS, I also ran "make update-NEWS-hash"
to update the old_NEWS_hash value in cfg.mk.
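The column arithmetic performed by the Perl command above can be sketched in C. The function sketch_expand_leading is hypothetical; it assumes tab stops every 8 columns, which is the Text::Tabs default. Expanding the whole leading run of blanks gives the same result as the Perl regex, because a run without tabs consists only of spaces and is left unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy LINE into OUT (of size OUTSIZE), replacing its leading run of
   spaces and tabs with the equivalent number of spaces, using tab stops
   every 8 columns.  */
static void
sketch_expand_leading (char const *line, char *out, size_t outsize)
{
  size_t col = 0, i = 0;
  for (; line[i] == ' ' || line[i] == '\t'; i++)
    col = line[i] == '\t' ? col / 8 * 8 + 8 : col + 1;
  size_t rest = strlen (line + i);
  assert (col + rest < outsize);
  memset (out, ' ', col);
  memcpy (out + col, line + i, rest + 1);
}
```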
* bootstrap.conf (gnulib_tool_option_extras): Add both --symlink
and --makefile-name=gnulib.mk. Also remove now-obsolete $bt/ prefix.
* bootstrap: Update from gnulib.
* tests/init.sh: Update from gnulib.
* lib/Makefile.am: Initialize numerous variables, so that
generated code in gnulib.mk may use += to append to them.
* tests/binary: Reverse the arguments to compare, to avoid a failure
of the new syntax-check rule.
* configure.ac: Use -Wno-format-nonliteral.
Mark functions as pure or const, per recommendations enabled by
new gcc -W options. Use _GL_ATTRIBUTE_PURE and _GL_ATTRIBUTE_CONST.
* lib/cmpbuf.h (buffer_lcm, block_compare):
Apply pure and/or const attributes.
* src/cmp.c (block_compare): Likewise.
* src/context.c (find_hunk): Likewise.
* src/diff.h (lines_differ): Likewise.
* src/diff3.c (skipwhite): Likewise.
* src/dir.c (dir_loop): Likewise.
* src/util.c (find_change, find_reverse_change): Likewise.
(translate_line_number): Likewise.
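The difference between the pure and const attributes can be sketched with hypothetical SKETCH_PURE and SKETCH_CONST macros standing in for Gnulib's _GL_ATTRIBUTE_PURE and _GL_ATTRIBUTE_CONST. The two functions only loosely resemble translate_line_number and skipwhite and are illustrative.

```c
#include <assert.h>
#include <ctype.h>

/* Expand to GCC attributes where supported, else to nothing.  */
#if defined __GNUC__ || defined __clang__
# define SKETCH_CONST __attribute__ ((__const__))
# define SKETCH_PURE __attribute__ ((__pure__))
#else
# define SKETCH_CONST
# define SKETCH_PURE
#endif

/* const: the result depends only on the argument values and the
   function reads no other memory.  */
static long sketch_translate (long a, long offset) SKETCH_CONST;

/* pure: the function may read memory (here, the string) but has no
   side effects.  */
static char const *sketch_skipwhite (char const *s) SKETCH_PURE;

static long
sketch_translate (long a, long offset)
{
  return a - offset;
}

static char const *
sketch_skipwhite (char const *s)
{
  while (isspace ((unsigned char) *s))
    s++;
  return s;
}
```

Either attribute lets the compiler merge repeated calls with the same arguments; pure additionally requires that the memory read has not changed in between.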
* src/system.h: Include <sys/wait.h> unconditionally,
now that gnulib guarantees its presence.
* lib/cmpbuf.c: Likewise for <unistd.h> and <inttypes.h>.
Avoid a warning from automake:
lib/Makefile.am:23: AM_CFLAGS multiply defined in condition TRUE ...
lib/gnulib.mk:30: ... `AM_CFLAGS' previously defined here
lib/Makefile.am:18: `lib/gnulib.mk' included from here
* lib/Makefile.am (AM_CFLAGS): Append $(WARN_CFLAGS) and
$(WERROR_CFLAGS), i.e., use "+=", not "=".
This was introduced via 2009-12-17 commit e58efa5b
"build: enable warnings and -Werror.",
but fortunately is not a bug, because the definition
it would have overridden was always empty.
* src/Makefile.am (AM_CFLAGS): Enable warnings and -Werror.
Set to this: $(WARN_CFLAGS) $(WERROR_CFLAGS)
* lib/Makefile.am (AM_CFLAGS): Similarly, but use this:
$(GNULIB_WARN_CFLAGS) $(WERROR_CFLAGS)
* configure.ac (GNULIB_WARN_CFLAGS): Don't turn off -Wuninitialized.
* bootstrap: Sync with coreutils bootstrap, except check that
the directory build-aux exists before trying to copy to it.
* bootstrap.conf: New file.
(gnulib_modules): Add config-h, dup2, extensions, fcntl, fdl,
stat-macros, unistd.
* configure.ac: Invoke gl_EARLY and gl_INIT rather than
GNULIB_AUTOCONF_SNIPPET.
(AC_CONFIG_HEADER): Rename config.h to lib/config.h.
(AC_CHECK_HEADERS_ONCE): Don't check for fcntl.h, locale.h,
sys/file.h, unistd.h. We now use the fcntl and unistd modules,
and locale.h can be assumed for any C89 compiler.
(DIFFUTILS_PREREQUISITES): Remove. No longer needed now that
we use the stdint module.
(AC_CHECK_FUNCS_ONCE): Remove dup2, which is no longer needed
now that we use the dup2 module.
(AM_GNU_GETTEXT): Use need-formatstring-macros, and ...
(AM_GNU_GETTEXT_VERSION): specify version 0.15 instead of 0.14.5,
to be consistent with coreutils.
* lib/Makefile.am (noinst_LIBRARIES):
(lib_SOURCES, libdiffutils_a_LIBADD):
(libdiffutils_a_DEPENDENCIES, BUILT_SOURCES, EXTRA_DIST):
(MOSTLYCLEANFILES): Remove; now computed automatically.
(noinst_HEADERS, libdiffutils_a_SOURCES): Just append
our special files now.
* lib/cmpbuf.c: Include config.h unconditionally, since we
no longer define HAVE_CONFIG_H.
* lib/prepargs.c: Likewise.
* src/Makefile.am (LDADD): Use $(LIBINTL), not @LIBINTL@.
(diff_LDADD): Use $(LIB_CLOCK_GETTIME), not @LIB_CLOCK_GETTIME@.
* src/dir.c (dir_read): Use _D_EXACT_NAMLEN, not NAMLEN.
* src/system.h (volatile): Remove, since we assume C89 or better.
Include stat-macros.h.
(S_IRWXU, S_IRWXG, S_IRWXO, S_IRUSR, S_IWUSR):
Remove, since we now use stat-macros.h.
(SEEK_SET, SEEK_CUR): Remove, since we assume C89 or better.
Include unistd.h unconditionally, since we use unistd.
Likewise for fcntl.h.
(dup2): Remove, since we now use dup2.
(O_RDONLY, O_RDWR, O_BINARY): Remove, since we now use
fcntl.
Include dirent.h unconditionally.
(NAMLEN): Remove, replacing with ...
(_D_EXACT_NAMLEN): New macro.
Include inttypes.h unconditionally.
(PTRDIFF_MAX, SIZE_MAX, UINTMAX_MAX, strtoumax): Remove, since
we now use inttypes.
Include locale.h unconditionally.
(setlocale): Remove, since we now assume locale.h.
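A fallback definition of _D_EXACT_NAMLEN and a typical use can be sketched as follows. glibc's <dirent.h> already defines _D_EXACT_NAMLEN, so the #ifndef branch only matters on systems that lack it; the helper sketch_total_namelen is hypothetical and exists only to exercise the macro.

```c
#include <assert.h>
#include <dirent.h>
#include <string.h>

/* Length of the name in directory entry DP, if the system lacks it.  */
#ifndef _D_EXACT_NAMLEN
# define _D_EXACT_NAMLEN(dp) strlen ((dp)->d_name)
#endif

/* Return the total length of the entry names in DIR, or -1 if DIR
   cannot be opened.  */
static long
sketch_total_namelen (char const *dir)
{
  DIR *d = opendir (dir);
  if (!d)
    return -1;
  long total = 0;
  struct dirent *dp;
  while ((dp = readdir (d)))
    {
      assert (_D_EXACT_NAMLEN (dp) == strlen (dp->d_name));
      total += _D_EXACT_NAMLEN (dp);
    }
  closedir (d);
  return total;
}
```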