Go back to a single mcel module, instead of trying to break it up
into ucore and mcel pieces, as breaking it up hurt performance.
Use gnulib-tool’s --local-dir to create diffutils-specific modules
for mcel; the idea is that this will eventually migrate into Gnulib.
* bootstrap.conf (avoided_gnulib_modules): Add mbuiterf.
(gnulib_modules): Add mbscasecmp, mcel-prefer.
(gnulib_tool_option_extras): Add --local-dir=gl to pick up new files.
* cfg.mk (exclude_file_name_regexp--sc_prohibit_doubled_word):
Do not exclude now-removed files lib/ucore.c, lib/ucore.h.
* lib/Makefile.am: Adjust to use of modules.
(noinst_HEADERS): Remove mcel.h, ucore.h.
(libdiffutils_a_SOURCES): Remove mcel.c, mcel-casecmp.c, ucore.c.
* lib/mcel-casecmp.c, lib/ucore.c, lib/ucore.h: Remove.
* lib/mcel.h: Switch to LGPLv2.1+. Do not include ucore.h.
All uses of ucore_t changed back to using char32_t.
Do what ucore.h used to do: include verify.h, limits.h, stddef.h,
uchar.h; require config.h, define _GL_LIKELY, _GL_UNLIKELY.
(MCEL_CHAR_MAX, MCEL_ERR_MIN, MCEL_ERR_MAX): New constants.
(mcel_t): Switch from single ucore_t c to a char32_t ch and
unsigned char err. This has significantly better performance on
Fedora 38 x86-64. All uses changed. Check that unsigned char
promotes to int.
(mcel_ch, mcel_err, mcel_cmp, mcel_tocmp): New functions.
(MCEL_ERR_SHIFT): Rename from MCEL_ENCODING_ERROR_SHIFT.
All uses changed.
(mcel_isbasic): Add a _GL_LIKELY to help compilers. All uses changed.
(mcel_scan, mcel_scant): Simplify by using mcel_ch, mcel_err.
(mcel_casecmp): Remove decl. Callers changed to use mbscasecmp.
* gl/lib/mcel.c, gl/lib/mcel.h: Rename from lib/mcel.c, lib/mcel.h.
* gl/lib/mbscasecmp.c: New file.
* gl/modules/mcel, gl/modules/mcel-prefer, gl/modules/mcel-tests:
* gl/tests/test-mcel.c:
New files.
* src/io.c: Revert use of ucore API. Use plain c32isspace etc.
instead of ucore_is. Use .err instead of ucore_iserr.
(same_ch_err): Bring back, and use it instead of ucore_cmp.
* src/side.c (print_half_line): Use .err instead of ucore_iserr.
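
To illustrate the (mcel_t) change described above, here is a minimal
sketch; it is not the contents of gl/lib/mcel.h. Every sketch_ and
SKETCH_ name is hypothetical, the len member is an assumption, and the
constant values (0x10FFFF, 0x80, a shift of 14) are assumed for the
example only. uint_least32_t stands in for char32_t so the sketch does
not depend on uchar.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for char32_t.  */
typedef uint_least32_t sketch_c32;

/* Assumed values, chosen for illustration only.  */
enum { SKETCH_CHAR_MAX = 0x10FFFF };  /* largest character value */
enum { SKETCH_ERR_MIN = 0x80 };       /* smallest encoding-error byte */
enum { SKETCH_ERR_SHIFT = 14 };       /* makes errors sort after characters */

/* A shifted error byte must exceed every valid character.  */
_Static_assert (SKETCH_CHAR_MAX < (SKETCH_ERR_MIN << SKETCH_ERR_SHIFT),
                "encoding errors must sort after characters");

#if defined __GNUC__
# define SKETCH_LIKELY(cond) __builtin_expect (!!(cond), 1)
#else
# define SKETCH_LIKELY(cond) (cond)
#endif

/* A character or an encoding error, kept in separate members
   rather than packed into a single integer.  */
typedef struct
{
  sketch_c32 ch;      /* character value, or 0 on encoding error */
  unsigned char err;  /* offending byte, or 0 if ch is valid */
  unsigned char len;  /* number of bytes consumed (assumed member) */
} sketch_mcel;

/* Build a value representing the valid character CH of length LEN.  */
static inline sketch_mcel
sketch_ch (sketch_c32 ch, size_t len)
{
  return (sketch_mcel) { .ch = ch, .len = (unsigned char) len };
}

/* Build a value representing the encoding-error byte ERR.  */
static inline sketch_mcel
sketch_err (unsigned char err)
{
  return (sketch_mcel) { .err = err, .len = 1 };
}

/* Return <0, 0, >0 for ordering; any error sorts after any character.
   This relies on unsigned char promoting to int.  */
static inline int
sketch_cmp (sketch_mcel c1, sketch_mcel c2)
{
  int ch1 = (int) c1.ch, ch2 = (int) c2.ch;
  return (c1.err - c2.err) * (1 << SKETCH_ERR_SHIFT) + (ch1 - ch2);
}

/* True for a single-byte ASCII character, the expected common case.  */
static inline int
sketch_isbasic (char c)
{
  return SKETCH_LIKELY (0 <= c && c < SKETCH_ERR_MIN);
}
```

With separate members, callers can test .err directly instead of
decoding a packed value, which is the usage the src/io.c and src/side.c
entries above refer to.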
* lib/Makefile.am: Adjust to file renamings and additions.
* lib/mbcel.c, lib/mbcel.h: Split into two APIs, replacing with ...
* lib/mcel.c, lib/mcel.h, lib/ucore.c, lib/ucore.h: ... these new files.
* lib/mcel.h: Simplify by assuming ucore.h is included.
Check that bytes have 8 bits.
(MCEL_LEN_MAX, mcel_t, MCEL_INLINE, MCEL_ENCODING_ERROR_SHIFT)
(mcel_scan, mcel_scant, mcel_scanz, mcel_casecmp):
Rename from MBCEL_LEN_MAX, mbcel_t, MBCEL_INLINE,
MBCEL_ENCODING_ERROR_SHIFT, mbcel_scan, mbcel_scant, mbcel_scanz,
mbcel_casecmp.
(mcel_t): New member c, replacing old members ch and err.
All uses changed.
(MBCEL_UCHAR_FITS, MBCEL_UCHAR_EASILY_FITS): Remove.
All uses removed. No longer needed now 8-bit bytes are assumed.
(MCEL_ENCODING_ERROR_SHIFT): Check that it matches UCORE_ERR_MIN.
(mcel_isbasic): New function. Use it where appropriate.
(mbcel_cmp, mbcel_casecmp): Remove; replaced by ucore_cmp,
ucore_tocmp. All uses changed.
* lib/mcel-casecmp.c: Rename from lib/mbcel-strcasecmp.c.
Include mcel.h instead of mbcel.h.
(mcel_casecmp): Rename from mbcel_strcasecmp. All uses changed.
Assert that UCHAR_MAX <= INT_MAX, as POSIX requires,
and simplify code accordingly. Use mcel rather than mbcel.
* lib/ucore.h: Include verify.h.
(ucore_t): New type.
(UCORE_CHAR_MAX, UCORE_ERR_MIN, UCORE_ERR_MAX, UCORE_C32_SAFE):
New constants. Check that information is not lost by encoding
errors as integers; this is a weaker test than CHAR_BIT == 8.
(ucore_iserr, ucore_is, ucore_to): New functions.
(ucore_cmp, ucore_tocmp): New functions, replacing the old
mbcel_cmp, mbcel_casecmp. All uses changed.
* src/dir.c, src/io.c, src/side.c: Use mcel rather than mbcel.
* src/io.c (same_ch_err): Remove. All uses replaced by ucore_cmp.
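
The single-integer encoding that the ucore entries above describe can
be sketched as follows. This is an illustration under stated
assumptions, not the contents of lib/ucore.h: all demo_ and DEMO_ names
are hypothetical, and the shift of 14 and the limits are assumed
values.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical stand-in for ucore_t: one integer holds either a
   character or an encoding error.  */
typedef uint_least32_t demo_ucore;

/* Assumed values, for illustration only.  */
#define DEMO_SHIFT 14
#define DEMO_CHAR_MAX 0x10FFFF
#define DEMO_ERR_MIN (0x80 << DEMO_SHIFT)
#define DEMO_ERR_MAX (UCHAR_MAX << DEMO_SHIFT)

/* No information is lost by encoding errors as integers: every error
   value lies above every character and still fits in 31 bits.
   Under these assumed values the checks also pass for bytes somewhat
   wider than 8 bits, so they are weaker than CHAR_BIT == 8.  */
_Static_assert (DEMO_CHAR_MAX < DEMO_ERR_MIN,
                "errors must sort after characters");
_Static_assert (DEMO_ERR_MAX <= 0x7FFFFFFF,
                "errors must fit in the encoding");

/* Encode the encoding-error byte BYTE (expected to be at least 0x80).  */
static inline demo_ucore
demo_from_err (unsigned char byte)
{
  return (demo_ucore) byte << DEMO_SHIFT;
}

/* Return nonzero if C represents an encoding error.  */
static inline int
demo_iserr (demo_ucore c)
{
  return (demo_ucore) DEMO_ERR_MIN <= c;
}

/* Return -1, 0 or 1; plain integer order puts errors last.  */
static inline int
demo_cmp (demo_ucore c1, demo_ucore c2)
{
  return (c1 > c2) - (c1 < c2);
}
```

Because errors and characters share one integer, a single comparison
orders both, which is why a separate same_ch_err helper is unnecessary
in this design.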
* bootstrap.conf (gnulib_modules): Add builtin-expect.
* lib/mbcel-strcasecmp.c: New file.
* lib/Makefile.am (libdiffutils_a_SOURCES): Add it.
* lib/mbcel.h (MBCEL_LEN_MAX, MBCEL_ENCODING_ERROR_SHIFT)
(MBCEL_UCHAR_FITS, MBCEL_UCHAR_EASILY_FITS): New constants.
(_GL_LIKELY): New macro.
(mbcel_scan): Use it. Simplify NetBSD code.
(mbcel_scant, mbcel_scanz, mbcel_cmp, mbcel_casecmp): New functions.
* src/dir.c (strcasecoll): Move defn here from system.h,
since only dir.c needs it. Use mbcel_strcasecmp instead
of strcasecmp.
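
For context on why strcasecmp is replaced above: it compares bytes, so
it cannot fold case in multi-byte characters. The sketch below shows
one way to compare character by character. It uses the standard
mbrtowc and towlower functions rather than the mbcel API, and
sketch_mbcasecmp is a hypothetical name; it is not the code in
lib/mbcel-strcasecmp.c.

```c
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Compare S1 and S2 character by character in the current locale,
   ignoring case.  Invalid or incomplete sequences are treated as
   single bytes that sort after all valid characters.  */
static int
sketch_mbcasecmp (char const *s1, char const *s2)
{
  mbstate_t st1, st2;
  memset (&st1, 0, sizeof st1);
  memset (&st2, 0, sizeof st2);
  /* Include the terminating NUL so mbrtowc never reads past it.  */
  size_t len1 = strlen (s1) + 1, len2 = strlen (s2) + 1;

  for (;;)
    {
      wchar_t w1 = 0, w2 = 0;
      size_t n1 = mbrtowc (&w1, s1, len1, &st1);
      size_t n2 = mbrtowc (&w2, s2, len2, &st2);
      int err1 = n1 == (size_t) -1 || n1 == (size_t) -2;
      int err2 = n2 == (size_t) -1 || n2 == (size_t) -2;

      /* After an error the state is unspecified, so reset it and
         consume exactly one byte.  */
      if (err1)
        { n1 = 1; memset (&st1, 0, sizeof st1); }
      if (err2)
        { n2 = 1; memset (&st2, 0, sizeof st2); }

      if (err1 != err2)
        return err1 - err2;
      if (err1)
        {
          unsigned char b1 = *s1, b2 = *s2;
          if (b1 != b2)
            return b1 < b2 ? -1 : 1;
        }
      else
        {
          wint_t f1 = towlower (w1), f2 = towlower (w2);
          if (f1 != f2)
            return f1 < f2 ? -1 : 1;
          if (n1 == 0)
            return 0;  /* Both strings ended together.  */
        }

      s1 += n1; len1 -= n1;
      s2 += n2; len2 -= n2;
    }
}
```

The result depends on the locale in effect; a caller wanting UTF-8
behavior would first select a UTF-8 locale with setlocale.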
* bootstrap.conf (gnulib_modules): Add c32isspace, c32tolower.
* lib/Makefile.am (noinst_HEADERS): Add mbcel.h.
(libdiffutils_a_SOURCES): Add mbcel.c.
* lib/mbcel.c, lib/mbcel.h: New files.
* src/io.c: Include mbcel.h, uchar.h.
(hash): 2nd arg is now hash_value, not merely unsigned char,
since the caller might pass a char32_t now.
(find_and_hash_each_line): Support multi-byte input.
* src/util.c: Include mbcel.h, uchar.h.
(lines_differ): New args S1LEN, S2LEN, needed for mbcel_scan.
Caller changed. Support multi-byte input.
* tests/ignore-case: New file.
* tests/Makefile.am (TESTS): Add it.
* tests/ignore-tab-expansion: Add UTF-8 test.
* tests/init.cfg (require_utf8_locale_): New function.
* tests/side-by-side: Use it. Add a column-counting test.
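
The (hash) entry above widens the second argument so that a whole
character, not just a byte, can be mixed into a line's hash. The sketch
below illustrates the point with a rotate-and-add hash. The names, the
choice of size_t for the hash type, and the rotation count of 7 are
assumptions for the example, not taken from src/io.c.

```c
#include <limits.h>
#include <stddef.h>

/* Assumption: the real hash_value type may differ.  */
typedef size_t sketch_hash_value;

/* Rotate V left by N bits, where 0 < N < width of V.  */
static inline sketch_hash_value
sketch_rol (sketch_hash_value v, int n)
{
  return (v << n) | (v >> (sizeof v * CHAR_BIT - n));
}

/* Mix the character C into the running hash H.  C has the full hash
   width, so a code point above UCHAR_MAX is not truncated.  */
static inline sketch_hash_value
sketch_hash (sketch_hash_value h, sketch_hash_value c)
{
  return c + sketch_rol (h, 7);
}
```

If the second parameter were unsigned char, the code points 0x20AC and
0xAC would contribute the same value to the hash.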
* lib/Makefile.am (noinst_HEADERS): Remove prepargs.h.
(libdiffutils_a_SOURCES): Remove prepargs.c.
* lib/prepargs.c, lib/prepargs.h: Remove. Hasn’t been
needed for many years.
* src/diff.c: Do not include prepargs.h.
Run "make update-copyright" and then...
* gnulib: Update to latest with copyright year adjusted.
* tests/init.sh: Sync with gnulib to pick up copyright year.
* bootstrap: Likewise.
Run "make update-copyright" and then...
* gnulib: Update to latest with copyright year adjusted.
* tests/init.sh: Sync with gnulib to pick up copyright year.
* bootstrap: Likewise.
* bootstrap.conf (gnulib_tool_option_extras): Add both --symlink
and --makefile-name=gnulib.mk. Also remove now-obsolete $bt/ prefix.
* bootstrap: Update from gnulib.
* tests/init.sh: Update from gnulib.
* lib/Makefile.am: Initialize numerous variables, so that
generated code in gnulib.mk may use += to append to them.
Avoid a warning from automake:
lib/Makefile.am:23: AM_CFLAGS multiply defined in condition TRUE ...
lib/gnulib.mk:30: ... `AM_CFLAGS' previously defined here
lib/Makefile.am:18: `lib/gnulib.mk' included from here
* lib/Makefile.am (AM_CFLAGS): Append $(WARN_CFLAGS) and
$(WERROR_CFLAGS), i.e., use "+=", not "=".
This was introduced via 2009-12-17 commit e58efa5b
"build: enable warnings and -Werror.",
but fortunately is not a bug, because the definition
it would have overridden was always empty.
* src/Makefile.am (AM_CFLAGS): Enable warnings and -Werror.
Set to this: $(WARN_CFLAGS) $(WERROR_CFLAGS)
* lib/Makefile.am (AM_CFLAGS): Similarly, but use this:
$(GNULIB_WARN_CFLAGS) $(WERROR_CFLAGS)
* configure.ac (GNULIB_WARN_CFLAGS): Don't turn off -Wuninitialized.
* bootstrap: Sync with coreutils bootstrap, except check that
the directory build-aux exists before trying to copy to it.
* bootstrap.conf: New file.
(gnulib_modules): Add config-h, dup2, extensions, fcntl, fdl,
stat-macros, unistd.
* configure.ac: Invoke gl_EARLY and gl_INIT rather than
GNULIB_AUTOCONF_SNIPPET.
(AC_CONFIG_HEADER): Rename config.h to lib/config.h.
(AC_CHECK_HEADERS_ONCE): Don't check for fcntl.h, locale.h,
sys/file.h, unistd.h. We now use the fcntl and unistd modules,
and locale.h can be assumed for any C89 compiler.
(DIFFUTILS_PREREQUISITES): Remove. No longer needed now that
we use the stdint module.
(AC_CHECK_FUNCS_ONCE): Remove dup2, which is no longer needed
now that we use the dup2 module.
(AM_GNU_GETTEXT): Use need-formatstring-macros, and ...
(AM_GNU_GETTEXT_VERSION): specify version 0.15 instead of 0.14.5,
to be consistent with coreutils.
* lib/Makefile.am (noinst_LIBRARIES):
(lib_SOURCES, libdiffutils_a_LIBADD):
(libdiffutils_a_DEPENDENCIES, BUILT_SOURCES, EXTRA_DIST):
(MOSTLYCLEANFILES): Remove; now computed automatically.
(noinst_HEADERS, libdiffutils_a_SOURCES): Just append
our special files now.
* lib/cmpbuf.c: Include config.h unconditionally, since we
no longer define HAVE_CONFIG_H.
* lib/prepargs.c: Likewise.
* src/Makefile.am (LDADD): Use $(LIBINTL), not @LIBINTL@.
(diff_LDADD): Use $(LIB_CLOCK_GETTIME), not @LIB_CLOCK_GETTIME@.
* src/dir.c (dir_read): Use _D_EXACT_NAMLEN, not NAMELEN.
* src/system.h (volatile): Remove, since we assume C89 or better.
Include stat-macros.h.
(S_IRWXU, S_IRWXG, S_IRWXO, S_IRUSR, S_IWUSR):
Remove, since we now use stat-macros.h.
(SEEK_SET, SEEK_CUR): Remove, since we assume C89 or better.
Include unistd.h unconditionally, since we use unistd.
Likewise for fcntl.h.
(dup2): Remove, since we now use dup2.
(O_RDONLY, O_RDWR, O_BINARY): Remove, since we now use
fcntl.
Include dirent.h unconditionally.
(NAMLEN): Remove, replacing with ...
(_D_EXACT_NAMLEN): New macro.
Include inttypes.h unconditionally.
(PTRDIFF_MAX, SIZE_MAX, UINTMAX_MAX, strtoumax): Remove, since
we now use inttypes.
Include locale.h unconditionally.
(setlocale): Remove, since we now assume locale.h.
(specify_ignore_initial): Reword to avoid gcc -W warnings.
(main): Use freopen instead of setmode, since freopen is in POSIX.
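
The replacement of NAMLEN by _D_EXACT_NAMLEN noted above can be
sketched as follows. Some C libraries define _D_EXACT_NAMLEN in
dirent.h; the fallback shown here, based on strlen, is an assumption
about what a portable definition might look like, and
sketch_entry_name_length is a hypothetical helper.

```c
#include <dirent.h>
#include <stddef.h>
#include <string.h>

/* Fallback for systems whose <dirent.h> lacks the macro.  */
#ifndef _D_EXACT_NAMLEN
# define _D_EXACT_NAMLEN(dp) strlen ((dp)->d_name)
#endif

/* Return the length of the name of the directory entry DP.  */
static size_t
sketch_entry_name_length (struct dirent const *dp)
{
  return _D_EXACT_NAMLEN (dp);
}
```

A caller such as a directory-reading loop would pass each entry
returned by readdir to this helper when sizing a buffer for the name.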
* src/context.c: Revert most 2004-09-01 changes. Then:
(TIMESPEC_NS): Remove. All uses replaced by
get_stat_mtime_ns.
Include stat-time.h, strftime.h.
(nstrtime): Remove decl.
* src/diff.c: Revert most 2004-09-01 changes. Then:
Don't include <posixver.h>, <quotesys.h>, <setmode.h>.
Include <sh-quote.h>, <stat-time.h>, <timespec.h>.
All uses of quotesys replaced by sh-quote.
(main, compare_files):
Use freopen instead of setmode, since freopen is in POSIX.
(main): Don't complain about "diff -NUM".
(main, set_mtime_to_now):
Adjust to stat-time.h macros when accessing nanoseconds.
* src/diff3.c: Include sh-quote.h rather than quotesys. All uses
changed.
* src/dir.c (dir_read): excluded_filename renamed to
excluded_file_name.
* src/io.c: Don't include <setmode.h>.
(sip, read_files): Remove binary file stuff, leaving a FIXME behind.
A DOS expert needs to look at this.
* src/diff.c: Include sh-quote.h rather than quotesys.h.
All uses changed.
* src/system.h: Include verify.h.
(verify): Remove. All uses changed to verify.h version.
Include <intprops.h>.
(TYPE_SIGNED, TYPE_MINIMUM, TYPE_MAXIMUM): Remove. Now uses
intprops.h versions.
(O_BINARY): New defns, taken from coreutils.
* src/util.c: Include sh-quote.h rather than quotesys.h.
All uses changed.
(EXTRA_DIST, noinst_HEADERS): Remove most entries.
(libdiffutils_a_SOURCES): Now just lib_SOURCES.
(lib_SOURCES): New macro.
(DISTCLEANFILES, MOSTLYCLEANFILES): Set to empty now.
(gnulib.mk): Include: this does most of the work eliminated
by the above changes.