1486 Commits

Author SHA1 Message Date
Paul Eggert
8fa0e68226 diff: omit HAVE_STRUCT_STAT_ST_SPARE1
* src/diff.c (main): Remove reference to macro
HAVE_STRUCT_STAT_ST_SPARE1, which hasn’t been defined since 2007.
2023-07-23 15:54:43 -07:00
Paul Eggert
5c74ccefc5 diff: get current time lazily, via C11
* bootstrap.conf (gnulib_modules): Remove gettime; add timespec_get.
* src/context.c (print_context_label): Get current time lazily.
Use C11 timespec_get rather than older Gnulib gettime function.
* src/diff.c: Do not include timespec.h.
(set_mtime_to_now): Remove.  All uses removed.
2023-07-23 15:00:02 -07:00
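Here is a minimal sketch of getting the time lazily with C11 `timespec_get`. The helper name `current_time` is hypothetical and this is not the code in src/context.c. It only shows the pattern: read the clock the first time a caller needs it, then reuse the cached value.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical helper: return the current time, consulting the clock
   only on the first call.  If timespec_get fails, the cached value is
   the epoch.  */
static struct timespec const *
current_time (void)
{
  static struct timespec now;
  static bool have_now;
  if (!have_now)
    {
      /* timespec_get returns its BASE argument on success, 0 on failure.  */
      if (timespec_get (&now, TIME_UTC) != TIME_UTC)
        now = (struct timespec) { 0 };
      have_now = true;
    }
  return &now;
}
```

A program that never prints a label needing the time never reads the clock.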
Paul Eggert
d0e9d67586 diff: check file attributes more carefully
* src/system.h: Include stat-time.h, timespec.h.
* bootstrap.conf (gnulib_modules): Add timespec, for timespec_cmp.
(same_file_attributes): Check birthtime and ns components too.
Check attributes earlier if they are more likely to differ.
2023-07-22 12:52:18 -07:00
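This is a sketch of comparing file attributes down to the nanosecond. It assumes the POSIX.1-2008 `st_mtim` and `st_ctim` members. `ts_cmp` and `same_attributes` are hypothetical names. `ts_cmp` returns a negative, zero, or positive value, in the manner of Gnulib's `timespec_cmp`. Birthtime is left out because POSIX `struct stat` has no such member. Putting timestamps first assumes they are the attributes most likely to differ.

```c
#ifndef _POSIX_C_SOURCE
# define _POSIX_C_SOURCE 200809L  /* for st_mtim and st_ctim */
#endif
#include <stdbool.h>
#include <sys/stat.h>
#include <time.h>

/* Compare two timestamps: negative if A < B, zero if equal, else positive.  */
static int
ts_cmp (struct timespec a, struct timespec b)
{
  int c = (a.tv_sec > b.tv_sec) - (a.tv_sec < b.tv_sec);
  return c ? c : (a.tv_nsec > b.tv_nsec) - (a.tv_nsec < b.tv_nsec);
}

/* Hypothetical stand-in for same_file_attributes: report whether two
   stat results look like the same unmodified file.  The checks assumed
   most likely to fail come first, so mismatches are found early.  */
static bool
same_attributes (struct stat const *s, struct stat const *t)
{
  return (ts_cmp (s->st_mtim, t->st_mtim) == 0
          && ts_cmp (s->st_ctim, t->st_ctim) == 0
          && s->st_size == t->st_size
          && s->st_mode == t->st_mode
          && s->st_uid == t->st_uid
          && s->st_gid == t->st_gid
          && s->st_nlink == t->st_nlink);
}
```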
Paul Eggert
a30a8d4f4d diff: don’t think mbcel_strcasecmp preserves errno
* configure.ac: Do not check for strcasecoll (which doesn’t exist)
or stricoll (not worth the porting hassle, as it doesn’t set errno).
* src/dir.c: Always include mbcel.h, since we now always compile
a call to mbcel_strcasecmp.
(strcasecoll): Remove.  It’s not worth bothering to port to
Microsoft stricoll’s idiosyncrasies; mbcel_strcasecmp is good enough.
And nobody ever implemented strcasecoll.
(compare_collated, diff_dirs): Do the setjmp business only when
not ignoring file name case, since mbcel_strcasecmp doesn’t fail
and doesn’t set errno.  This fixes a bug in recent changes,
which incorrectly assumed mbcel_strcasecmp preserves errno.
2023-07-21 20:38:47 -07:00
Paul Eggert
8d476d03b5 diff: simplify qsort comparison function
* src/dir.c (compare_names_for_qsort): Simplify.
2023-07-21 11:51:31 -07:00
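Below is the usual shape of a `qsort` comparison function for an array of file-name pointers. The body is hypothetical. `strcmp` stands in for whatever comparison diff actually performs, and `sort_names` is an illustrative wrapper.

```c
#include <stdlib.h>
#include <string.h>

/* qsort passes pointers to the array elements, so each argument is
   really a char const *const * that must be dereferenced once.  */
static int
compare_names_for_qsort (void const *file1, void const *file2)
{
  char const *const *f1 = file1;
  char const *const *f2 = file2;
  return strcmp (*f1, *f2);  /* stand-in for diff's own comparison */
}

/* Sort the N file names in NAMES in place.  */
static void
sort_names (char const **names, size_t n)
{
  qsort (names, n, sizeof *names, compare_names_for_qsort);
}
```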
Paul Eggert
6477dce501 diff: sort multi-byte file names better
* bootstrap.conf (gnulib_modules): Add builtin-expect.
* lib/mbcel-strcasecmp.c: New file.
* lib/Makefile.am (libdiffutils_a_SOURCES): Add it.
* lib/mbcel.h (MBCEL_LEN_MAX, MBCEL_ENCODING_ERROR_SHIFT)
(MBCEL_UCHAR_FITS, MBCEL_UCHAR_EASILY_FITS): New constants.
(_GL_LIKELY): New macro.
(mbcel_scan): Use it.  Simplify NetBSD code.
(mbcel_scant, mbcel_scanz, mbcel_cmp, mbcel_casecmp): New functions.
* src/dir.c (strcasecoll): Move defn here from system.h,
since only dir.c needs it.  Use mbcel_strcasecmp instead
of strcasecmp.
2023-07-21 11:23:53 -07:00
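This is a rough sketch of case-insensitive comparison of multibyte file names. It is not mbcel_strcasecmp. `mb_casecmp` is a hypothetical name, and it uses the standard `mbrtowc` and `towlower` in place of Gnulib's mbcel and `c32tolower`. Sorting encoding-error bytes after all valid characters is this sketch's own choice.

```c
#include <stdint.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical: compare the null-terminated multibyte strings A and B,
   ignoring case.  Return negative, zero, or positive.  A byte that
   begins an encoding error is compared as a unit that sorts after every
   valid character.  */
static int
mb_casecmp (char const *a, char const *b)
{
  size_t na = strlen (a) + 1, nb = strlen (b) + 1;  /* count the NULs too */
  mbstate_t sa, sb;
  memset (&sa, 0, sizeof sa);
  memset (&sb, 0, sizeof sb);
  for (;;)
    {
      wchar_t wa, wb;
      size_t la = mbrtowc (&wa, a, na, &sa);
      size_t lb = mbrtowc (&wb, b, nb, &sb);
      long long ka, kb;  /* sort keys for the current characters */

      /* A result with the top bit set, e.g. (size_t) -1, is an error:
         use the raw byte and reset the now-unspecified state.  */
      if (la <= SIZE_MAX / 2)
        ka = towlower (wa);
      else
        {
          ka = (long long) WINT_MAX + 1 + (unsigned char) *a;
          la = 1;
          memset (&sa, 0, sizeof sa);
        }
      if (lb <= SIZE_MAX / 2)
        kb = towlower (wb);
      else
        {
          kb = (long long) WINT_MAX + 1 + (unsigned char) *b;
          lb = 1;
          memset (&sb, 0, sizeof sb);
        }

      if (ka != kb)
        return ka < kb ? -1 : 1;
      if (ka == 0)
        return 0;  /* both strings ended at the same point */
      a += la, na -= la;
      b += lb, nb -= lb;
    }
}
```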
Paul Eggert
00badee340 diff: remove diff_dirs ‘volatile’
* src/dir.c (diff_dirs): Omit ‘volatile’, as it’s no longer
needed with the current use of setjmp.
2023-07-21 11:23:53 -07:00
Paul Eggert
9c98864cb1 cmp: remove IF_LINT
* src/cmp.c (cmp): Redo to avoid the need for IF_LINT, while still
pacifying GCC.  The machine code is a bit smaller too.  The price
is a portmanteau variable, but it’s worth it.
* src/system.h (IF_LINT): Remove.
2023-07-19 11:05:24 -07:00
Paul Eggert
13799c56ed diff: simplify away an ‘IF_LINT (volatile)’
* src/dir.c (find_dir_file_pathname): Simplify.  There is no
longer a need for volatile or setjmp, now that dir_read does all
the tricky sorting and longjmping.
2023-07-19 04:32:42 -07:00
Paul Eggert
a1455251a9 diff: improve -S dir-reading performance
* src/dir.c (dir_read): New args STARTFILE and STARTFILE_ONLY,
to avoid unnecessary allocation and copying.  All uses changed.
* tests/starting-file: New test.
* tests/Makefile.am (TESTS): Add it.
2023-07-19 02:42:07 -07:00
Paul Eggert
70fd6c57ad diff: fix recently-introduced &noparent bug
* src/dir.c (diff_dirs, dir_loop): The ‘parent’ member is now
&noparent (instead of null) if there is no parent.  Patch
a couple of uses that were missed earlier.
2023-07-19 02:04:45 -07:00
Paul Eggert
c640bec4f3 diff: fix mbcel bug on NetBSD
* lib/mbcel.h (mbcel_t):
Fix a bug on NetBSD that arose because I misread its code earlier.
Problem reported by Bruno Haible in:
https://lists.gnu.org/r/bug-gnulib/2023-07/msg00085.html
Mostly for documentation, use _GL_ATTRIBUTE_MAY_ALIAS to remind the
compiler not to rely on strict C semantics for unions.
2023-07-18 20:07:13 -07:00
Paul Eggert
753302982b maint: convert source from non-UTF-8
* po/en.po: Convert from Latin-1 to UTF-8.
This was the only remaining file under Git control
that still used an encoding other than UTF-8.
2023-07-18 20:07:13 -07:00
Paul Eggert
6bf2c33ea4 diff: use openat, fstatat when recursive
This should improve performance when doing recursive comparisons.
Currently there is no attempt to avoid file descriptor exhaustion,
just as previously there was no attempt to avoid file names
that provoke ENAMETOOLONG.  Because of this change, ‘diff - A/B’
now works correctly when standard input is a directory.
* .gitignore: Add lib/dirent.h.
* bootstrap.conf (gnulib_modules): Add fdopendir.
* src/diff.c (main): Initialize noparent’s desc to AT_FDCWD.
(compare_files): Use fstatat with parent directory’s file
descriptor and relative name, instead of lstat or stat.
Likewise for openat and open.
* src/diff.h (struct file_data): New member ‘dirstream’.
(struct comparison): The ‘parent’ member is now &noparent (instead
of null) if there is no parent.  All uses changed.
(curr): New toplevel variable, replacing ‘files’.  All uses changed.
* src/dir.c: Include dirname.h, for last_component.
(dir_read): New arg PARENTDIRFD.  Arg DIR is no longer
pointer-to-const since DIR->desc and DIR->dirstream are now
updated.  Use PARENTDIRFD to open the directory via
openat+fdopendir instead of via opendir.  Update new dirstream
component instead of closing the directory, since it’s now the
caller’s responsibility to close the directory because callers now
want the file descriptor.  All callers changed.
(diff_dirs): First arg CMP is no longer pointer-to-const since
CMP->file is updated by dir_read.  All callers changed.
(find_dir_file_pathname): First arg is now struct file_data *,
not merely a file name.  All callers changed.
* tests/stdin: Test new behavior when stdin is a directory.
2023-07-18 20:07:13 -07:00
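The pattern of resolving a name relative to a parent directory's file descriptor can be sketched with POSIX `fstatat`, `openat`, and `fdopendir`. `count_entries` is a hypothetical helper, not part of diff. After a successful `fdopendir`, the stream owns the descriptor, and `closedir` closes it.

```c
#ifndef _POSIX_C_SOURCE
# define _POSIX_C_SOURCE 200809L  /* openat, fstatat, fdopendir, O_DIRECTORY */
#endif
#include <dirent.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical: count the entries other than "." and ".." in directory
   NAME, resolved relative to the directory open on PARENTFD (or to the
   working directory if PARENTFD is AT_FDCWD).  Return -1 on failure or
   if NAME is not a directory.  */
static long
count_entries (int parentfd, char const *name)
{
  struct stat st;
  if (fstatat (parentfd, name, &st, 0) != 0 || !S_ISDIR (st.st_mode))
    return -1;
  int fd = openat (parentfd, name, O_RDONLY | O_DIRECTORY);
  if (fd < 0)
    return -1;
  DIR *dir = fdopendir (fd);  /* on success, DIR now owns FD */
  if (dir == NULL)
    {
      close (fd);
      return -1;
    }
  long n = 0;
  for (struct dirent *e; (e = readdir (dir)) != NULL; )
    {
      char const *d = e->d_name;
      if (! (d[0] == '.' && (!d[1] || (d[1] == '.' && !d[2]))))
        n++;
    }
  closedir (dir);  /* also closes FD */
  return n;
}
```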
Paul Eggert
1bb156e927 maint: stop exempting lib/mbcel.h
* cfg.mk (exclude_file_name_regexp--sc_GPL_version): Remove.
2023-07-18 20:07:13 -07:00
Jim Meyering
5aa5a1cc9b maint: avoid syntax-check failure
* po/POTFILES.in: Remove openat-die.c.
It is no longer used.  Reported by Bruno Haible in
https://lists.gnu.org/r/diffutils-devel/2023-07/msg00018.html
2023-07-18 16:58:08 -07:00
Paul Eggert
856f72409b doc: document tab behavior better
* doc/diffutils.texi (Tabs): Document issues with tabs,
encoding errors, and non-ASCII characters.
2023-07-17 15:34:05 -07:00
Paul Eggert
e44fb6b802 diff: remove find_reverse_change
This is a minor refactoring and simplification.
* src/context.c (find_hunk):
* src/util.c (find_change): Rename locals for clarity.
(find_reverse_change): Remove.  All uses removed.
2023-07-17 15:34:05 -07:00
Paul Eggert
d42a8f7587 diff: assert-related cleanup
Regularize how assertions are done by using ‘unreachable’ or a new
macro ‘dassert’, or by removing unnecessary assertions.
* src/analyze.c (diff_2_files):
* src/util.c (get_funky_string, parse_diff_color)
(set_color_context):
Prefer unreachable to abort for code where it’s easy to
see that it cannot be reached.
* src/context.c (ATTRIBUTE_PURE):
* src/util.c (print_message_queue):
Prefer ‘dassert (X);’ to ‘if (!X) abort ();’.
* src/diff.c: Do not include assert.h; system.h does that if needed.
(usage): Remove the need for an assert by using fputs and fwrite
rather than printf.  This is clearer anyway.
(compare_files): Remove ‘assert’ that hardware will check.
Prefer dassert to assert.
* src/ifdef.c (do_printf_spec):
Prefer dassert to comment.
* src/system.h (dassert): New macro.  Include assert.h if needed.
2023-07-17 15:34:05 -07:00
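Here is a sketch of what a `dassert`-style macro might look like. The real definition in src/system.h may differ. With assertions enabled it is plain `assert`. Under `NDEBUG` it tells GCC-compatible compilers that the condition holds, using `__builtin_unreachable`; C23 provides `unreachable` in `<stddef.h>` for the same purpose. `checked_div` is a hypothetical example caller.

```c
/* Sketch only: an assertion in debugging builds, an optimization hint
   (undefined behavior if E is false) when NDEBUG is defined.  */
#ifdef NDEBUG
# if defined __GNUC__
#  define dassert(e) ((e) ? (void) 0 : __builtin_unreachable ())
# else
#  define dassert(e) ((void) 0)
# endif
#else
# include <assert.h>
# define dassert(e) assert (e)
#endif

/* Hypothetical caller: DIVISOR is promised to be nonzero.  */
static int
checked_div (int dividend, int divisor)
{
  dassert (divisor != 0);
  return dividend / divisor;
}
```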
Paul Eggert
42e4d008a6 maint: prefer puts to printf
* src/cmp.c (usage):
* src/util.c (begin_output):
Simplify by using puts instead of printf.
2023-07-17 15:34:05 -07:00
Paul Eggert
0a72d5ab28 diff: improve dir comments
* src/dir.c: Improve comments.
2023-07-17 15:34:05 -07:00
Paul Eggert
2e2731f520 diff: improve NONEXISTENT readability
* src/diff.c (NONEXISTENT, UNOPENED): Move from here ...
* src/diff.h: ... to here.
* src/dir.c (dir_read, diff_dirs): Use names for constants.
2023-07-17 15:34:05 -07:00
Paul Eggert
4c83bb78fe diff: link to LIB32CONV
* src/Makefile.am (LDADD): Add LIB32CONV; needed for recent
char32_t changes on some platforms.
2023-07-17 15:34:05 -07:00
Paul Eggert
fcb759b21a diff: reindent recent changes
* src/io.c (lines_differ): A bit less indenting.
2023-07-17 15:34:05 -07:00
Paul Eggert
df12b2c41c maint: translate lib/openat-die.c
* po/POTFILES.in: Add lib/openat-die.c.
2023-07-17 15:34:05 -07:00
Paul Eggert
3227528668 maint: mbcel.h is LGPL 3 not 2.1
* lib/mbcel.h: Make it LGPL 3, not 2.1, to pass "make syntax-check".
2023-07-17 15:34:05 -07:00
Jim Meyering
d95e016e37 maint: avoid a new syntax-check failure
* cfg.mk (exclude_file_name_regexp--sc_GPL_version): Exempt mbcel.h,
which is LGPL-2.1. Remove this exemption if/when mbcel.h moves to gnulib.
2023-07-12 21:34:19 -07:00
Paul Eggert
5d0554f0a1 diff: tweak mbstate_t performance
* lib/mbcel.h (mbcel_scan): Improve performance when initializing
an mbstate_t.
2023-07-11 00:51:42 -07:00
Paul Eggert
d28395ef8d diff: add mbcel checks, compiler advice
* lib/mbcel.h: Include limits.h, stddef.h.
Add static assertions that MB_LEN_MAX has a sane value,
as the code relies on this.  Help GCC by advising
it that mbrtoc32 never returns a value between
MB_LEN_MAX + 1 and (size_t) -1 / 2 inclusive.
2023-07-10 10:16:11 -07:00
Paul Eggert
4c722ca85e diff: tweak mbcel_scan performance
* lib/mbcel.h (mbcel_scan): Check top bit of size
rather than comparing it to MB_LEN_MAX, as this
typically lets the compiler generate tighter code.
2023-07-09 02:19:50 -07:00
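The top-bit test can be sketched as follows. `count_chars` is hypothetical and much simpler than mbcel_scan. The `_Static_assert` follows the spirit of the MB_LEN_MAX sanity check from commit d28395ef8d, not its exact text. Resetting the state after an error follows commit 1f5d520fa1, since the `mbstate_t` is unspecified after an encoding error. The sketch assumes a locale in which `mbrtoc32` never returns `(size_t) -3`.

```c
#include <limits.h>
#include <stdint.h>
#include <string.h>
#include <uchar.h>

/* The top-bit test below treats every length <= SIZE_MAX / 2 as valid,
   so valid lengths must be small.  */
_Static_assert (0 < MB_LEN_MAX && MB_LEN_MAX <= SIZE_MAX / 2,
                "MB_LEN_MAX has a sane value");

/* Hypothetical: count the characters in the N bytes at S, treating each
   byte that cannot begin a valid character as a one-byte character.  */
static size_t
count_chars (char const *s, size_t n)
{
  mbstate_t st;
  memset (&st, 0, sizeof st);
  size_t count = 0;
  while (n != 0)
    {
      char32_t c;
      size_t len = mbrtoc32 (&c, s, n, &st);
      /* Valid results (0 for a null character, else 1 .. MB_LEN_MAX)
         have the top bit clear; (size_t) -1 and (size_t) -2 have it set.  */
      if (len <= SIZE_MAX / 2)
        len += len == 0;  /* a null character occupies one byte */
      else
        {
          len = 1;                     /* skip one byte */
          memset (&st, 0, sizeof st);  /* the state is now unspecified */
        }
      s += len;
      n -= len;
      count++;
    }
  return count;
}
```

One unsigned comparison separates all error codes from all valid lengths. This is the kind of test that lets a compiler emit tighter code than a comparison against MB_LEN_MAX.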
Paul Eggert
c15acfe2c5 maint: fix mbcel comment
2023-07-09 01:39:10 -07:00
Paul Eggert
e9a55a8edc maint: update .gitignore
2023-07-09 01:19:45 -07:00
Paul Eggert
2d48e4b13d diff: count newlines for lines_differ lengths
* src/io.c (lines_differ): Line lengths now count trailing
newlines, as this is a bit simpler.  Caller changed.
2023-07-09 01:19:45 -07:00
Paul Eggert
ef7093b760 diff: refactor lines_differ location
* src/io.c (lines_differ): Move here ...
* src/util.c: ... from here, since it needs to be kept consistent
with find_and_hash_each_line anyway, and there is no reason to
make it an extern function.
2023-07-09 01:19:45 -07:00
Paul Eggert
6f522c51ed diff: simplify recent --side-by-side changes
* src/side.c: Include mbcel.h instead of uchar.h.
(print_half_line): Simplify by using mbcel_scan rather than
mbrtoc32.  Although this removes support for hypothetical platforms,
it makes the code easier to follow and a bit more efficient.
2023-07-09 01:19:45 -07:00
Paul Eggert
a542ab269a diff: support multi-byte comparison
* bootstrap.conf (gnulib_modules): Add c32isspace, c32tolower.
* lib/Makefile.am (noinst_HEADERS): Add mbcel.h.
(libdiffutils_a_SOURCES): Add mbcel.c.
* lib/mbcel.c, lib/mbcel.h: New files.
* src/io.c: Include mbcel.h, uchar.h.
(hash): 2nd arg is now hash_value, not merely unsigned char,
since the caller might pass a char32_t now.
(find_and_hash_each_line): Support multi-byte input.
* src/util.c: Include mbcel.h, uchar.h.
(lines_differ): New args S1LEN, S2LEN, needed for mbcel_scan.
Caller changed.  Support multi-byte input.
* tests/ignore-case: New file.
* tests/Makefile.am (TESTS): Add it.
* tests/ignore-tab-expansion: Add UTF-8 test.
* tests/init.cfg (require_utf8_locale_): New function.
* tests/side-by-side: Use it.  Add a column-counting test.
2023-07-09 01:19:45 -07:00
Paul Eggert
2055c9cffd diff: simplify recent mbrtoc32 improvement
* src/side.c (print_half_line): Simplify.  Don't worry about
initializing mbstate until it's needed.  Avoid int overflow if the
byte sequence represents more than INT_MAX columns.  Avoid need
for separate TP1 local.
2023-07-06 17:12:04 -07:00
Bruno Haible
b79de8748b tests: Add a side-by-side output test
* tests/side-by-side: New file.
* tests/Makefile.am (TESTS): Add it.
2023-07-06 17:12:04 -07:00
Bruno Haible
2c3f8a85aa diff: Improve handling of mbrtoc32 result
* src/side.c (print_half_line): When mbrtoc32 has left the mbstate not
in the initial state, continue calling mbrtoc32.
2023-07-06 17:12:04 -07:00
Paul Eggert
572249e0fa diff: ignore tabs consistently with expanding them
* src/io.c (find_and_hash_each_line):
* src/util.c (lines_differ):
Treat '\0', '\a', '\b', '\f', '\r', '\v' consistently with how
side.c treats them when expanding them, e.g., backspacing from
column 1 is a no-op when counting tab columns.
* tests/ignore-tab-expansion: New test.
* tests/Makefile.am (TESTS): Add it.
2023-07-05 11:01:34 -07:00
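To make the column rules concrete, here is a single-byte sketch of tab-column counting. `end_column` and `TAB_WIDTH` are hypothetical. Giving `'\0'`, `'\a'`, `'\f'`, and `'\v'` zero width is an assumption about how side.c treats them. The backspace rule comes from the commit text. Columns here are 0-origin, so the commit's "column 1" is column 0.

```c
#include <stddef.h>

/* Hypothetical tab width; diff uses 8 unless --tabsize says otherwise.  */
enum { TAB_WIDTH = 8 };

/* Return the 0-origin column reached after the N bytes at S, starting
   at column 0.  Assumed rules: a tab advances to the next multiple of
   TAB_WIDTH, backspace moves left except at the first column, carriage
   return goes to the first column, '\0', '\a', '\f', and '\v' take no
   columns, and every other byte takes one column.  */
static size_t
end_column (char const *s, size_t n)
{
  size_t col = 0;
  for (size_t i = 0; i < n; i++)
    switch (s[i])
      {
      case '\t': col += TAB_WIDTH - col % TAB_WIDTH; break;
      case '\b': col -= col != 0; break;  /* no-op at the first column */
      case '\r': col = 0; break;
      case '\0': case '\a': case '\f': case '\v': break;
      default: col++; break;
      }
  return col;
}
```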
Paul Eggert
6e091776f8 diff: don’t backspace before first column
* src/util.c (output_1_line): When expanding tabs, treat backspace
before column 1 as no-op, since that’s what most devices do.
* tests/expand-tabs: New test.
* tests/Makefile.am (TESTS): Add it.
2023-07-05 11:01:34 -07:00
Paul Eggert
43b7a667e5 diff: tweak -y performance for $ @ ` \a
* src/side.c (print_half_line): Improve performance for '$', '@',
'`', and '\a' since they also are portable in practice nowadays.
2023-07-05 11:01:34 -07:00
Bruno Haible
93a5a16852 diff: Fix "diff -y" output
This fixes a regression from 2023-07-04.

* src/side.c (print_half_line): Restore the assignment to out_position.
2023-07-05 10:34:01 -07:00
Paul Eggert
8474b6e088 doc: mention bug#64461 in NEWS
2023-07-04 10:45:31 -07:00
Bruno Haible
642e2397f2 diff: Fix output of "diff -l -y" for non-ASCII input files
* src/side.c (print_half_line): Output the multibyte character to out,
not stdout.
2023-07-04 10:45:31 -07:00
Paul Eggert
1f5d520fa1 diff: fix unspecified mbstate after encoding error
* src/side.c (print_half_line): Clear mbstate after encoding
error, since it’s unspecified.
2023-07-04 10:45:31 -07:00
Paul Eggert
ffafa15eec diff: optimize -y treatment of NUL
* src/side.c (print_half_line): Treat '\0' like other control
characters with print width zero.
2023-07-04 10:45:31 -07:00
Paul Eggert
05cdf3102e diff: fix unlikely intmax_t overflow
* src/side.c (print_half_line): Avoid undefined behavior if the
input column position overflows.  Instead, simply stop printing.
2023-07-04 10:45:31 -07:00
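Here is a minimal sketch of stopping instead of overflowing. `advance_column` is a hypothetical helper, and it assumes a nonnegative column and width. C23's `ckd_add` from `<stdckdint.h>` would be an alternative to the explicit comparison.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: advance the nonnegative column *COL by the nonnegative
   WIDTH.  If the sum would exceed INTMAX_MAX, leave *COL unchanged and
   return false so the caller can stop printing, instead of letting
   signed overflow cause undefined behavior.  */
static bool
advance_column (intmax_t *col, intmax_t width)
{
  if (INTMAX_MAX - *col < width)  /* cannot overflow, as *COL >= 0 */
    return false;
  *col += width;
  return true;
}
```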
Paul Eggert
7aa48caa06 maint: prefer ‘static_assert’
Prefer C23-style ‘static_assert’ to traditional Gnulib ‘verify’.
* bootstrap.conf (gnulib_modules): Add assert-h.
* src/context.c, src/io.c, src/system.h: Use ‘static_assert’.
2023-06-28 15:25:21 -07:00
Paul Eggert
359b8c3ef2 diff: fix xpalloc-related signed integer overflow
Problem reported by Gisle Vanem <https://bugs.gnu.org/64316>.
* src/io.c (find_and_hash_each_line):
Rely on xpalloc to check for integer overflow instead
of trying to do it ourselves incorrectly, with old code
that predated the use of xpalloc.
* src/system.h: Verify that LIN_MAX == IDX_MAX,
since the code now relies on this.
* tests/Makefile.am (TESTS): Add bug-64316.
* tests/bug-64316: New file.
2023-06-28 15:25:21 -07:00
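This sketch shows why it helps to leave the overflow check to one growth helper instead of doing ad hoc arithmetic. It is a hypothetical, simplified relative of Gnulib's `xpalloc`. `grow_array` is not Gnulib's API: it returns NULL on failure instead of exiting. Its policy of growing by about half is an assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical: grow the array PA, which has *PN elements of positive
   size S, by about half, but by at least INCR_MIN (which must be
   positive).  On success, update *PN and return the new array.  Return
   NULL, leaving PA and *PN alone, if the new byte count would exceed
   PTRDIFF_MAX or if allocation fails.  */
static void *
grow_array (void *pa, ptrdiff_t *pn, ptrdiff_t incr_min, ptrdiff_t s)
{
  ptrdiff_t n = *pn;
  ptrdiff_t max = PTRDIFF_MAX / s;  /* most elements whose total size fits */
  ptrdiff_t incr = n / 2 < incr_min ? incr_min : n / 2;
  if (max - n < incr)
    {
      if (max - n < incr_min)
        return NULL;                /* even the minimum growth overflows */
      incr = max - n;
    }
  void *p = realloc (pa, (size_t) (n + incr) * (size_t) s);
  if (p == NULL)
    return NULL;
  *pn = n + incr;
  return p;
}
```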