Compare commits

443 Commits
v3.3 ... master

Author SHA1 Message Date
Jim Meyering
071ac3aa76 also update bootstrap from gnulib 2026-01-02 16:50:40 -08:00
Jim Meyering
c635f7dd92 maint: update copyright dates 2026-01-02 16:42:12 -08:00
Jim Meyering
37b95973aa build: update gnulib to latest 2026-01-02 16:42:12 -08:00
Bruno Haible
22533e58ff build: Respect gnulib code ownership.
* gnulib-tests/Makefile.am (AM_CFLAGS): Don't augment after including
gnulib.mk.
2025-11-12 14:11:09 -08:00
Jim Meyering
e6d5e6809b build: update gnulib to latest 2025-11-12 14:11:09 -08:00
Paul Eggert
07a3bb2b44 build: use -Wtrailing-whitespace
* configure.ac: With --enable-gcc-warnings,
use GCC 15’s -Wtrailing-whitespace if available.
2025-10-16 08:10:16 -07:00
Paul Eggert
8185556858 doc: update troff comment
* doc/grep.in.1: Update troff comment.
Reported by G. Branden Robinson (Bug#79608).
2025-10-10 15:07:58 -07:00
Collin Funk
275600f387 maint: fix Automake warning
Avoid this warning: escaping \# comment markers is not portable
* Makefile.am (prologue): Remove macro.
(THANKS): Use the perl command directly.
2025-07-07 11:10:00 -07:00
Bruno Haible
db5172dc2b maint: use module 'kwset' from gnulib
* bootstrap.conf (gnulib_modules): Add kwset.
* src/kwset.h: Remove file.
* src/kwset.c: Remove file.
* src/Makefile.am (grep_SOURCES): Remove kwset.c.
(noinst_HEADERS): Remove kwset.h.
2025-06-26 06:36:50 -07:00
Jim Meyering
1665c885f2 build: update gnulib to latest, and update bootstrap from gnulib 2025-06-26 06:36:50 -07:00
Jim Meyering
682f7f693d tests: write-error-msg: avoid false-failure
* tests/write-error-msg: Do not require that a disk full
diagnostic include additional information. In some cases, there
is no valid errno value, so we cannot provide more information.
This was exposed by a patch that coincidentally caused the length
of grep's help output to be precisely 4096 bytes long.
Reported in https://bugs.gnu.org/77800
2025-05-14 10:56:16 -07:00
Paul Eggert
335fcd3f53 build: port to pkg.m4 serial 12
Problem reported by Bruno Haible in:
https://lists.gnu.org/r/grep-devel/2025-04/msg00005.html
* bootstrap.conf (bootstrap_post_import_hook):
Simplify. Don’t recommend against pkg.m4 serial 12,
since the following patches port to it.
* configure.ac: Don’t fail if pkg-config is missing.
* m4/pcre.m4 (gl_FUNC_PCRE): If pkg-config works use that;
otherwise if PCRE_CFLAGS='' PCRE_LIBS='-lpcre2-8' use that;
otherwise don’t use PCRE.
2025-04-19 23:15:35 -07:00
Paul Eggert
2e19d07ef1 build: update gnulib submodule to latest 2025-04-19 23:15:35 -07:00
Jim Meyering
b871c3e428 maint: post-release administrivia
* NEWS: Add header line for next release.
* .prev-version: Record previous version.
* cfg.mk (old_NEWS_hash): Auto-update.
2025-04-10 09:04:07 -07:00
Jim Meyering
3f8c09ec19 version 3.12
* NEWS: Record release date.
2025-04-10 09:01:30 -07:00
Jim Meyering
bd7250ca36 tests: mb-non-UTF8-perf-Fw: undo previous change
* tests/mb-non-UTF8-perf-Fw: Ugh. I misread the code and
didn't even test.  Given circumstances and the new timing that's
well within the 30-second timeout, I think there's no point
in trying to accommodate systems that are so overburdened they
trigger this failure. Reported by Grisha Levit.
2025-04-09 09:58:53 -07:00
Jim Meyering
f8bb8c519e tests: mb-non-UTF8-perf-Fw: avoid false failure on overloaded systems
* tests/mb-non-UTF8-perf-Fw: Raise timeout from 1 to 3s
to accommodate slow systems.  Reported by Nelson Beebe in
https://lists.gnu.org/r/grep-devel/2025-04/msg00027.html
2025-04-09 09:05:57 -07:00
Jim Meyering
082f068a5e build: fix module name typo
* bootstrap.conf (gnulib_modules): Fix module name typo I introduced.
s/realloc-gnu-h/realloc-posix/. Reported by Bruno Haible in
https://bugs.gnu.org/77654
2025-04-08 13:31:37 -07:00
Jim Meyering
dc292e8bb0 build: avoid new bootstrap failure
* bootstrap.conf (bootstrap_post_import_hook): Append "|| :", fixing
my previous change. Otherwise, bootstrap would fail with this:
./bootstrap: bootstrap_post_import_hook failed
2025-04-08 13:29:32 -07:00
Bruno Haible
b1dee0f8b3 doc: temper a Unicode support claim: it's not quite done
* NEWS: clarify that the "Unicode characters outside the Basic
Multilingual Plane" item is not quite done.
2025-04-08 11:06:04 -07:00
Jim Meyering
b197be563e tests: hash-collision-perf: avoid test hang on GNU/Hurd
This test would hang on GNU/Hurd because the perl code we use to measure
subsecond duration isn't ported, and that loop would never terminate.
* tests/hash-collision-perf: Detect the always-0 small_ms, and skip the test.
Reported by Bruno Haible in https://bugs.gnu.org/77613
2025-04-08 09:32:24 -07:00
Jim Meyering
52418599b3 grep: avoid regression with -mN and any of -q, -l, -L
* src/grep.c (grepbuf): Handle this case: echo x|grep -l -m1 .
making it print only the file name, and not the matched line.
(main): Set out_quiet also when exit_on_match (-q) is set, so
"echo x|grep -q -m1 ." no longer prints the matched line.
* tests/max-count-overread: Add those tests, from
https://bugs.gnu.org/68989#21
2025-04-08 09:32:24 -07:00
Jim Meyering
a4628e58dd build: avoid using pkg-config's pkg.m4 serial 12
* bootstrap.conf (bootstrap_post_import_hook): Add code to ensure we
do not use pkg.m4 serial 12. For the record, I've temporarily copied
the version of pkg.m4 from grep-3.11 into $(aclocal --print-ac-dir),
so that when I run bootstrap, it always gets that serial 11 version.
Reported by Bruno Haible in
https://lists.gnu.org/r/grep-devel/2025-04/msg00005.html
2025-04-06 18:45:29 -07:00
Jim Meyering
05f8c68183 build: update gnulib to latest 2025-04-04 14:22:48 -07:00
Jim Meyering
2f5068b6ea maint: ensure that new "make syntax-check"-run sc_codespell passes
* cfg.mk (codespell_ignore_words_list): Ignore some false-positives.
2025-03-27 20:57:13 -07:00
Jim Meyering
50c4df64c1 build: use gnulib's new c-strcasecmp module, rather than c-strcase
* bootstrap.conf (gnulib_modules): Use new c-strcasecmp module, rather
than c-strcase, since grep uses c_strcasecmp and not c_strncasecmp.
2025-03-25 19:17:55 -07:00
Jim Meyering
6de7c9d48b build: update gnulib to latest; and update bootstrap 2025-03-25 19:17:55 -07:00
Jim Meyering
ffb27fd225 grep: remove long-deprecated --unix-byte-offsets (-u) option
* src/grep.c (main): Remove vestiges of --unix-byte-offsets (-u).
In grep-3.7 (2021-08-14) it became a warning-only no-op.
Before then, it was a Windows-only no-op.
* NEWS (Changes in behavior): Mention it.
2025-03-25 19:11:38 -07:00
Paul Eggert
920daa57a4 doc: update man for groff 1.23.0 and apostrophes
* doc/grep.in.1: Merge from groff 1.23.0 tmac/an-ext.tmac.
On Groff, fix usage neutral apostrophes;
they cannot be reliably fixed in traditional troff.
2025-03-21 13:17:03 -07:00
Paul Eggert
9863d53a5f doc: use \w@...@ not \w|...|
* doc/grep.in.1: Avoid warnings in bleeding-edge groff.
2025-03-21 13:17:03 -07:00
Paul Eggert
ef595c086b doc: fix troff typo
* doc/grep.in.1: .BR → .B (Bug#77000).
2025-03-21 09:36:24 -07:00
Jim Meyering
5cc5251d5d grep: support gnulib-l10n
* src/grep.c (main): Call bindtextdomain for gnulib-l10n.
2025-02-08 20:13:17 -08:00
Jim Meyering
3160603308 maint: continue writing base64-encoded checksums to announcement
* cfg.mk (announce_gen_args): Set to --cksum-checksums.
2025-02-08 20:07:49 -08:00
Jim Meyering
0747169015 build: update gnulib to latest; and update bootstrap 2025-02-01 21:22:11 -08:00
Jim Meyering
006951de68 maint: reflect gnulib module renamings
* bootstrap.conf: Some gnulib modules are now deprecated, in
favor of new names with a "-h" suffix (and stdbool->bool).
Induce this change with the following:
  re='inttypes|locale|realloc-gnu|stdckdint|stddef|stdlib|string'
  re="$re|sys_stat|unistd"
  perl -pi -e 's{^('"$re"')$}{$1-h};s{^stdbool$}{bool}' bootstrap.conf
2025-02-01 16:53:44 -08:00
Jim Meyering
fc6aba9000 doc: clarify a --help sentence
* src/grep.c (usage): Prompted by a suggestion at
https://bugs.gnu.org/75582 by Anton Samokat.
* THANKS.in: Add that name.
2025-01-15 21:00:17 -08:00
Paul Eggert
b1eaccd96d maint: update all copyright dates via "make update-copyright" 2025-01-01 19:15:26 -08:00
Paul Eggert
ba98ec78f5 maint: update bootstrap from Gnulib
* bootstrap: sync from Gnulib
2025-01-01 19:15:26 -08:00
Paul Eggert
ad030d9bbb build: update gnulib submodule to latest 2025-01-01 19:15:26 -08:00
Paul Eggert
6ee856200a grep: revert recent \d change
I misread the email thread and thought there was consensus
for the \d change, but there wasn’t, so revert the change.
Also, document the resulting confusion
somewhat better than it was documented before.
* src/pcresearch.c, tests/pcre-ascii-digits, tests/pcre-utf8-w:
Revert recent \d change, restoring the behavior to that of grep 3.11.
2024-12-16 14:43:00 -07:00
Paul Eggert
19e301ad53 doc: give an example non-ASCII digit
* doc/grep.texi: Give ‘٣’ as an example of a non-ASCII digit.
2024-12-16 14:36:49 -07:00
Paul Eggert
421b2993e2 doc: don’t send “ſ” to PDF
* doc/grep.texi: Don’t output “ſ” (U+017F LATIN SMALL LETTER LONG S)
to PDF, since pdfTeX can’t handle it.
2024-12-16 13:25:26 -07:00
Paul Eggert
aa203fdaa9 doc: more improvements for -P discussion
* doc/grep.texi (grep Programs): Also mention git grep
and pcre2grep.
2024-12-16 01:41:37 -07:00
Paul Eggert
7ddaa55cab doc: improve -P discussion
* doc/grep.texi (grep Programs): Improve discussion of how grep -P
differs from Perl.
2024-12-14 14:15:00 -07:00
Paul Eggert
eaca869822 grep: go back to 3.9 -P '\d' behavior
Treating \d differently from Perl was more trouble than it was worth.
* NEWS, doc/grep.texi (grep Programs): Document this.
* src/pcresearch.c (PCRE2_EXTRA_ASCII_BSD):
Remove.  All uses removed.
* tests/pcre-ascii-digits: Adjust to this change.
* tests/pcre-utf8-w: Revert to 3.9.
2024-12-14 14:15:00 -07:00
Grisha Levit
29a9b72db3 tests: fix define for glibc-infloop
* configure.ac (USE_INCLUDED_REGEX): fix condition for definition.
This doesn't affect anything right now since the value is examined
only by an unconditionally skipped test.
2024-11-26 16:15:02 -08:00
Paul Eggert
24deafb92f doc: more consistent style for ‘...’
Problem reported by Martin Schulte <https://bugs.gnu.org/74205>.
* doc/grep.in.1, doc/grep.texi:
Use a more consistent style for [OPTION]... and [FILE]... in usage.
This doesn’t match what POSIX does but seems to be common in GNU doc.
Also, ‘...’ -> ‘@dots{}’ in grep.texi.
2024-11-04 13:52:19 -08:00
Paul Eggert
fce28c4a5e grep: fix -q suppression of diagnostics
Problem reported by Jan Černohorský (Bug#74159).
* src/grep.c (grepbuf): If exit_on_match, set stdout_errno to
avoid screwups on buggy OSes.  Also, ignore errseen since it
cannot be true here.
(main): Do not clear exit_failure if -q is given, as exit status
should be zero only if an input line is selected.
* tests/write-error-msg: Check that -q suppresses diagnostics
of output errors only if a match is found.
2024-11-01 22:46:33 -07:00
Paul Eggert
944c2eccc7 doc: warn re using ‘grep’ to detect binary files
This is in response to a bug report by Rodrigo Jorge
<https://bugs.gnu.org/73360>.
* doc/grep.texi (File and Directory Selection):
Warn that ‘grep’ shouldn’t be used to determine whether
a file is binary for other applications’ purposes, as
their definition of “binary” may well differ.
Improve documentation for discovery of null input.
2024-09-21 23:28:03 -07:00
Paul Eggert
8fb15fb5bf grep: avoid huge reads
The previous code could call 'read' with a nearly unbounded size
if the input had long lines, and this unbounded size persisted
from one file to the next once the input buffer grew.
This could have bad effects on the CPU's data cache,
and also could cause 'grep' to make counterintuitive decisions as
to whether a file is binary <https://bugs.gnu.org/73360>.
Instead, pick a good read size and stick with it; this is
more consistent, and more likely to fit in a cache.
* src/grep.c (good_readsize): New static var.
(GOOD_READSIZE_MIN): Rename from INITIAL_BUFSIZE.  All uses changed.
(fillbuf): Read good_readsize bytes rather than trying to
fill the rest of the input buffer.
(drain_input): Read good_readsize rather than GOOD_READSIZE_MIN
bytes.
(main): Initialize good_readsize.
2024-09-21 23:28:03 -07:00
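The idea in the commit above, asking read for one fixed amount no matter how large the input buffer has grown, can be sketched roughly as follows. This is only an illustration: `GOOD_READSIZE` and `drain_fixed` are hypothetical names, the 64 KiB value is an arbitrary assumption, and the sketch does not retry on EINTR. It is not the code in src/grep.c.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical fixed request size; the value is an assumption for this sketch. */
enum { GOOD_READSIZE = 64 * 1024 };

/* Drain FD, always requesting the same fixed number of bytes from read().
   Returns the total number of bytes read, or -1 on a read error.
   Uses a static buffer, so it is not reentrant. */
long long drain_fixed(int fd)
{
  static char buf[GOOD_READSIZE];
  long long total = 0;

  assert(fd >= 0);
  for (;;)
    {
      /* The request never grows, even if earlier input had very long lines. */
      ssize_t n = read(fd, buf, sizeof buf);
      if (n < 0)
        return -1;
      if (n == 0)
        return total;   /* end of file */
      total += n;
    }
}
```

Keeping the request size constant means each file is read the same way regardless of what was read before it.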
Paul Eggert
288ea84c70 grep: simplify non-usage of rawmemrchr
* src/grep.c (grep): Simplify by not assuming that the
code will eventually use a nonexistent rawmemrchr function.
2024-09-21 23:28:03 -07:00
Paul Eggert
c89ce1cd48 grep: adjust to safe_read change
* src/grep.c (fillbuf): Adjust to Gnulib safe_read API change;
it now returns ptrdiff_t, with -1 signifying error.
2024-09-21 23:28:03 -07:00
Paul Eggert
37ed0f5621 maint: port GNULIB_WARN_CFLAGS to gcc 14
* configure.ac (GNULIB_WARN_CFLAGS): Add -Wmissing-declarations,
-Wmissing-prototypes, -Wmissing-variable-declarations,
-Wnull-dereference, -Wsuggest-attribute=cold.
2024-09-21 23:28:03 -07:00
Paul Eggert
cb83e12460 build: update gnulib submodule to latest 2024-09-21 23:28:03 -07:00
Jim Meyering
e7481a0939 build: update gnulib to latest 2024-09-08 10:27:16 -07:00
Jim Meyering
08c4ce064b maint: placate GCC's -Wunterminated-string-initialization
This new GCC warning triggered a false alert. But it's simple,
cleaner and just as efficient to use pointers and strlen:
* src/pcresearch.c (Pcompile): Declare prefix and suffix
variables as pointers to literal strings, rather than as
arrays of characters that deliberately omitted the trailing
NUL byte, and which provoked these warnings from the very
latest GCC:

pcresearch.c:220:23: error: initializer-string for array of
'char' is too long [-Werror=unterminated-string-initialization]
220 |  wprefix[10] = "(?<!\\w)(?:", wsuffix[7] = ")(?!\\w)";
    |                            ^~~~~~~~~~~~~
2024-09-08 10:10:02 -07:00
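A minimal sketch of the pointer-plus-strlen style the commit above describes, in contrast to arrays sized to drop the trailing NUL. The helper name `wrap_word_pattern` is hypothetical, and wrapping a pattern this way is only an assumption about how the prefix and suffix are used; the two literals are the ones quoted in the GCC diagnostic.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: surround PAT with the prefix and suffix literals.
   Returns a malloc'd string, or NULL on allocation failure. */
char *wrap_word_pattern(char const *pat)
{
  assert(pat != NULL);

  /* Pointers to ordinary NUL-terminated literals.  The lengths come from
     strlen, so no array bound has to be kept in sync with the literal. */
  char const *wprefix = "(?<!\\w)(?:";
  char const *wsuffix = ")(?!\\w)";
  size_t plen = strlen(wprefix);
  size_t slen = strlen(wsuffix);
  size_t len = strlen(pat);

  char *out = malloc(plen + len + slen + 1);
  if (!out)
    return NULL;
  memcpy(out, wprefix, plen);
  memcpy(out + plen, pat, len);
  memcpy(out + plen + len, wsuffix, slen + 1);  /* copies the trailing NUL too */
  return out;
}
```

With the array form, `wprefix[10]` holds exactly the ten pattern bytes and no terminator, which is what the new warning flags.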
Jim Meyering
1c9e7544cf maint: modernize GNU GPL license comments
To accommodate a new syntax-check rule, ...
Make comments that suggested to write to an old FSF Franklin Street
address refer to https://www.gnu.org/licenses instead:

  git grep -l 'if not, write' |xargs \
  perl -0777 -pi -e 's{program; if not, write .*? USA\.}{program.  If not, see <https://www.gnu.org/licenses/>.}ms'
2024-09-08 09:45:52 -07:00
Bruno Haible
eda769be72 tests: Fix recognition of cs_CZ.UTF-8 locale on FreeBSD.
* tests/fmbtest: Use 'locale charmap' to determine the locale's encoding.
* tests/foad1: Likewise.
2024-07-09 14:44:46 +02:00
Collin Funk
3612f5e218 maint: import tests/init.sh from Gnulib during bootstrap
* bootstrap.conf (bootstrap_post_import_hook): Use gnulib-tool
--copy-file to import tests/init.sh.
* tests/init.sh: Remove file.
* .gitignore (/tests/init.sh): Add entry.
2024-07-04 12:54:56 -07:00
Jim Meyering
37a1e07606 build: update gnulib to latest and update bootstrap 2024-07-04 11:05:39 -07:00
Bruno Haible
58d2475965 maint: Avoid test-mbrlen-1.sh failure on CentOS 7.
* configure.ac: Pre-set gl_cv_func_mbrlen_empty_input.
* bootstrap.conf (avoided_gnulib_modules): Avoid also mbrlen-tests.
2024-06-11 09:01:24 -07:00
Paul Eggert
53b889155f doc: fix troff typos
* doc/grep.in.1: Fix troff typos found by mandoc and groff.
Problem reported by Bjarni Ingi Gislason (bug#71087).
2024-05-21 09:51:32 -07:00
Paul Eggert
1fa829d367 build: update bootstrap to latest 2024-05-21 09:51:32 -07:00
Paul Eggert
be9fcc2d2d build: update gnulib submodule to latest 2024-05-21 09:51:32 -07:00
Jim Meyering
b4dd3b00a5 build: update gnulib to latest 2024-02-18 21:04:05 -08:00
Jim Meyering
3d900da3b5 maint: avoid syntax-check failure: use <>, not "" for system headers
* src/dfasearch.c: As above.
* src/grep.c: Likewise.
* src/kwsearch.c: Likewise.
* src/pcresearch.c: Likewise.
* src/searchutils.c: Likewise.
2024-02-18 20:04:31 -08:00
Paul Eggert
b9a8047099 grep: fix ‘grep -m2 pattern <file >/dev/null’
Problem reported by Grisha Levit <https://bugs.gnu.org/68989>.
* src/grep.c (grep, main): Don’t set done_on_match if -m is used.
* tests/max-count-overread: Add a test case.
2024-02-09 01:07:19 -08:00
Paul Eggert
443961a929 Improve doc for range expressions
grep currently doesn’t implement rational ranges or any other
particular behavior for range expressions outside the C locale.
Adjust the documentation to match the behavior more closely.
Problem reported by Ronan Pigott in:
https://lists.gnu.org/r/grep-devel/2024-01/msg00000.html
* doc/grep.texi (Character Classes and Bracket Expressions):
Be more careful about terminology.  Don’t say “sorts” because
the collation sequence is not the same as the sort order.
Don’t make promises about behavior outside the C locale,
as the current code might not fulfill them.
* doc/grep.in.1: Adjust wording to match.  The old wording
was out-of-sync anyway.
2024-01-28 22:57:17 -08:00
Jim Meyering
e248db797a maint: update all copyright dates via "make update-copyright" 2024-01-05 09:19:41 -08:00
Jim Meyering
4810ea0838 build: update gnulib to latest; also update bootstrap and init.sh 2024-01-05 09:19:01 -08:00
Paul Eggert
102be2bfa5 grep: prefer nullptr to NULL
* bootstrap.conf (gnulib_modules): Add nullptr.
All uses of NULL eliminated or changed to nullptr,
modernizing us from C89 to C23.
2023-09-14 01:01:02 -05:00
Paul Eggert
dd8f04957c grep: simplify wordchars_count
* src/searchutils.c (wordchars_count): Simplify slightly
by using a pointer rather than an offset.
2023-09-11 22:51:56 -05:00
Paul Eggert
554e5b25fe grep: use mbszero
* bootstrap.conf (gnulib_modules): Add mbszero.
* src/grep.c (buf_has_encoding_errors, contains_encoding_error)
(setup_ok_fold, fgrep_icase_available, fgrep_to_grep_pattern)
(try_fgrep_pattern):
* src/searchutils.c (mb_goback, wordchars_count):
Use it.
2023-09-11 22:51:56 -05:00
Paul Eggert
052282642c maint: wchar-single, no strtou*
* bootstrap.conf (gnulib_modules): Remove strtoull, strtoumax.
They are no longer used.  Use wchar-single instead of wchar,
since grep does not change locale.
2023-09-09 22:03:10 -07:00
Paul Eggert
f80b106d15 grep: omit propername, as it’s not used
Omit Gnulib’s propername module, as it has not been used since my
commit 3c0a36e514237132db711bfef57a74c64592c4e2 dated Thu Dec 20
16:35:55 2018 -0800.
* bootstrap.conf (avoided_gnulib_modules):
Do not avoid mbchar, as it is no longer pulled in by propername.
(gnulib_modules): Remove propername.
* src/Makefile.am (LDADD):
* tests/Makefile.am (LDADD): Remove $(LIBICONV); no longer needed.
* src/grep.c: Do not include propername.h.
2023-09-09 18:54:20 -07:00
Paul Eggert
3e926715c8 maint: prefer mcel
* bootstrap.conf (avoided_gnulib_modules):
Avoid mbchar, mbuiter, mbuiterf.
(gnulib_modules): Add mcel-prefer.
2023-09-09 18:54:20 -07:00
Paul Eggert
1dbdcdc4c8 build: update gnulib submodule to latest 2023-09-09 18:54:20 -07:00
Jim Meyering
180e8dd674 tests: actually package and run the new 100k-entries test
* tests/Makefile.am (TESTS): Include the new test file name,
100k-entries.
2023-08-20 12:42:14 -07:00
Paul Eggert
13fd8279e5 doc: clarify -- role
This should fix bug#65046 reported by Helmut Waltzmann.
2023-08-05 14:58:37 -07:00
Jim Meyering
d1c3fbe772 doc: mention the 100,000-entry ENOTSUP bug
* NEWS: document the fixed bug.
* tests/100k-entries: New file, to test for this.
Reported by Vincent Lefevre via Santiago Ruano Rincón in
https://bugs.gnu.org/64773
Fixed by gnulib commit v0.1-6175-gd4d8abb39e.
2023-07-21 17:42:23 -07:00
Paul Eggert
105e432d7f maint: link with mbrtc32-required libraries
Add libraries now suggested by gnulib-tool.
* src/Makefile.am (LDADD):
* tests/Makefile.am (LDADD):
Add $(HARD_LOCALE_LIB), $(LIBC32CONV), $(LIBSIGSEGV),
$(LIBUNISTRING), $(MBRTOWC_LIB), $(SETLOCALE_NULL_LIB).
* tests/Makefile.am (LDADD):
Also add $(LIBCSTACK), $(LIBICONV), $(LIBTHREAD).
2023-07-11 18:14:40 -07:00
Paul Eggert
9e915da342 maint: fix NEWS typo 2023-07-10 23:55:21 -07:00
Paul Eggert
975378294a maint: sync bootstrap from Gnulib 2023-07-10 23:53:58 -07:00
Paul Eggert
975ed119e9 maint: omit obsolescent Autoconf macros
* configure.ac: Don’t use the obsolescent macros AC_TYPE_SIZE_T,
AC_C_CONST, AC_HEADER_DIRENT, AC_FUNC_CLOSEDIR_VOID.

maint: remove isascii check
* configure.ac (isascii): Stop checking for this function,
as it’s not used.
2023-07-10 23:53:58 -07:00
Paul Eggert
7918c33702 grep: switch from wchar_t to char32_t
* bootstrap.conf (gnulib_modules): Add c32isalnum, c32rtomb,
mbrtoc32-regular.  Remove mbrtowc, wcrtomb, wctob, wctype-h.
wctob appears to be stray, as I don’t think it was needed before
this change either.
* src/grep.c: Include uchar.h.
(setup_ok_fold, fgrep_icase_charlen):
Use char32_t, not wchar_t.
* src/search.h: Do not include wctype.h.
* src/searchutils.c: Include uchar.h.
(wordchar): Use c32isalnum, not iswalnum.
(wordchars_count): Use char32_t, not wchar_t.
2023-07-10 23:53:58 -07:00
Bruno Haible
ea3ec61613 build: update gnulib submodule to latest
* src/grep.c (setup_ok_fold, fgrep_icase_charlen):
Change the element type of the 'folded' array, to match the new
signature of case_folded_counterparts.
2023-07-10 23:53:58 -07:00
Carlo Marcelo Arenas Belón
481e6b4a3b grep: dynamically allocate buffer for -P version
* src/pcresearch.c (Pprint_version): Allocate version buffer
dynamically rather than aborting if a fixed-size buffer
is too small.
2023-07-10 23:53:58 -07:00
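One common way to size a buffer at run time instead of aborting on overflow is the two-pass snprintf idiom sketched below. This is a generic stand-in, not the body of Pprint_version: `format_version_line` is a hypothetical name, and the real code would get the version string from the PCRE2 library rather than from a parameter.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: format a version line into a buffer sized at run time.
   Returns a malloc'd string, or NULL on failure. */
char *format_version_line(char const *version)
{
  static char const fmt[] = "grep -P uses PCRE2 %s";

  assert(version != NULL);

  /* First pass: with a null buffer and size 0, snprintf only reports
     how many bytes the formatted text needs (excluding the NUL). */
  int n = snprintf(NULL, 0, fmt, version);
  if (n < 0)
    return NULL;

  char *buf = malloc((size_t) n + 1);
  if (!buf)
    return NULL;

  /* Second pass: write into the exactly sized buffer. */
  snprintf(buf, (size_t) n + 1, fmt, version);
  return buf;
}
```

The caller owns the returned string and frees it after printing.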
Bruno Haible
68c7d2f51c build: Ensure that makeinfo ≥ 6.8 checks the @menu structure.
See <https://lists.gnu.org/archive/html/bug-texinfo/2023-06/msg00015.html>.

* doc/Makefile.am (MAKEINFO): New variable.
* cfg.mk (_makefile_at_at_check_exceptions): New variable.
2023-06-24 18:40:08 -07:00
Jim Meyering
6980733869 build: update gnulib to latest 2023-06-24 17:28:15 -07:00
Jim Meyering
95553c0661 build: modernize bootstrap prerequisite tools
Following Pádraig Brady's example from coreutils, ...
* bootstrap.conf: Add an explicit requirement on m4.
Add an explicit requirement on texi2pdf -- often packaged separately
rather than with makeinfo -- its absence would otherwise induce a
failure late in the build process.
Replace the rsync dependency with wget,
which gnulib changed to in 2018.
Also, add an xz requirement and a version for autopoint.
2023-05-27 18:27:25 -07:00
Jim Meyering
d59cbb36b9 maint: post-release administrivia
* NEWS: Add header line for next release.
* .prev-version: Record previous version.
* cfg.mk (old_NEWS_hash): Auto-update.
2023-05-13 01:17:23 -07:00
Jim Meyering
f951840aa5 version 3.11
* NEWS: Record release date.
2023-05-13 01:13:14 -07:00
Josh Soref
16f9ca8ed1 doc: spelling fixes in doc/, comments and old ChangeLog
* ChangeLog-2009: Fix spelling errors.
* bootstrap: Likewise.
* doc/grep.texi: Likewise.
2023-05-10 13:18:57 -07:00
Jim Meyering
e43470dafc tests: reenable gnulib's strtoll and strtoull tests
* bootstrap.conf (avoided_gnulib_modules): Restore those tests.
The failures I saw must have been due to a stale config.cache.
2023-05-04 02:56:10 -07:00
Jim Meyering
2ea9219797 tests: temporarily omit gnulib's strtoll and strtoull tests
* bootstrap.conf (avoided_gnulib_modules): Omit strtoll and strtoull
tests, because the edge-case 0[bx] tests fail on recent systems.
2023-04-30 02:55:16 -07:00
Jim Meyering
c84a192000 build: update gnulib to latest 2023-04-30 00:01:41 -07:00
Carlo Marcelo Arenas Belón
fa4e6c8a77 pcre: work around a PCRE2_MATCH_INVALID_UTF bug
PCRE2 has a bug when using PCRE2_MATCH_INVALID_UTF: it would
sometimes fail to match patterns using negative classes
like \W and \D.

* NEWS (Bug fixes): Mention it.
* src/pcre2search.c: Restrict impact of the bug.
Do not use the problematic flag with broken versions of PCRE2.
Also, generate locale tables only for single-byte locales,
as the PCRE2 documentation recommends this.
* tests/Makefile.am (TESTS): Add the file name
* tests/pcre-utf8-bug224: New file, to test for this.
2023-04-30 00:01:41 -07:00
Paul Eggert
8d3afeebcc doc: improve doc for -P '\d'
This follows up to Carlo Marcelo Arenas Belón’s email
<https://lists.gnu.org/r/grep-devel/2023-04/msg00017.html>
that proposed changing the code too.  These patches change
only the documentation since we’re so near a release.
* NEWS: Be less optimistic about the fix for -P '\d',
and warn that behavior is likely to change again.
* doc/grep.texi (grep Programs): Be less specific about -P \d
behavior, since it’s still in flux.  Warn about mismatching
Unicode versions, or disagreements about obscure constructs.
2023-04-29 23:42:07 -07:00
Paul Eggert
c3259803fe build: support explicit ‘PCRE_CFLAGS= PCRE_LIBS=’
* m4/pcre.m4 (gl_FUNC_PCRE): Check whether PCRE_CFLAGS and
PCRE_LIBS are set, not whether they are set to a nonempty value.
2023-04-29 18:31:12 -07:00
Jim Meyering
7460d0f8b0 doc: say that -f - reads patterns from stdin
* doc/grep.texi (Matching Control): Mention that when -f's FILE is -,
grep reads patterns from stdin.
* doc/grep.in.1: Likewise.
* THANKS.in: Add the name.
Suggested by Sebastian Carlos in https://bugs.gnu.org/63146
2023-04-29 01:38:31 -07:00
Carlo Marcelo Arenas Belón
92585cde9b build: prevent pkg-config from overriding PCRE_* settings
The use of PCRE_CFLAGS and PCRE_LIBS, as documented in the output of
`--help`, is meant to override those settings from pkg-config.

* NEWS: mention this
* m4/pcre.m4: avoid overriding user provided settings
2023-04-22 21:29:57 -07:00
Jim Meyering
0f2c2c256f doc: note when a bug was introduced
* NEWS: say that the \d bug was introduced in 3.10.
2023-04-20 18:50:37 -07:00
Jim Meyering
3bcc2d8900 build: update gnulib to latest 2023-04-20 18:40:23 -07:00
Paul Eggert
6e7253de1d grep: make -P survive JIT compilation failure
* src/pcresearch.c (Pcompile): Ignore failure returns
from pcre2_jit_compile.
2023-04-13 14:53:33 -07:00
Paul Eggert
fd2d0f7165 grep: improve PCRE2 version output
* src/grep.c: No need to include pcre2.h.
(main) [HAVE_LIBPCRE]: Call Pprint_version instead of
doing it ourselves.
* src/pcresearch.c (Pprint_version): New function.
It also checks belatedly for buffer overflow, and
says "grep -P uses PCRE2" instead of "Built with PCRE".
* tests/version-pcre: Adjust test to match.
2023-04-10 16:16:08 -07:00
Jim Meyering
3b15d73897 tests: skip y2038 test upon touch setup failure
* tests/y2038-vs-32-bit: Skip rather than fail, when
the touch -t 2039... setup fails.  That command failed
on a solaris10 sparc build farm host.
2023-04-10 07:22:25 -07:00
Jim Meyering
e4983bd587 build: update gnulib to latest 2023-04-09 22:22:42 -07:00
Jim Meyering
85e0e4fdd4 tests: add a known-failing glibc-infloop test
* tests/glibc-infloop: New file.
Based on the command from Koen Claessen
reported in https://bugs.gnu.org/62483
* configure.ac (USE_INCLUDED_REGEX): define.
* tests/Makefile.am (TESTS): Add the file name
* THANKS.in: Add name of reporter.
2023-04-09 22:22:42 -07:00
Jim Meyering
88b2d37c0a grep: --version: print pcre version info
PCRE is integral to the functioning of grep's -P option, so it is in our
interest to make it easy to see which version of PCRE grep uses.
* src/grep.c [HAVE_LIBPCRE]: Include <pcre2.h>.
[HAVE_LIBPCRE] (main): Print pcre version info.
* tests/version-pcre: New test for this.
* tests/Makefile.am (TESTS): Add the file name.
* NEWS (Changes in behavior): Mention it.
2023-04-09 22:22:42 -07:00
Jim Meyering
19d2275fd1 tests: test for the year-2038 bug
* tests/y2038-vs-32-bit: New file.
* tests/Makefile.am (TESTS): Add the file name
2023-04-09 22:22:42 -07:00
Paul Eggert
488a115bfe grep: re-fix Y2038 bug on glibc 2.34+ x86, ARM
The meaning of AC_SYS_LARGEFILE has changed to no longer even try
to use wider time_t if available.  So use AC_SYS_YEAR2038 as well.
A more-aggressive change would be to use the next Autoconf’s
AC_SYS_YEAR2038_REQUIRED but at least let’s restore the grep 3.8
behavior.
* NEWS: Mention this.
* bootstrap.conf: Add year2038.
2023-04-03 10:32:04 -07:00
Paul Eggert
c63a0950ff grep: fix -P [\d] by fixing \w only if PCRE2 10.43
Our prepass-based fixes for the -P \d bug have caused repeated
further bugs.  Avoid the need for a prepass, by using PCRE2_UCP
only if PCRE2_EXTRA_ASCII_BSD is also supported.  Since the -P \w
bug was present from grep 2.5 through 3.8 it’s OK if we wait a
little longer to fix it.
* NEWS: Mention this.
* src/pcresearch.c (pcre_pattern_expand_backslash_d): Remove.
Remove its use.
(Pcompile): Use PCRE2_UCP only if PCRE2_EXTRA_ASCII_BSD.
* tests/pcre-ascii-digits, tests/pcre-utf8-w:
Skip tests on older PCRE2 implementations.
2023-04-02 09:47:16 -07:00
Jim Meyering
1d59f1b342 maint: post-release administrivia
* NEWS: Add header line for next release.
* .prev-version: Record previous version.
* cfg.mk (old_NEWS_hash): Auto-update.
2023-03-22 18:04:07 -07:00
Jim Meyering
0119aa8df1 version 3.10
* NEWS: Record release date.
2023-03-22 18:01:43 -07:00
Paul Eggert
0035fb36eb doc: avoid capital sharp S with TeX
Do not use “ẞ” (U+1E9E, LATIN CAPITAL LETTER SHARP S) in tex, as
texinfo version 2023-03-04.12 complains “Character missing, sorry:
LONG S.”
2023-03-22 17:43:09 -07:00
Jim Meyering
86d7b53af1 build: update gnulib to latest 2023-03-22 15:24:48 -07:00
Jim Meyering
30b80b654e build: update gnulib to latest 2023-03-20 17:08:11 -07:00
Paul Eggert
15f1f50e20 doc: clarify BRE vs ERE (bug#62272) 2023-03-20 00:24:10 -07:00
Jim Meyering
98ee05b4dd grep: -P (--perl-regexp) \D once again works like [^0-9]
* NEWS: Mention \D, too.
* doc/grep.texi: Likewise
* src/pcresearch.c (pcre_pattern_expand_backslash_d): Handle \D.
Also, ifdef-out this new function and its call site when not needed.
* tests/pcre-ascii-digits: Test \D, too.
Tighten one test by using returns_ 1.
Add comments and tests that work only with 10.43 and newer.
Paul Eggert raised the issue of \D in https://bugs.gnu.org/62267#8
2023-03-19 13:36:23 -07:00
Paul Eggert
99330c2b1d grep: forward port to PCRE2 10.43
* doc/grep.texi: Document this.
* src/grep.c: Move recent changes into pcresearch.c.
(P_MATCHER_INDEX): Remove.
(pcre_pattern_expand_backslash_d): Move from here ...
* src/pcresearch.c: ... to here.
(PCRE2_EXTRA_ASCII_BSD): Default to 0.
(Pcompile): Use PCRE2_EXTRA_ASCII_BSD if available,
and expand \d to [0-9] otherwise.
2023-03-19 08:43:01 -07:00
Paul Eggert
373b4434eb doc: distinguish Perl from PCRE
* doc/grep.texi: Mention that PCRE might not match Perl exactly.
2023-03-19 01:46:19 -07:00
Jim Meyering
c83ffc197e grep: -P (--perl-regexp) \d: match only ASCII digits
Prior to grep-3.9, the PCRE matcher had always treated \d just
like [0-9]. grep-3.9's fix for \w and \b mistakenly relaxed \d
to also match multibyte digits.
* src/grep.c (P_MATCHER_INDEX): Define enum.
(pcre_pattern_expand_backslash_d): New function.
(main): Call it for -P.
* NEWS (Bug fixes): Mention it.
* doc/grep.texi: Document it: with -P, \d matches only ASCII digits.
Provide a PCRE documentation URL and an example of how
to use (?s) with -z.
* tests/pcre-ascii-digits: New test.
* tests/Makefile.am (TESTS): Add that file name.
Reported as https://bugs.gnu.org/62267
2023-03-18 17:08:09 -07:00
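To illustrate what a pattern prepass that rewrites \d as [0-9] might look like, here is a deliberately naive sketch. It is not grep's pcre_pattern_expand_backslash_d: the name `expand_backslash_d` is hypothetical, and the sketch only skips other backslash escapes. It makes no attempt to handle bracket expressions such as [\d], where the rewritten text would not mean the same thing.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'd copy of PAT in which each
   unescaped "\d" is replaced by "[0-9]".  Returns NULL on allocation
   failure.  Naive: bracket expressions are not treated specially. */
char *expand_backslash_d(char const *pat)
{
  assert(pat != NULL);
  size_t len = strlen(pat);

  /* Each 2-byte "\d" grows to 5 bytes, so 3 * len + 1 is a safe bound. */
  char *out = malloc(3 * len + 1);
  if (!out)
    return NULL;

  char *o = out;
  for (size_t i = 0; i < len; i++)
    {
      if (pat[i] == '\\' && i + 1 < len)
        {
          if (pat[i + 1] == 'd')
            {
              memcpy(o, "[0-9]", 5);
              o += 5;
            }
          else
            {
              /* Copy any other escape pair verbatim, so that an escaped
                 backslash followed by 'd' stays a literal. */
              *o++ = pat[i];
              *o++ = pat[i + 1];
            }
          i++;   /* skip the second byte of the pair */
        }
      else
        *o++ = pat[i];
    }
  *o = '\0';
  return out;
}
```

A text-level rewrite like this is fragile by nature, since it has to track the regular expression syntax by hand.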
Jim Meyering
7979ea7ddb build: update gnulib to latest 2023-03-18 14:02:25 -07:00
Jim Meyering
3dc94feb2e doc: remove mention of unused _N_GNU_nonoption_argv_flags_ envvar
* doc/grep.texi (Environment Variables): This environment variable
has not been usable for decades. Remove its documentation.
* doc/grep.in.1: Likewise.
Reported by Emanuele Torre torreemanuele6@gmail.com
in https://bugs.gnu.org/62052
* THANKS.in: Add the name.
2023-03-09 06:35:41 -08:00
Jim Meyering
9ef526a617 maint: post-release administrivia
* NEWS: Add header line for next release.
* .prev-version: Record previous version.
* cfg.mk (old_NEWS_hash): Auto-update.
2023-03-05 07:30:15 -08:00
Jim Meyering
e875939d61 version 3.9
* NEWS: Record release date.
2023-03-05 07:27:54 -08:00
Jim Meyering
50dfb382e9 build: update gnulib to latest 2023-03-05 07:19:02 -08:00
Bruno Haible
e00b27266a tests: avoid failure on Alpine Linux 3.17, due to non-POSIX compliant tr
* tests/fmbtest: Don't use [x*n] syntax in the tr options, since tr from
BusyBox 1.35 does not support it.
2023-03-05 07:10:37 -08:00
Jim Meyering
c8603c9faf build: update gnulib to latest 2023-02-26 09:50:08 -08:00
Jim Meyering
20f372417c build: avoid --enable-gcc-warnings clang-vs-sprintf build failure
* configure.ac (WERROR_CFLAGS): Disable -Wdeprecated-declarations
to accommodate Apple's clang 14 that's installed as "gcc".
2023-02-19 12:13:28 -08:00
Jim Meyering
9e4247e10d maint: remove stray character
* HACKING: Remove a stray "[" alone on a line.
2023-02-04 22:33:14 -08:00
Jim Meyering
f3f7e21274 maint: prefer https: to git:
The idea is to defend against some adversary-in-the-middle attacks.
Also prefer git.savannah.gnu.org over its shorter alias, git.sv.gnu.org
to avoid a warning e.g., from git clone.
Also, drop any final ".git" suffix on the resulting URIs.
Inspired by Paul Eggert's nearly identical changes to coreutils.
Induced by running these commands:
git grep -l 'git clone git:'|xargs perl -pi -e \
  's{(git clone) git://(\S+)/([^/]+)\b}{$1 https://$2/git/$3}'
git grep -l git.sv.gn \
  |xargs perl -pi -e 's{git\.sv\.gnu}{git\.savannah\.gnu}'
perl -pi -e \
  's{(url =) git://(\S+)/([^/.]+)(\.git)?\b}{$1 https://$2/git/$3}'\
  .gitmodules
* .gitmodules: As above.
* HACKING: Likewise.
* README-hacking: Likewise.
* src/grep.c (main): Likewise.
2023-02-04 22:33:14 -08:00
Bruno Haible
fab6358d5a Don't require 'rsync' as a prerequisite. It has not been needed since 2018.
* bootstrap.conf (buildreq): Remove rsync.
* README-prereq: Likewise.
2023-01-30 15:21:55 -08:00
Paul Eggert
65751bd10d build: update gnulib submodule to latest 2023-01-22 01:08:02 -06:00
Paul Eggert
155cfb11e3 maint: stop including getprogname.h
It’s obsolete in bleeding-edge Gnulib.
* src/grep.c, tests/get-mb-cur-max.c: Don’t include getprogname.h.
Instead, rely on stdlib.h to declare getprogname.
2023-01-21 14:56:22 -06:00
Paul Eggert
21bfaa6ff6 build: update gnulib submodule to latest 2023-01-21 14:56:21 -06:00
Paul Eggert
819b3b176f build: update gnulib submodule to latest 2023-01-18 00:14:04 -08:00
Paul Eggert
516e855773 maint: spelling fixes 2023-01-16 01:19:41 -08:00
Paul Eggert
b63a992346 tests: fix test -eq problem
Do not use ‘test "" -eq 1’ when get-mb-cur-max fails,
as Bash complains about this.  Problem found on AIX.
2023-01-15 16:38:56 -08:00
Paul Eggert
d59fbb4146 build: update gnulib submodule to latest 2023-01-15 16:38:56 -08:00
Paul Eggert
4b60e9f353 tests: port U+10000+ to AIX 7.2
* tests/hangul-syllable, tests/surrogate-search:
32-bit AIX has WCHAR_MAX == 0xFFFF, and so cannot handle
U+10000 and greater.  Skip tests involving such chars
on this platform.
2023-01-15 16:38:56 -08:00
Paul Eggert
72ccd15d5c tests: update tests/init.sh
* tests/init.sh: Update from Gnulib.
2023-01-15 16:38:56 -08:00
Paul Eggert
eae77386eb grep: fix rawmemrchr etc. comments
* src/grep.c: Fix comments.
2023-01-15 16:38:56 -08:00
Paul Eggert
6de66dd6be tests: omit duplicate tests
* tests/skip-read: Omit duplicates.  Reported by Bruno Haible in:
https://lists.gnu.org/r/grep-devel/2023-01/msg00003.html
2023-01-14 02:06:25 -08:00
Paul Eggert
c9a77fa5bf build: update gnulib submodule to latest 2023-01-14 02:06:25 -08:00
Paul Eggert
231a3ea66d tests: better diagnostic for -P sans Unicode
* tests/init.cfg (require_pcre_): When in a UTF-8 locale, test
also for Unicode support so that it can be diagnosed differently
(Bug#60708).
2023-01-12 23:32:16 -08:00
Paul Eggert
3635121123 grep: diagnose no UTF-8 support (Bug#60708)
* src/pcresearch.c (Pcompile): Issue a diagnostic and exit instead
of misbehaving if libpcre2 does not support the requested locale.
2023-01-12 23:32:16 -08:00
Carlo Marcelo Arenas Belón
8f6a1e90e4 pcre: use UTF only when available in the library
Before this change, when grep was linked with a PCRE library lacking
Unicode support, any invocation of grep in a UTF-8 locale failed with:

  grep: this version of PCRE2 does not have Unicode support

* src/pcresearch.c: Check whether Unicode was compiled in.
* tests/pcre-utf8-w: Add check to skip test.
* tests/pcre-utf8: Update check.
2023-01-11 10:24:09 -08:00
Carlo Marcelo Arenas Belón
5e3b760f65 pcre: use UCP in UTF mode
This fixes a serious bug affecting word-boundary and word-constituent regular
expressions when the desired match involves non-ASCII UTF8 characters.
* src/pcresearch.c: Set PCRE2_UCP together with PCRE2_UTF
* tests/pcre-utf8-w: New file.
* tests/Makefile.am (TESTS): Add it.
* NEWS (Bug fixes): Mention this.
* THANKS.in: Add Gro-Tsen and Karl Pettersson.
Reported by Gro-Tsen https://twitter.com/gro_tsen/status/1610972356972875777
via Karl Pettersson in https://github.com/PCRE2Project/pcre2/issues/185
This bug was present from grep-2.5, when --perl-regexp (-P) support was added.
2023-01-07 18:24:51 -08:00
Jim Meyering
45e1158a4b maint: update copyright dates 2023-01-01 20:36:23 -08:00
Jim Meyering
247e257563 build: update gnulib to latest 2023-01-01 20:36:23 -08:00
Jim Meyering
29c3f5b9df maint: avoid warnings about unportable grep -q
* cfg.mk (local-checks-to-skip): This is grep itself,
so using grep -q is not a problem here, as long as it
is running the just-built grep.
2023-01-01 20:36:23 -08:00
Jim Meyering
908f30573a maint: src/dfasearch.c: remove unnecessary re_set_syntax call
* src/dfasearch.c (GEAcompile): Don't call "re_set_syntax (syntax_bits)"
just before regex_compile; that function does the same thing already.
2022-12-10 17:24:22 -08:00
Paul Eggert
b061d24916 grep: bug: backref in last of multiple patterns
* NEWS: Mention this.
* src/dfasearch.c (GEAcompile): Trim trailing newline from
the last pattern, even if it has back-references and follows
a pattern that lacks back-references.
* tests/backref: Add test for this bug.
2022-12-05 14:17:29 -08:00
Paul Eggert
429b3497d1 maint: prefer stdckdint.h to intprops.h
Prefer the standard C23 ckd_* macros to Gnulib’s *_WRAPV macros.
* bootstrap.conf (gnulib_modules): Add stdckdint.
* src/grep.c, src/kwset.c, src/pcresearch.c:
Include stdckdint.h, and prefer ckd_* to *_WRAPV.
Include intprops.h only if needed.
2022-10-11 09:05:15 -07:00
Paul Eggert
1dd9bbe724 maint: add missing include
* src/pcresearch.c: Include intprops.h.
2022-10-11 09:05:15 -07:00
Paul Eggert
d14057a8b7 maint: prefer C23 style for static_assert
* bootstrap.conf (gnulib_modules): Add assert-h,
for static_assert.
* src/dfasearch.c (regex_compile): Prefer static_assert to verify.
2022-10-11 09:05:15 -07:00
Paul Eggert
1b1b496eb2 build: update gnulib submodule to latest 2022-10-11 09:05:15 -07:00
Paul Eggert
34ba125628 Assume C23-like bool
Gnulib’s stdbool module now provides C23-like semantics,
so there’s no longer any need to include stdbool.h.
* src/die.h, src/grep.h, src/kwset.h: Don’t include stdbool.h.
2022-09-10 23:04:25 -05:00
Paul Eggert
29bc7988c9 build: update gnulib submodule to latest 2022-09-10 23:04:25 -05:00
Paul Eggert
b3cd2ee4ae doc: improve GREP_COLORS doc (Bug#57696) 2022-09-09 13:13:16 -05:00
Paul Eggert
216f754287 Fix obsolescence doc for egrep, fgrep 2022-09-06 14:04:09 -05:00
Jim Meyering
65e303a17e maint: post-release administrivia
* NEWS: Add header line for next release.
* .prev-version: Record previous version.
* cfg.mk (old_NEWS_hash): Auto-update.
2022-09-02 23:54:04 -07:00
Jim Meyering
958bcc3ada version 3.8
* NEWS: Record release date.
2022-09-02 23:51:55 -07:00
Jim Meyering
dc00df27cb build: update gnulib to latest 2022-07-10 20:56:48 -07:00
Jim Meyering
0acc194ae1 tests: long-pattern-perf: avoid FP failure on unusual systems
* tests/long-pattern-perf: Skip this test whenever the base
case takes more than 800ms.  See comment for details.
Reported by Bruno Haible in
https://lists.gnu.org/r/grep-devel/2022-07/msg00004.html
https://lists.gnu.org/r/grep-devel/2022-07/msg00006.html
2022-07-09 20:46:45 -07:00
Jim Meyering
aa8ca91c08 tests: long-pattern-perf: better handle exhausted virtual memory
* tests/long-pattern-perf: Don't fail due to a syntax error
when one of the subtests exhausts virtual memory. The larger
test (with a 2MiB regexp) needs about 870MiB of virtual memory.
Require that each timing run exit with status 0, else fail with
a framework_failure_. Reported by Bruno Haible in
https://lists.gnu.org/r/grep-devel/2022-07/msg00006.html
2022-07-03 11:46:09 -07:00
Jim Meyering
b47a3fb155 tests: note that triple-backref is still not fixed
* tests/triple-backref: I noticed that our sole XFAIL is still
required, in spite of a glibc comment that bug 11053 is fixed,
so confirmed that it no longer evokes an abort, but still fails
to produce the expected match.  I.e., this prints nothing:
echo a | grep -E '(.?)(.?)(.?)\3\2\1' -- it should print its input.
2022-07-03 10:21:49 -07:00
Jim Meyering
90bc5d93f1 tests: do not emit ratio of test durations
* tests/hash-collision-perf (ratio): Remove stray diagnostic.
2022-07-03 10:01:04 -07:00
Jim Meyering
c9ac429ddd build: update gnulib to latest 2022-06-30 16:42:25 -07:00
Jim Meyering
565678570c maint: remove reference to gnulib module, alloca
* bootstrap.conf (gnulib_modules): Remove alloca; we do not
use it directly.
2022-06-30 16:42:18 -07:00
Jim Meyering
98376d7988 build: add parentheses to placate clang-14
* src/dfasearch.c (regex_compile): Parenthesize to avoid
this warning:
  dfasearch.c:154:43: error: operator '?:' has lower precedence
  than '|'; '|' will be evaluated first
  [-Werror,-Wbitwise-conditional-parentheses]
2022-06-29 12:00:36 -07:00
Paul Eggert
e2aec8c91e grep: fix regex compilation memory leaks
Problem reported by Jim Meyering in:
https://lists.gnu.org/r/grep-devel/2022-06/msg00012.html
* src/dfasearch.c (regex_compile): Fix memory leaks when SYNTAX_ONLY.
2022-06-24 18:36:53 -05:00
Jim Meyering
225d921887 build: update gnulib to latest 2022-06-08 12:07:22 -07:00
Jim Meyering
c73b86f757 maint: stop using obsolete iswctype gnulib module
* bootstrap.conf (gnulib_modules): Remove iswctype, an unused and obsolete module.
2022-06-08 12:07:22 -07:00
Paul Eggert
6f52ef30e5 grep: don’t diagnose "grep '\-c'"
* src/grep.c (main): Skip past leading backslash of a pattern that
begins with "\-".  Inspired by a remark by Bruno Haible in:
https://lists.gnu.org/r/bug-gnulib/2022-06/msg00022.html
2022-06-06 14:51:47 -07:00
Paul Eggert
739892e8d4 doc: document \] and \}
* doc/grep.texi (Special Backslash Expressions)
(Problematic Expressions): Document that grep supports
\] and \} as extensions to POSIX.
2022-06-04 10:27:50 -07:00
Paul Eggert
3b66aaf50a build: update gnulib submodule to latest 2022-06-04 10:27:50 -07:00
Paul Eggert
16469277b3 build: update bootstrap to gnulib latest 2022-06-03 18:25:10 -07:00
Paul Eggert
1e517bf6a2 build: update gnulib submodule to latest 2022-06-03 18:25:09 -07:00
Paul Eggert
0942f31bd8 maint: spelling fixes 2022-06-03 13:07:07 -07:00
Paul Eggert
5e3d207d5b grep: sanity-check GREP_COLOR
This patch closes a longstanding security issue with GREP_COLOR that I
just noticed, where if the attacker has control over GREP_COLOR's
settings the attacker can trash the victim's terminal or have 'grep'
generate misleading output.  For example, without the patch
the shell command:
GREP_COLOR="$(printf '31m\33[2J\33[31')" grep --color=always PATTERN
mucks with the screen, leaving behind only the trailing part of
the last matching line.  With the patch, this GREP_COLOR is ignored.
* src/grep.c (main): Sanity-check GREP_COLOR contents the same way
GREP_COLORS values are checked, to not trash the user's terminal.
This follows up the recent fix to Bug#55641.
2022-05-31 18:13:34 -07:00
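A minimal sketch of the kind of check this commit describes, assuming an acceptable value consists only of decimal digits and semicolons (the parameter characters of an SGR escape sequence). The helper name color_value_is_sane is hypothetical and is not taken from src/grep.c; the real check may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: return true if VALUE contains only characters
   that may appear in SGR parameters, namely ASCII digits and ';'.
   Anything else (such as 'm' or ESC) could end the color sequence
   early and smuggle in a different terminal command.  */
static bool
color_value_is_sane (const char *value)
{
  assert (value != NULL);
  for (const char *p = value; *p; p++)
    if (! (*p == ';' || ('0' <= *p && *p <= '9')))
      return false;
  return true;
}
```

Under this assumption, a value such as "01;31" is accepted, while the value from the example above, which embeds 'm' and ESC bytes, is rejected and would be ignored.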
Jim Meyering
d922927049 tests: placate syntax-check's sorted-test rule
* tests/Makefile.am (TESTS): Insert color-colors in sorted order.
2022-05-29 22:49:03 -07:00
Jim Meyering
c8d89e8c34 maint: include fdl.texi in version control, per gnulib module advice
* bootstrap.conf (gnulib_modules): Remove fdl.
* doc/.gitignore: Do not list fdl.texi
* doc/fdl.texi: New file.
* cfg.mk (FILTER_LONG_LINES): Add doc/fdl.texi.
2022-05-29 22:49:03 -07:00
Jim Meyering
656de767ee maint: po/POTFILES.in: add src/dfasearch.c to avoid syntax-check failure
* po/POTFILES.in: Add src/dfasearch.c.
2022-05-29 22:48:09 -07:00
Jim Meyering
5b98e7b7c2 build: sync init.sh from gnulib 2022-05-29 18:56:22 -07:00
Paul Eggert
2ff819e750 grep: document --color[=WHEN] more carefully 2022-05-29 17:09:31 -07:00
Paul Eggert
d85711f694 tests: new test color-colors
* tests/Makefile.am (TESTS): Add it.
* tests/color-colors: New file.
2022-05-29 16:42:43 -07:00
Paul Eggert
4ac5fa8959 grep: deprecate GREP_COLOR
This is to avoid confusion such as that reported by Cholden in:
https://bugs.gnu.org/55641
* src/grep.c (main): Warn if GREP_COLOR has an effect.
2022-05-29 16:42:43 -07:00
Paul Eggert
da07083481 maint: fix typo in bug number 2022-05-24 19:08:10 -07:00
Paul Eggert
1546617435 grep: warn about ‘(+x)’ etc.
These expressions are not portable and don’t always work as
expected, so warn about them.  For example, “grep -E '(+)'”
doesn’t act like “grep '\(\+\)'”.
* src/dfasearch.c (GEAcompile): Warn about a repetition op at the
start of a regular expression or subexpression, except for ‘*’ in
BREs which is portable.
2022-05-24 17:46:39 -07:00
Paul Eggert
8e0c90966d build: update gnulib submodule to latest 2022-05-24 17:46:39 -07:00
Paul Eggert
e7f8e8eb1f grep: warn about stray backslashes
This papers over a problem reported by Benno Schulenberg and
Tomasz Dziendzielski <https://bugs.gnu.org/39678> involving
regular expressions like \a that have unspecified behavior.
* src/dfasearch.c (dfawarn): Just output a warning.
Don’t exit, as DFA_CONFUSING_BRACKETS_ERROR now
does that for us, and we need the ability to warn
without exiting to diagnose \a etc.
(GEAcompile): Use new dfa options DFA_CONFUSING_BRACKETS_ERROR and
DFA_STRAY_BACKSLASH_WARN.
2022-05-23 12:40:26 -07:00
Paul Eggert
42db5cc8f5 build: update gnulib submodule to latest 2022-05-23 12:40:26 -07:00
Paul Eggert
a860bd39e3 doc: document regex corner cases better
* doc/grep.texi (Environment Variables)
(Fundamental Structure, Character Classes and Bracket Expressions)
(Special Backslash Expressions, Back-references and Subexpressions)
(Basic vs Extended): Say more precisely what happens with
problematic regular expressions.
(Problematic Expressions): New section.
2022-05-22 15:01:32 -07:00
Paul Eggert
80bcb074ae tests: port to platforms lacking Perl
* tests/init.cfg (require_perl_): New function.
* tests/big-hole, tests/hash-collision-perf, tests/long-pattern-perf:
* tests/many-regex-performance, tests/mb-non-UTF8-performance:
Use it.
2022-05-21 18:43:42 -07:00
Paul Eggert
d6276889a0 build: be more careful about Perl
Problem reported by Serge Belyshev for Coreutils (Bug#52844).
I observed the same problem with current Grep on Fedora 35
without Perl installed.
* configure.ac (HAVE_PERL): Rely on latest Gnulib gl_PERL, which
sets gl_cv_prog_perl.
2022-05-21 18:43:41 -07:00
Paul Eggert
c831ffa1d9 doc: document regex corner cases better
* doc/grep.texi (Environment Variables)
(Fundamental Structure, Character Classes and Bracket Expressions)
(The Backslash Character and Special Expressions)
(Back-references and Subexpressions, Basic vs Extended):
Say more precisely what happens with oddball regular expressions.
2022-05-21 02:41:20 -07:00
Paul Eggert
a368a60eb8 grep: assume POSIX.1-2017 for [:space:]
* src/dfasearch.c (dfawarn): Always call dfaerror now,
regardless of POSIXLY_CORRECT.
* tests/warn-char-classes: Omit test of POSIX.1-2008 behavior,
since POSIX.1-2017 allows the GNU behavior.
2022-05-21 02:41:20 -07:00
Paul Eggert
2169fa36c9 tests: make spencer1.tests more POSIX-compliant
* tests/spencer1.tests: Do not test the regular expression a\x as
POSIX says the interpretation of \x is undefined and we may want
to warn about it in the future, to allow for future extensions.
Instead, test a\\x, a[\]x, and ax.
2022-05-21 02:41:20 -07:00
Paul Eggert
e24ab83682 doc: omit -y from grep man page
The obsolete -y option has been omitted from --help for a while, and
now’s a good time to omit it from the man page too.
2022-05-19 16:36:52 -07:00
Paul Eggert
e4a71086bf tests: improve tests of ‘.’
* tests/hangul-syllable: Test some encoding errors too.
2022-05-17 13:49:35 -07:00
Paul Eggert
a7c8349894 grep: document -m better
* doc/grep.in.1, doc/grep.texi: Document behavior of -m 0 and -m -1.
This documents longstanding behavior, and is consistent with
how git grep -m will likely behave.
2022-05-16 12:19:02 -07:00
Paul Eggert
078987db6d maint: spelling fixes 2022-05-14 15:11:38 -07:00
Paul Eggert
5447010fdb grep: fix bug with . and some Hangul Syllables
* NEWS: Mention the fix, which comes from the recent Gnulib update.
* tests/hangul-syllable: New file.
* tests/Makefile.am (TESTS): Add it.
2022-05-13 23:48:18 -07:00
Paul Eggert
ef6c7768b3 build: update gnulib submodule to latest 2022-05-13 23:48:18 -07:00
Jim Meyering
561cf64e4a build: update gnulib to latest, for glob improvements 2022-04-02 11:43:55 -07:00
Paul Eggert
743b1f6f5c grep: Remove recent PCRE2 bug workarounds
* src/pcresearch.c (Pcompile): Remove recent workaround for PCRE2
bugs; apparently it’s not needed.  This reverts back to where
things were before today.  Suggested by Carlo Arenas in:
https://lists.gnu.org/r/grep-devel/2022-03/msg00006.html
2022-03-22 20:13:13 -07:00
Paul Eggert
dfcd2c9cc8 grep: work around another potential PCRE2 bug
Potential problem reported by René Scharfe in:
https://lore.kernel.org/git/99b0adb6-26ba-293c-3a8f-679f59e7cb4d@web.de/T
* src/pcresearch.c (Pcompile): Mimic git grep’s workarounds
for PCRE2 bugs more closely; this is more conservative.
2022-03-22 14:09:45 -07:00
Paul Eggert
0ca5dcc1c5 grep: work around PCRE2 bug 2642
Problem reported by Carlo Arenas in:
https://lists.gnu.org/r/grep-devel/2022-03/msg00004.html
* src/pcresearch.c (Pcompile) [PCRE2_MATCH_INVALID_UTF]:
In PCRE2 10.35 and earlier, disable start optimization if doing a
caseless UTF-8 search.
2022-03-22 09:18:09 -07:00
Paul Eggert
70fc166b38 Pacify GCC 11.2.0
* configure.ac: Re-disable -Wstack-protector, to pacify GCC Ubuntu
11.2.0-7ubuntu2 x86-64 on knuth_morris_pratt and
knuth_morris_pratt_multibyte.
2022-03-22 08:19:16 -07:00
Jim Meyering
bc4241629c build: update gnulib to latest, for bootstrap long-line fix 2022-03-20 18:40:41 -07:00
Jim Meyering
16b3c2f9f3 build: Re-disable -Winline
* configure.ac: Re-disable -Winline. It is still needed.
2022-03-20 13:37:44 -07:00
Jim Meyering
269795f3b8 build: update gnulib to latest; also bootstrap and init.sh 2022-03-20 13:30:38 -07:00
Jim Meyering
9af9d51605 build: avoid build failure on systems that must compile regexec.c
With --enable-gcc-warnings, compiling regexec.c would fail due to
its use of a single variable-length array.
* configure.ac: Add -Wvla to the list of disabled warnings and
remove most of the others, that no longer need to be disabled.
2022-03-20 13:23:58 -07:00
Jim Meyering
6e95551ad6 grep: very long lines no longer evoke unwarranted "memory exhausted"
When calling xpalloc (NULL, &n, incr_min, alloc_max, 1) with
nontrivial ALLOC_MAX, this must hold: N + INCR_MIN <= ALLOC_MAX.
With a very long line, it did not, and grep would mistakenly fail
with a report of "memory exhausted".
* src/grep.c (fillbuf): When using nontrivial ALLOC_MAX, ensure it
is at least N+INCR_MIN.
* tests/fillbuf-long-line: New file, to test for this.
* tests/Makefile.am (TESTS): Add its name.
2022-03-20 13:23:58 -07:00
Paul Eggert
efe1e1543c doc: more on leading ‘-’
* doc/grep.texi (Usage): Expand on leading ‘-’ problems (Bug#54174).
2022-02-27 15:17:47 -08:00
Paul Eggert
1580562d51 doc: mention issues with set -e
* doc/grep.texi (Usage, Performance): Mention early exits (Bug#54035).
2022-02-24 11:07:58 -08:00
Ulrich Eckhardt
c128fa57c6 grep: Remove comment
The comment was introduced in 500f07fee50ab16a70fe2946b85318020c7f4017 and
relates to absent cleanup code at the end of main(), not the code following
it. It relates to fallible flushing of stdout and related error handling,
but even then it doesn't explain much.
Copyright-paperwork-exempt: yes
2022-02-15 16:59:38 -08:00
Ulrich Eckhardt
5c3c427988 tests: remove redundant test
* tests/empty: Test #4 is identical to test #1. Remove it.
2022-02-15 09:36:18 -08:00
Ondřej Fiala
f31ae6d46d bug#52958: [PATCH] doc: fix man page syntax errors
* doc/grep.in.1: Fix syntax errors.
Introduced by commit v3.6-5-g91ce9cd.
Copyright-paperwork-exempt: Yes
2022-01-02 11:14:21 -08:00
Jim Meyering
f9290127f3 maint: make update-copyright 2022-01-01 10:29:09 -08:00
Jim Meyering
1843c0b0c8 build: update gnulib to latest; also bootstrap and init.sh 2022-01-01 10:26:11 -08:00
Jim Meyering
d30721074f build: disable some expensive compiler warnings by default
* configure.ac (gl_GCC_VERSION_IFELSE): Copy from coreutils.
(gcc-warnings): Update from coreutils.
2021-12-26 09:49:16 -08:00
Jim Meyering
abf2fa8efa maint: avoid new syntax-check failures
* cfg.mk (local-checks-to-skip): Add sc_indent, to skip it.
Otherwise, "make syntax-check" would fail.
(_gl_TS_unmarked_extern_functions): Add imbrlen to the list.
2021-12-24 13:40:34 -08:00
Helge Kreutzmann
95440891d0 doc: --invert-match is described "above" --count, not below
* doc/grep.in.1 (--count): s/below/above/.
2021-12-20 10:25:40 -08:00
Paul Eggert
fa48acda06 build: update gnulib submodule to latest 2021-11-24 09:40:59 -08:00
Paul Eggert
4396e12b8c tests: skip surrogate-search test on Cygwin
Cygwin does not support surrogate-pair search strings, so
skip the test there (Bug#27555).
* tests/Makefile.am (TESTS): Add surrogate-search.
* tests/surrogate-pair: Remove surrogate-search test,
which is now done by surrogate-search.
* tests/surrogate-search: New test, which is skipped on Cygwin.
2021-11-23 18:32:36 -08:00
Paul Eggert
4406a0d28f doc: update to match recent "Binary files" change
Suggested by Duncan Roe (Bug#51860#25).
2021-11-22 18:15:42 -08:00
Paul Eggert
af79b17356 grep: -s does not suppress “binary file matches”
* src/grep.c (grep): Implement this.
* tests/binary-file-matches: Add regression test.
2021-11-20 22:55:53 -08:00
Paul Eggert
56762bfda5 doc: "binary file matches" -> stderr [Bug#51860] 2021-11-20 22:55:53 -08:00
Paul Eggert
7651f7b832 grep: port to PCRE2 10.20
* src/pcresearch.c (PCRE2_SIZE_MAX): Default to SIZE_MAX.
2021-11-14 12:13:28 -08:00
Paul Eggert
ad6e5cbcf5 grep: fix minor -P memory leak
* src/pcresearch.c (Pcompile): Free ccontext when no longer needed.
2021-11-14 12:13:28 -08:00
Paul Eggert
99fcca954f grep: use ximalloc, not xcalloc
* src/pcresearch.c (Pcompile): Use ximalloc, not xcalloc,
and explicitly initialize the two slots that should be null.
This is more likely to catch future errors if we use valgrind.
2021-11-14 12:13:28 -08:00
Paul Eggert
a0feba0a48 grep: improve memory exhaustion checking with -P
* src/pcresearch.c (struct pcre_comp): New member gcontext.
(private_malloc, private_free): New functions.
(jit_exec): It is OK to call pcre2_jit_stack_free (NULL), so simplify.
Use gcontext for allocation.  Check for pcre2_jit_stack_create
failure, since sljit bypasses private_malloc.  Redo to avoid two
‘continue’s.
(Pcompile): Create and use gcontext.
2021-11-14 12:13:28 -08:00
Paul Eggert
ae9780c06b grep: simplify JIT setup
* src/pcresearch.c (Pcompile): Simplify since ‘die’ cannot return.
2021-11-14 12:13:28 -08:00
Paul Eggert
50d5fbb7c3 grep: use PCRE2_EXTRA_MATCH_LINE
* src/pcresearch.c (Pcompile): If available, use
PCRE2_EXTRA_MATCH_LINE instead of doing it by hand.
Simplify construction of substitute regular expression.
2021-11-14 12:13:28 -08:00
Paul Eggert
c6283a2c92 grep: prefer signed integers
* src/pcresearch.c (struct pcre_comp, jit_exec, Pexecute):
Prefer signed to unsigned types when either will do.
(jit_exec): Use INT_MULTIPLY_WRAPV instead of doing it by hand.
(Pexecute): Omit line length limit test that is no longer
needed with PCRE2.
2021-11-14 12:13:28 -08:00
Paul Eggert
6e1450408a grep: speed up, fix bad-UTF8 check with -P
* src/pcresearch.c (bad_utf8_from_pcre2): New function.  Fix bug
where PCRE2_ERROR_UTF8_ERR1 was not treated as an encoding error.
Improve performance when PCRE2_MATCH_INVALID_UTF is defined.
(Pexecute): Use it.
2021-11-14 12:13:28 -08:00
Paul Eggert
3935b2a4f6 grep: improve pcre2_get_error_message comments
* src/pcresearch.c (Pcompile): Improve comments re
pcre2_get_error_message buffer.
2021-11-14 12:13:28 -08:00
Paul Eggert
6f84f3be1c grep: Don’t limit jitstack_max to INT_MAX
* src/pcresearch.c (jit_exec): Remove arbitrary INT_MAX limit on JIT
stack size.
2021-11-14 12:13:28 -08:00
Paul Eggert
e1394a6408 maint: minor rewording and reindenting 2021-11-14 12:13:28 -08:00
Carlo Marcelo Arenas Belón
e0d39a9133 grep: migrate to pcre2
Mostly a bug by bug translation of the original code to the PCRE2 API.
Code still could do with some optimizations but should be good as a
starting point.

The API changes the sign of some types and therefore some ugly casts
were needed, some of the changes are just to make sure all variables
fit into the newer types better.

Includes backward compatibility and could be made to build all the way
back to 10.00, but assumes a recent enough version and has been tested with
10.23 (from CentOS 7, the oldest).

Performance seems equivalent, and it also seems functionally complete.

* m4/pcre.m4 (gl_FUNC_PCRE): Check for PCRE2, not the original PCRE.
* src/pcresearch.c (struct pcre_comp, jit_exec)
(Pcompile, Pexecute):
Use PCRE2, not the original PCRE.
* tests/filename-lineno.pl: Adjust to match PCRE2 diagnostics.
2021-11-14 12:13:28 -08:00
Paul Eggert
b07c82ccdb maint: update README-prereq for Gperf, Rsync, Wget 2021-11-11 18:14:33 -08:00
Paul Eggert
015d028d05 tests: fix pcre test typo
* tests/pcre-context: Initialize ‘fail’ earlier.
2021-11-10 18:24:17 -08:00
Carlo Marcelo Arenas Belón
f585b6bb3b tests: fix test logic for pcre-context
Included in the original bug #20957, but corrupted somehow in
transit as the required NUL characters are missing.

Add a simpler version of the test case that uses plain characters
and match the -z data and output to show the equivalence.

Note the output is still not correct as it is missing the expected
LF characters, but a full fix will have to wait until PCRE2.

Fixes Bug#51735.
2021-11-10 18:24:17 -08:00
Paul Eggert
b3a85a1a8a grep: work around PCRE bug
Problem reported by Carlo Marcelo Arenas Belón (Bug#51710).
* src/pcresearch.c (jit_exec): Don’t attempt to grow the JIT stack
over INT_MAX - 8 * 1024.
2021-11-09 10:12:23 -08:00
Paul Eggert
c562691787 build: update gnulib submodule to latest 2021-11-09 10:12:23 -08:00
Paul Eggert
1ba972edec maint: modernize README-{hacking,prereq} 2021-10-30 16:28:25 -07:00
Paul Eggert
b3d082ce04 build: update gnulib submodule to latest 2021-10-28 10:46:43 -07:00
Paul Eggert
f0d97db2a2 doc: document interval expression limitations
* doc/grep.texi (Basic vs Extended, Performance):
Document limitations of interval expressions (Bug#44538).
2021-08-27 18:21:24 -07:00
Paul Eggert
fd72f5d2c2 build: update gnulib submodule to latest
* src/system.h: Update decls to match current Gnulib.
2021-08-27 18:21:24 -07:00
Paul Eggert
e3694e90b4 grep: prefer signed to unsigned integers
This improves runtime checking for integer overflow when compiling
with gcc -fsanitize=undefined and the like.  It also avoids
the need for some integer casts, which can be error-prone.
* bootstrap.conf (gnulib_modules): Add idx.
* src/dfasearch.c (struct dfa_comp, kwsmusts):
(possible_backrefs_in_pattern, regex_compile, GEAcompile)
(EGexecute):
* src/grep.c (struct patloc, patlocs_allocated, patlocs_used)
(n_patterns, update_patterns, pattern_file_name, poison_len)
(asan_poison, fwrite_errno, compile_fp_t, execute_fp_t)
(buf_has_encoding_errors, buf_has_nulls, file_must_have_nulls)
(bufalloc, pagesize, all_zeros, fillbuf, nlscan)
(print_line_head, print_line_middle, print_line_tail, grepbuf)
(grep, contains_encoding_error, fgrep_icase_available)
(fgrep_icase_charlen, fgrep_to_grep_pattern, try_fgrep_pattern)
(main):
* src/kwsearch.c (struct kwsearch, Fcompile, Fexecute):
* src/kwset.c (struct trie, struct kwset, kwsalloc, kwsincr)
(kwswords, treefails, memchr_kwset, acexec_trans, kwsexec)
(treedelta, kwsprep, bm_delta2_search, bmexec_trans, bmexec)
(acexec):
* src/kwset.h (struct kwsmatch):
* src/pcresearch.c (Pcompile, Pexecute):
* src/search.h (mb_clen):
* src/searchutils.c (kwsinit, mb_goback, wordchars_count)
(wordchars_size, wordchar_next, wordchar_prev):
Prefer idx_t to size_t or ptrdiff_t for nonnegative sizes,
and prefer ptrdiff_t to size_t for sizes plus error values.
* src/grep.c (uword_size): New constant, used for signed
size calculations.
(totalnl, add_count, totalcc, print_offset, print_line_head, grep):
Prefer intmax_t to uintmax_t for wide integer calculations.
(fgrep_icase_charlen): Prefer ptrdiff_t to int for size offsets.
* src/grep.h: Include idx.h.
* src/search.h (imbrlen): New function, like mbrlen except
with idx_t and ptrdiff_t.
2021-08-25 12:11:27 -07:00
Paul Eggert
b7d83f46d8 grep: scan back thru UTF-8 a bit faster
* src/searchutils.c (mb_goback): When scanning backward through
UTF-8, check the length implied by the putative byte 1 before
bothering to invoke mb_clen.  This length check also lets us use
mbrlen directly rather than calling mb_clen, which would
eventually defer to mbrlen anyway.
2021-08-24 00:43:28 -07:00
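The idea of checking the length implied by a putative first byte can be sketched as follows. This is a simplified illustration, not the mb_goback code: the names utf8_lead_len and utf8_char_start are hypothetical, and the sketch only locates a candidate start without fully validating the sequence.

```c
#include <assert.h>
#include <stddef.h>

/* Return the sequence length (1..4) implied by B if B can be the first
   byte of a UTF-8 sequence, or 0 if B is a continuation byte or a byte
   that can never start a sequence.  */
static int
utf8_lead_len (unsigned char b)
{
  if (b < 0x80) return 1;   /* ASCII */
  if (b < 0xC2) return 0;   /* continuation bytes, or overlong leads C0/C1 */
  if (b < 0xE0) return 2;
  if (b < 0xF0) return 3;
  if (b < 0xF5) return 4;
  return 0;                 /* F5..FF never appear in UTF-8 */
}

/* Scan back from CUR, but not before BUF, for a lead byte whose implied
   length covers CUR.  Return that lead byte's address, or CUR itself if
   no such byte is found within 3 bytes (CUR is then treated as a
   stand-alone byte).  */
static const char *
utf8_char_start (const char *buf, const char *cur)
{
  assert (buf <= cur);
  for (const char *p = cur; ; p--)
    {
      int len = utf8_lead_len ((unsigned char) *p);
      if (len != 0)
        return cur - p < len ? p : cur;
      if (p == buf || cur - p == 3)
        return cur;
    }
}
```

Because the implied length is available from the lead byte alone, a cheap comparison rules out most candidates before any more expensive multibyte decoding is attempted.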
Paul Eggert
643e557388 grep: tweak mb_goback performance
* src/searchutils.c (mb_goback): Set *MBCLEN only in
non-UTF-8 encodings, since that’s the only time it’s needed,
and this lets us see more clearly that the UTF-8 clen value
is not useful to the caller.
2021-08-24 00:43:28 -07:00
Paul Eggert
869989fa83 grep: tweak wordchar_prev performance
* src/searchutils.c (wordchar_prev): Tweak performance by using a
value already in a local variable rather than consulting a table.
2021-08-24 00:43:28 -07:00
Paul Eggert
70b84b9294 grep: tweak mb_goback and comment it better
* src/searchutils.c (mb_goback): Improve the comment to better
describe this confusing function.  And remove an unnecessary
test of cur vs end.
2021-08-24 00:43:27 -07:00
Paul Eggert
01b7b13f83 grep: omit unused maxd member
* src/kwset.c (struct kwset.maxd): Remove.  All uses removed.
2021-08-24 00:43:27 -07:00
Paul Eggert
2b455da03f grep: avoid some size_t casts
This helps move the code away from unsigned types.
* src/grep.c (buf_has_encoding_errors, contains_encoding_error):
* src/searchutils.c (mb_goback):
Compare to MB_LEN_MAX, not to (size_t) -2.  This is a bit safer
anyway, as grep relies on MB_LEN_MAX limits elsewhere.
* src/search.h (mb_clen): Compare to -2 before converting to size_t.
2021-08-24 00:43:27 -07:00
Jim Meyering
33b2d2eded tests: mb-non-UTF8-perf-Fw: use head rather than sed
* tests/mb-non-UTF8-perf-Fw: Use head -n 10000000 rather than the
work-alike sed command.  This provides a 4x speedup and saves 0.5s.
* tests/null-byte: Likewise.
2021-08-22 09:39:47 +02:00
Paul Eggert
ad6de316cc grep: avoid sticky problem with ‘-f - -f -’
Inspired by bug#50129 even though this is a different bug.
* src/grep.c (main): For ‘-f -’, use clearerr (stdin) after
reading, so that ‘grep -f - -f -’ reads stdin twice even
when stdin is a tty.  Also, for ‘-f FILE’, report any
I/O error when closing FILE.
2021-08-21 10:45:06 -07:00
Paul Eggert
9f296c1238 tests: port mb-non-UTF8-perf-Fw to strict POSIX
* tests/mb-non-UTF8-perf-Fw: Prefer ‘sed 10q’ to ‘head -10’,
which doesn’t conform to POSIX.
2021-08-18 19:05:01 -07:00
Paul Eggert
0687c51c47 grep: djb2 correction
Problem reported by Alex Murray (bug#50093).
* src/grep.c (hash_pattern): Use a nonzero initial value.
2021-08-18 07:37:27 -07:00
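For reference, a minimal sketch of a djb2-style hash with a nonzero initial value. The commit states only that the initial value became nonzero; the seed 5381 used here is the conventional djb2 constant and is an assumption, as is the function name djb2_hash.

```c
#include <assert.h>
#include <stddef.h>

/* djb2-style hash: start from a nonzero seed and fold in each byte as
   h = h * 33 + byte.  With a zero seed, a run of leading NUL bytes
   leaves the hash at zero, so inputs that differ only in the length
   of that run would collide.  */
static size_t
djb2_hash (const char *buf, size_t len)
{
  assert (buf != NULL || len == 0);
  size_t h = 5381;   /* conventional djb2 seed; an assumption here */
  for (size_t i = 0; i < len; i++)
    h = h * 33 + (unsigned char) buf[i];
  return h;
}
```

With the nonzero seed, a one-byte and a two-byte buffer of NUL bytes hash to different values, which would not be the case with a seed of zero.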
Paul Eggert
f3da64c603 doc: modernize portability advice
* doc/grep.texi (General Output Control, Basic vs Extended):
No need to complicate the portability advice by talking about 7th
edition grep, since it’s no longer a practical porting target.
Instead, mention only Solaris 10 grep, the last practical holdout
of somewhat-traditional grep.
2021-08-16 18:42:31 -07:00
Paul Eggert
a951562470 egrep, fgrep: now obsolete
* NEWS: Mention this (see bug#49996).
* doc/Makefile.am (egrep.1 fgrep.1): Remove.  All uses removed.
* doc/grep.in.1, doc/grep.texi (grep Programs):
Remove documentation for egrep, fgrep.
* doc/grep.texi (Usage): Add FAQ for egrep and fgrep.
* src/Makefile.am (shell_does_substrings): Substitute for ${0##*/},
not for ${0%/\*} (which was not being used anyway).
* src/egrep.sh: Issue an obsolescence warning.
* tests/fedora: Use "grep -F" instead of "fgrep" in diagnostics,
as this tests "grep -F" not "fgrep".
2021-08-16 10:12:29 -07:00
Paul Eggert
e87ccc7038 doc: update cites and authors 2021-08-14 14:19:55 -07:00
Jim Meyering
8b2c31b646 maint: post-release administrivia
* NEWS: Add header line for next release.
* .prev-version: Record previous version.
* cfg.mk (old_NEWS_hash): Auto-update.
2021-08-14 12:55:38 -07:00
Jim Meyering
df49b0c1dd version 3.7
* NEWS: Record release date.
2021-08-14 12:52:55 -07:00
Jim Meyering
6aaef3b8a7 tests: provide an awk-based seq replacement
...so we can continue to use seq, but use the wrapper when needed.
* tests/init.cfg (seq): Some systems lack seq.
Provide a replacement.
* tests/hash-collision-perf: Use seq once again.
* tests/long-pattern-perf: Likewise. And remove a comment about seq.
2021-08-09 17:56:31 -07:00
Paul Eggert
2192cabb7a grep: simplify EGexecute
* src/dfasearch.c (EGexecute): Remove a label and goto.
This also makes the machine code a bit shorter, on x86-64 gcc.
2021-08-09 02:30:13 -07:00
Paul Eggert
84a183ef2e grep: simplify data movement slightly
* src/grep.c (fillbuf): Simplify movement of saved data.
2021-08-09 02:30:13 -07:00
Paul Eggert
dadc9f49c0 grep: pointer-integer cast nit
* src/grep.c (ALIGN_TO): When converting pointers to unsigned
integers, convert to uintptr_t not size_t, as size_t in theory
might be too narrow.
2021-08-09 02:30:13 -07:00
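A sketch of pointer alignment done through uintptr_t, as the entry above recommends. The function align_up is hypothetical and is not grep's ALIGN_TO macro; it assumes the alignment is a power of two:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: round P up to the next multiple of ALIGNMENT,
   which is assumed to be a power of two.  The pointer is converted to
   uintptr_t, the unsigned type intended to hold a converted pointer,
   rather than to size_t, which only needs to hold object sizes.  */
static char *
align_up (char *p, size_t alignment)
{
  uintptr_t addr = (uintptr_t) p;
  uintptr_t rem = addr & (alignment - 1);
  return rem ? p + (alignment - rem) : p;
}
```

The arithmetic is done on the integer value, but the result is formed by advancing the original pointer, so no integer is converted back to a pointer.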
Paul Eggert
5c628c7074 tests: use awk, not seq
Portability problem reported by Dagobert Michelsen in:
https://lists.gnu.org/r/grep-devel/2021-08/msg00004.html
* tests/hash-collision-perf, tests/long-pattern-perf:
Don’t assume seq is installed; use awk instead.
2021-08-09 02:05:55 -07:00
Jim Meyering
20b408ac08 build: update gnulib to latest 2021-08-08 09:55:53 -07:00
Jim Meyering
c684754907 build: update gnulib to latest 2021-08-08 08:56:56 -07:00
Kevin Locke
1a0f4d4c49 doc: usage: --group-separator/--no-group-separator
* src/grep.c (usage): Document --group-separator
and --no-group-separator.
2021-08-06 22:18:13 -07:00
Kevin Locke
1a8c7e0a8e doc: man: add --group-separator/--no-group-separator
* doc/grep.in.1:
Add copy of docs for --group-separator from doc/grep.texi.
Add copy of docs for --no-group-separator from doc/grep.texi.
2021-08-06 22:15:26 -07:00
Jim Meyering
743be2c498 build: update gnulib to latest 2021-08-06 21:50:59 -07:00
Mateusz Okulus
cb15dfa4b2 doc: note that -H is a GNU extension in man page, too
* doc/grep.in.1 (-H): Mention that this is a GNU extension.
2021-06-19 15:22:18 -07:00
Paul Eggert
44965de327 build: update gnulib submodule to latest 2021-06-13 17:28:59 -07:00
Paul Eggert
a35c7f0455 build: update gnulib submodule to latest 2021-06-11 17:52:19 -07:00
Paul Eggert
e6571dfd45 doc: improve examples and wording
* doc/grep.texi (The Backslash Character and Special Expressions)
(Usage): Improve doc (Bug#48948).
2021-06-10 14:55:43 -07:00
Jim Meyering
70517057c9 doc: man: fix -L description and improve -l's
* doc/grep.texi (-L): Remove erroneous sentence about stopping early.
With -L, grep cannot stop scanning early.
(-l): Tweak existing wording.
* doc/grep.in.1: Remove the -L sentence here, too.
(-l): Copy the sentence from grep.texi, to clarify: it's only per-file
scanning that stops upon match.  Reported by Robert Bruntz
in http://debbugs.gnu.org/46179
2021-01-31 14:09:46 -08:00
Jim Meyering
74cda437ff build: avoid long-string warnings in gnulib tests
* configure.ac (GNULIB_TEST_WARN_CFLAGS): Add
-Woverlength-strings to avoid clang warnings.
2021-01-05 20:03:58 -08:00
Paul Eggert
b216515f9c doc: further clarify regexp structure
* doc/grep.texi (Fundamental Structure)
(Back-references and Subexpressions, Basic vs Extended):
Further clarifications.
2021-01-01 19:00:09 -08:00
Paul Eggert
6b454dc20d maint: copy bootstrap, tests/init.sh from Gnulib 2021-01-01 19:00:09 -08:00
Paul Eggert
bcf2659345 doc: update grep.texi cite to 2021 2021-01-01 19:00:09 -08:00
Paul Eggert
e84a2ea9d2 maint: run "make update-copyright" 2021-01-01 19:00:09 -08:00
Paul Eggert
ebbb25f6ef build: update gnulib submodule to latest 2021-01-01 19:00:07 -08:00
Jim Meyering
51452f79ce build: update gnulib to latest
* gnulib: update for clang-10 warning-avoidance
fixes in hash and regex-tests.
2020-12-30 08:23:30 -08:00
Jim Meyering
8789f71bc6 maint: add parentheses to avoid new clang-10 warning
* src/dfasearch.c (regex_compile): Parenthesize arith-OR vs
ternary, to placate clang-10.
2020-12-30 08:23:08 -08:00
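To show why a compiler may warn about mixing bitwise OR with the conditional operator, here is a small sketch using made-up flag values (FLAG_BASE and FLAG_EXTRA are hypothetical and unrelated to the regex syntax bits). In C, `a | b ? c : d` groups as `(a | b) ? c : d`, so explicit parentheses make the intended meaning unambiguous:

```c
/* Hypothetical flag values, for illustration only.  */
enum { FLAG_BASE = 4, FLAG_EXTRA = 2 };

/* Intended meaning: always include FLAG_BASE, and add FLAG_EXTRA
   only when WANT_EXTRA is nonzero.  */
static int
flags_intended (int want_extra)
{
  return FLAG_BASE | (want_extra ? FLAG_EXTRA : 0);
}

/* How C groups "FLAG_BASE | want_extra ? FLAG_EXTRA : 0" when the
   inner parentheses are omitted: the OR becomes the condition.  */
static int
flags_as_parsed (int want_extra)
{
  return (FLAG_BASE | want_extra) ? FLAG_EXTRA : 0;
}
```

The two functions disagree for every input, which is why it helps to spell out the grouping.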
Paul Eggert
5398acf971 doc: clarify special chars and }
* doc/grep.texi (Fundamental Structure)
(Character Classes and Bracket Expressions)
(The Backslash Character and Special Expressions, Anchoring)
(Basic vs Extended): Clarify which characters are special,
and why \ is needed before } in grep even though } is not special.
Use Posix terminology for ordinary and special characters and for
interval expressions.
2020-12-29 23:11:19 -08:00
Marek Suppa
181f1647f7 doc: fix missing right curly brace
* doc/grep.texi (Basic vs Extended Regular Expressions): Mention that
the right curly brace (}) meta-character must be backslash-escaped.
It had been omitted from the list.
2020-12-29 19:41:06 -08:00
Jim Meyering
5326b08980 build: update gnulib to latest 2020-12-25 18:19:08 -08:00
Jim Meyering
62f8018550 grep: use of --unix-byte-offsets (-u) now elicits a warning
* NEWS (Change in behavior): Mention this.
* src/grep.c (main): Warn about each use of obsolete
--unix-byte-offsets (-u).
* doc/grep.in.1 (-u): Remove its documentation.
2020-12-25 08:44:07 -08:00
Helge Kreutzmann
91ce9cdad3 doc: adjust man page syntax
* doc/grep.in.1: Mark some manual names with B<...>.
Mark PATTERNS with I<...>.
Drop final period in SEE ALSO.
With suggestions from several members of the manpage-l10n
translation community.  This resolves https://bugs.gnu.org/45353
2020-12-23 10:46:29 -08:00
Jim Meyering
192e59903c grep: avoid performance regression with many patterns
* src/grep.c (hash_pattern): Switch from PJW to DJB2, to avoid an
O(N) to O(N^2) performance regression due to hash collisions with
patterns from e.g., seq 500000|tr 0-9 A-J
Reported by Frank Heckenbach in https://bugs.gnu.org/44754
* NEWS (Bug fixes): Mention it.
* tests/hash-collision-perf: New file.
* tests/Makefile.am (TESTS): Add it.
2020-11-26 10:32:00 -08:00
Jim Meyering
9095818f26 build: update gnulib to latest for warning fixes
* gnulib: Update submodule to latest.
* src/grep.c (printf_errno): Reflect gnulib's renaming: change
_GL_ATTRIBUTE_FORMAT_PRINTF to
_GL_ATTRIBUTE_FORMAT_PRINTF_STANDARD
2020-11-26 09:26:33 -08:00
Jim Meyering
623008c296 tests: enable warnings for the gnulib-tests subdir
* gnulib-tests/Makefile.am (AM_CFLAGS): Enable gnulib
warning options for these tests.
* configure.ac (GNULIB_TEST_WARN_CFLAGS): Disable the same three
warning options that coreutils does, and a few more for GCC11.
2020-11-26 09:24:24 -08:00
Jim Meyering
fac92da8bc maint: post-release administrivia
* NEWS: Add header line for next release.
* .prev-version: Record previous version.
* cfg.mk (old_NEWS_hash): Auto-update.
2020-11-08 20:38:28 -08:00
Jim Meyering
54aac07899 version 3.6
* NEWS: Record release date.
2020-11-08 20:36:16 -08:00
Jim Meyering
e34dae0539 build: update gnulib to latest for test improvements 2020-11-05 09:38:42 -08:00
Jim Meyering
358b1ae9ce build: update gnulib to latest for C++-ready dfa.h and test-verify.c fix 2020-11-03 12:37:05 -08:00
Paul Eggert
790be1fb2a grep: remove GREP_OPTIONS
* NEWS: Mention this.
* doc/grep.in.1:
Remove GREP_OPTIONS documentation.
* doc/grep.texi (Environment Variables):
Move GREP_OPTIONS stuff into a “no longer implemented” paragraph.
* src/grep.c (prepend_args, prepend_default_options): Remove.
(main): Do not look at GREP_OPTIONS.
* tests/Makefile.am (TESTS_ENVIRONMENTS):
* tests/init.cfg (vars_): Remove GREP_OPTIONS.
2020-11-03 11:09:47 -08:00
Norihiro Tanaka
ffc6e407e3 grep: use RE_NO_SUB when calling regex solely to check syntax
* src/dfasearch.c (regex_compile): New parameter. All callers changed.
(GEAcompile): Move setting syntax for regex into regex_compile()
function.  This addresses a performance problem exposed by extreme
regular expressions, as described in https://bugs.gnu.org/43862 .
2020-11-01 11:32:25 -08:00
Norihiro Tanaka
3bd06de4f7 tests: add the test for bugfix in gnulib's dfa
* tests/ere.tests: Add new test.
2020-11-01 10:21:45 -08:00
Jim Meyering
e8cf93650e grep: avoid erroneous matches for e.g., a+a+a+
* gnulib: Update to latest, for dfa's invalid-merge fix.
* NEWS (Bug fixes): Mention this.
2020-11-01 10:21:45 -08:00
Jim Meyering
df33ff807d grep: -P: report input filename upon PCRE execution failure
Without this, it could be tedious to determine which input
file evokes a PCRE-execution-time failure.
* src/pcresearch.c (Pexecute): When failing, include the
error-provoking file name in the diagnostic.
* src/grep.c (input_filename): Make extern, since used above.
* src/search.h (input_filename): Declare.
* tests/filename-lineno.pl: Test for this.
($no_pcre): Factor out.
* NEWS (Bug fixes): Mention this.
2020-10-11 20:30:30 -07:00
Paul Eggert
f31abf786f grep: minor kwset cleanups
* src/kwsearch.c (Fexecute):
Assume C99 to put declarations nearer uses.
* src/kwset.c (bmexec): Omit unnecessary test.
* src/kwset.h (struct kwsmatch): Make OFFSET and SIZE individual
elements, not arrays of size 1 (a revenant of an earlier API).
All uses changed.
2020-10-11 09:53:56 -07:00
Norihiro Tanaka
0c64ce796d grep: remove unused code
* src/kwsearch.c (Fcompile, Fexecute): Remove unused code.  These are
no longer used after commit 016e590a8198009bce0e1078f6d4c7e037e2df3c.
2020-10-11 09:53:56 -07:00
Paul Eggert
b3ebcf6fa9 build: update gnulib submodule to latest 2020-10-05 11:24:02 -07:00
Jim Meyering
e8421f8263 tests: correct filename-lineno.pl
* tests/filename-lineno.pl: Remove a stray envvar
that somehow slipped into expected output string.
2020-10-05 09:09:06 -07:00
Paul Eggert
a588951ee0 tests: fix tests when PCRE is not used
* tests/Makefile.am (TESTS_ENVIRONMENT):
Set PATH before setting PCRE_WORKS, so that the latter test
uses the just-built grep.
* tests/filename-lineno.pl (invalid-re-P-paren)
(invalid-re-P-star-paren): Adjust non-PCRE case to match
recently-changed behavior.
2020-10-05 00:49:29 -07:00
Paul Eggert
a0bc7ca891 build: update gnulib submodule to latest 2020-10-05 00:49:29 -07:00
Paul Eggert
3320f8db4c doc: document --include/--exclude better
Problem reported by John Ruckstuhl (Bug#43782).
* doc/grep.texi (File and Directory Selection):
Document what happens if contradictory options are given,
or if no option matches a file name.
* doc/grep.in.1:
2020-10-03 12:42:46 -07:00
Jim Meyering
c1bd3a955f maint: add technically-required quotes
* configure.ac: Quote args of AC_CONFIG_AUX_DIR, AC_CONFIG_SRCDIR
and AC_CHECK_FUNCS_ONCE.
2020-10-01 21:06:47 -07:00
Jim Meyering
2f3ec21c25 tests: restore deleted -P tests
v3.4-almost-45-g8577dda deleted these two -P-using tests because a
grep built without PCRE support would fail those tests. This sets
an envvar with the equivalent of the result from the require_pcre_
function and restores the now-guarded tests. Tested by running this:
  ./configure --disable-perl-regexp && make check
* tests/Makefile.am (PCRE_WORKS): Set this envvar.
* tests/filename-lineno.pl: Restore invalid-re-P-paren and
invalid-re-P-star-paren, now each with a guard.
2020-09-28 11:24:52 -07:00
Jim Meyering
111b8b5927 maint: post-release administrivia
* NEWS: Add header line for next release.
* .prev-version: Record previous version.
* cfg.mk (old_NEWS_hash): Auto-update.
2020-09-27 19:50:58 -07:00
Jim Meyering
ecb16a23a5 version 3.5
* NEWS: Record release date.
2020-09-27 19:49:14 -07:00
Jim Meyering
587f7ac605 maint: avoid autoconf warnings
* configure.ac (AC_HEADER_STDC): Remove.  It's been assumed for ages.
* m4/pcre.m4 (gl_FUNC_PCRE): Use AS_HELP_STRING, not AC_HELP_STRING.
2020-09-27 19:48:29 -07:00
Jim Meyering
bdbd94fd07 build: update gnulib to latest 2020-09-27 19:38:27 -07:00
Jim Meyering
2a4cee9d0b build: update gnulib to latest 2020-09-26 18:18:40 -07:00
Jim Meyering
17af3bd4db tests: skip stack-overflow test when built with ASAN
* tests/stack-overflow: Skip this test when the binary was built
with ASAN, to avoid spurious failures.
2020-09-26 17:46:34 -07:00
Paul Eggert
76e63d9f75 build: update gnulib submodule to latest 2020-09-25 19:06:40 -07:00
Paul Eggert
47b3a07988 build: update gnulib submodule to latest 2020-09-25 16:12:52 -07:00
Jim Meyering
b2228dbc8f tests: fix surrogate-pair test to work on 16-bit wchar_t systems
* tests/surrogate-pair: Avoid new failure on systems with
16-bit wchar_t.  Detect the condition and exit before the
otherwise-failing tests.  Remove the now-incorrect in-loop
test for that alternate failure mode.  This was exposed by
testing on gcc119.fsffrance.org, a power8 AIX 7.2 system.
2020-09-24 20:39:03 -07:00
Paul Eggert
8577dda638 grep: don't assume PCRE in tests
* tests/filename-lineno.pl: Remove invalid-re-P-paren and
invalid-re-P-star-paren as they assume PCRE support, which
causes a false alarm "grep: Perl matching not supported in a
--disable-perl-regexp build" on platforms without PCRE.
2020-09-23 19:56:24 -07:00
Paul Eggert
adc03b4563 grep: pacify Sun C 5.15
This suppresses a false alarm '"grep.c", line 720: warning:
initializer will be sign-extended: -1'.
* src/grep.c (uword_max): New static constant.
(initialize_unibyte_mask): Use it.
2020-09-23 19:56:24 -07:00
Paul Eggert
016e590a81 grep: fix more Turkish-eyes bugs
Fix more bugs recently uncovered by Norihiro Tanaka (Bug#43577).
* NEWS: Mention new bug report.
* src/grep.c (ok_fold): New static var.
(setup_ok_fold): New function.
(fgrep_icase_charlen): Reject single-byte characters
if they match some multibyte characters when ignoring case.
This part of the patch is partly derived from
<https://bugs.gnu.org/43577#14>, which means it is:
Co-authored-by: Norihiro Tanaka <noritnk@kcn.ne.jp>
(main): Call setup_ok_fold if ok_fold might be needed.
* src/searchutils.c (kwsinit): With the grep.c changes,
this code can now revert to classic 7th Edition Unix style;
aborting would be wrong.
* tests/turkish-eyes: Add tests for these bugs.
2020-09-23 19:56:24 -07:00
Paul Eggert
c6b0b7df3a build: update gnulib submodule to latest
* NEWS: Mention Bug#43577, which this fixes.
2020-09-23 19:56:24 -07:00
Paul Eggert
87c804fade grep: fix recently-introduced performance glitch
* src/grep.c (main): Do not double-increment update_patterns.
update_patterns increments n_patterns now; do not increment it
again, as the incorrect count would hurt performance heuristics later.
2020-09-23 19:56:24 -07:00
Paul Eggert
b0748fc4af doc: improve --line-buffered doc
* doc/grep.texi (Other Options): Document --line-buffered more
carefully, and say what happens when it is not used.  Problem
reported by Dan Jacobson (Bug#35339).
2020-09-22 10:58:55 -07:00
Paul Eggert
c4d7c4210b tests: port timeout test to Alpine
Problem reported by Bruno Haible in:
https://lists.gnu.org/r/grep-devel/2020-09/msg00080.html
* tests/init.cfg (require_timeout_): Check that ‘timeout 0.01
sleep 0.02’ works as expected, to avoid spurious test failure
on Alpine.
2020-09-22 10:14:47 -07:00
Jim Meyering
4a9ad19352 tests: test for many-regexp N^2 RSS regression
* tests/many-regex-performance: New test for this performance
regression.
* tests/Makefile.am: Add it.
* NEWS (Bug fixes): Describe it.
2020-09-22 08:46:26 -07:00
Norihiro Tanaka
ae65513edc grep: avoid unnecessary regex compilation
Grep resorts to using the regex engine when the precision of either
-o or --color is required, or when the pattern is not supported by
our DFA engine (e.g., backref). Otherwise, grep would perform regex
compilation solely to check the syntax. This change makes grep skip
that compilation in the common case for which it is unnecessary.

The compilation we are avoiding is quite costly, consuming O(N^2)
RSS for N regular expressions.

* src/dfasearch.c (GEAcompile): Add new argument, and avoid unneeded
compilation of regex.
* src/grep.c (compile_fp_t): Update prototype.
(main): Update caller.
* src/kwsearch.c (Fcompile): Update caller and add new argument.
* src/pcresearch.c (Pcompile): Add new argument.
* src/search.h (GEAcompile, Fcompile, Pcompile): Update prototype.
2020-09-22 08:46:26 -07:00
Jim Meyering
34ada37baa build: update gnulib to latest 2020-09-22 08:46:26 -07:00
Jim Meyering
0bc552476f tests: skip stack-overflow test on midnightbsd*
* tests/stack-overflow: skip_ when run on this OS. See details
in https://lists.gnu.org/r/grep-devel/2020-09/msg00062.html
* tests/Makefile.am (host_triplet): Export.
2020-09-22 08:46:26 -07:00
Paul Eggert
1444b4979d doc: say how to match chars by code
From a suggestion in Bug#41004.
* doc/grep.texi (Character Encoding, Matching Non-ASCII):
New sections.  Move some material from Environment Variables
into these sections.
2020-09-21 20:22:30 -07:00
Paul Eggert
b3c01ff20d * src/dfasearch.c (struct dfa_comp): Fix out-of-date comment. 2020-09-18 14:51:53 -07:00
Paul Eggert
220bd3882c grep: "grep '\)'" reports an error again
* src/grep.c (try_fgrep_pattern): With -G, pass \) through to
GEAcompile so that it can complain.  This fixes an unexpected
change in behavior from grep 3.4 and earlier.
* tests/filename-lineno.pl: Add tests for this sort of thing.
2020-09-18 14:51:53 -07:00
Paul Eggert
b6d95a4e6b grep: tweak by using mempcpy
* src/grep.c (try_fgrep_pattern): Tweak previous change
by using mempcpy.
2020-09-18 13:41:27 -07:00
Jim Meyering
203ad5b718 grep: make echo .|grep '\.' match once again
The same applied for many other backslash-escaped bytes, not just
metacharacters.  The switch to rawmemchr in v3.4-almost-10-g9393b97
made some parts of the code require the usually-guaranteed newline
sentinel at the end of each pattern. Before, some consumers used a
(correct) pattern length and did not care that try_fgrep_pattern could
transform a pattern (with sentinel) like "\\.\n" to "..\n", thus
violating that assumption.
* src/grep.c (try_fgrep_pattern): Preserve the invariant
that each regexp is newline-terminated.
* tests/backslash-dot: New file. Test for this.
* tests/Makefile.am (TESTS): Add it.
2020-09-18 12:45:17 -07:00
Jim Meyering
97cc60acf8 tests: triple-backref: print a reference to glibc bug
* tests/triple-backref (MALLOC_CHECK_): And tell glibc not to
bother with a core dump.  Suggested by Pádraig Brady.
2020-09-18 07:31:45 -07:00
Paul Eggert
b9f3943910 grep: be more consistent about diagnostic format
* NEWS: Mention this.
* bootstrap.conf (gnulib_modules): Remove 'quote'.
* src/grep.c: Do not include quote.h.
(grep, grepdirent, grepdesc): Put the three unusual diagnostics
into the same "grep: FOO: message" form that grep uses elsewhere.
* tests/binary-file-matches, tests/in-eq-out-infloop:
Adjust tests to match new diagnostic format.
2020-09-18 06:57:03 -07:00
Jim Meyering
0e862a91cc build: update gnulib to latest 2020-09-17 21:52:19 -07:00
Paul Eggert
e4e89bfc59 * tests/triple-backref: Add comment. 2020-09-17 20:26:41 -07:00
Jim Meyering
cf5ac78740 tests: make new test executable, to placate distcheck
* tests/binary-file-matches: Make this executable.
2020-09-17 16:11:58 -07:00
Jim Meyering
2c79623eb5 tests: add coverage for code that emits the new diagnostic
* tests/binary-file-matches: New file.
* tests/Makefile.am (TESTS): Add it.
2020-09-17 13:16:42 -07:00
Jim Meyering
4b3f435b62 maint: avoid syntax-check failure
* src/grep.c (grep): Lower-case the "B" in "Binary file... matches"
diagnostic that we now emit to stderr. This avoids the following
when running "make syntax-check":
  maint.mk: found capitalized error message
  make: *** [maint.mk:469: sc_error_message_uppercase] Error 1
2020-09-17 13:08:34 -07:00
Paul Eggert
271793f09c Send "Binary file FOO matches" to stderr
* NEWS, doc/grep.texi: Mention this change (Bug#29668).
* src/grep.c (grep): Send "Binary file FOO matches" to stderr
instead of stdout.
* tests/encoding-error, tests/invalid-multibyte-infloop:
* tests/null-byte, tests/pcre-count, tests/surrogate-pair:
* tests/symlink, tests/unibyte-binary:
Adjust tests to match new behavior.  In all cases this
simplifies the tests, which is a good sign.
2020-09-17 12:15:22 -07:00
Paul Eggert
c324508333 Suppress "Binary file FOO matches" if -I
Problem reported by Jason Franklin (Bug#33552).
* NEWS: Mention this.
* src/grep.c (grep): Do not output "Binary file FOO matches" if -I.
* tests/encoding-error: Add test for this bug.
2020-09-17 12:15:22 -07:00
Jim Meyering
ffb095b135 maint: keep two blank lines before each old Noteworthy line.
* NEWS: Insert a blank line.
2020-09-15 21:38:32 -07:00
Paul Eggert
ff30b75007 build: update gnulib submodule to latest 2020-09-15 16:52:32 -07:00
Paul Eggert
87b369a849 build: update gnulib submodule to latest 2020-09-13 18:50:22 -07:00
Paul Eggert
87b99ac431 build: update gnulib submodule to latest 2020-09-12 20:45:22 -07:00
Jim Meyering
2d8a39b90e build: update gnulib to latest 2020-09-11 01:08:01 -07:00
Paul Eggert
669a52ea60 grep: fix logic for growing PCRE JIT stack
* src/pcresearch.c (jit_exec) [PCRE_EXTRA_MATCH_LIMIT_RECURSION]:
When growing the match_limit_recursion limit, do not use the old
value if ! (flags & PCRE_EXTRA_MATCH_LIMIT_RECURSION), as it is
uninitialized in that case.
2020-09-09 16:00:59 -07:00
Paul Eggert
68dddcfd25 grep: fix PCRE JIT test when JIT not available
Problem reported by Thomas Deutschmann (Bug#29446#23).
* src/pcresearch.c (Pexecute): Diagnose PCRE_ERROR_RECURSIONLIMIT.
* tests/pcre-jitstack: Treat recursion limit overflow like stack
overflow.
2020-09-09 16:00:59 -07:00
Paul Eggert
de6f36d9b6 grep: fix -w bug in UTF-8 locales
Problem reported by Mayo Fark (Bug#43225).
* src/searchutils.c (wordchar_prev): In a UTF-8 locale, do not
assume that an encoding-error byte cannot be part of a word
constituent, as this assumption is incorrect for the last byte
of a multibyte word constituent.
* tests/word-delim-multibyte: Add a test for the bug.
2020-09-09 12:44:37 -07:00
Paul Eggert
1021a92aa9 Distribute a gzip tarball again
Requested by Issam E. Maghni in:
https://lists.gnu.org/r/grep-devel/2020-09/msg00000.html
* configure.ac (AM_INIT_AUTOMAKE): Remove no-dist-gzip.
2020-09-09 11:30:22 -07:00
Paul Eggert
69be8bc553 * README-prereq: Also mention xz. 2020-09-09 11:30:22 -07:00
Paul Eggert
9393b97701 Prefer rawmemchr to memchr when it’s easy
* bootstrap.conf (gnulib_modules): Add rawmemchr.
* src/dfasearch.c (GEAcompile, EGexecute):
* src/grep.c (update_patterns, prpending, prtext):
* src/kwsearch.c (Fcompile, Fexecute):
* src/pcresearch.c (Pcompile, Pexecute):
Simplify (and presumably speed up a little) by using rawmemchr
with a sentinel, instead of using memchr.
2020-09-07 19:49:33 -07:00
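The sentinel idea above can be sketched in portable C. Since rawmemchr is a GNU extension, this example uses a hand-written stand-in named scan_to_sentinel; both function names here are hypothetical and the code is not taken from grep. The scan has no length check, so it is only safe when the caller guarantees that the byte being sought is present:

```c
#include <stddef.h>

/* Portable stand-in for GNU rawmemchr: scan for C with no length
   limit.  The caller must guarantee that C occurs, for example by
   keeping a newline sentinel at the end of the buffer.  */
static char const *
scan_to_sentinel (char const *s, char c)
{
  while (*s != c)
    s++;
  return s;
}

/* Count the newline-terminated lines in BUF.  If LEN is nonzero,
   BUF[LEN - 1] must be the '\n' sentinel.  */
static size_t
count_lines (char const *buf, size_t len)
{
  size_t n = 0;
  char const *p = buf;
  while (p < buf + len)
    {
      p = scan_to_sentinel (p, '\n') + 1;
      n++;
    }
  return n;
}
```

Dropping the per-byte bounds check is what makes the sentinel scan simpler than a memchr call, at the cost of requiring the newline-terminated invariant everywhere the buffer is built.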
Paul Eggert
0ede35a6cd Simplify pattern_file_name
* src/grep.c (pattern_file_name): Make first argument
origin-0, not origin-1, as this simplifies both caller and
callee.  All uses changed.
2020-09-07 19:49:33 -07:00
Paul Eggert
71b5c685d0 Simplify regex_compile
* src/dfasearch.c (regex_compile): "" suffices; we don’t need "\0".
No need to initialize pat_lineno.
2020-09-07 19:49:33 -07:00
Paul Eggert
33e4602c96 Omit duplicate regexps
Do not pass two copies of the same regexp to the
regular-expression engine.  Although the engines should
perform nearly as well even with the copies, in practice they do not.
Problem reported by Luca Borzacchiello (Bug#43040).
* bootstrap.conf (gnulib_modules): Add hash.
* src/grep.c: Include stdint.h, for SIZE_WIDTH.
Include hash.h.
(struct patloc, patloc, patlocs_allocated, patlocs_used):
Rename from struct FL_pair, fl_pair, n_fl_pair_slots, n_pattern_files,
respectively, since the data type is no longer a pair.
All uses changed.
(struct patloc): New member FILELINE.  The lineno member is now
ptrdiff_t since nowadays we prefer signed types.
(pattern_array, patterns_table): New static vars.
(count_nl_bytes, fl_add): Remove; no longer used.
(hash_pattern, compare_patterns, update_patterns): New functions.
update_patterns does what fl_add used to do, plus remove dups.
(pattern_file_name): Adjust to change from fl_pair to patloc.
(main): Move some variables to inner blocks for clarity.
Maintain the patterns_table hash of all patterns.
Update pattern_array to match keys, and use update_patterns
instead of fl_add to remove duplicate keys.
* tests/filename-lineno.pl (invalid-re-2-files)
(invalid-re-2-files2, invalid-re-2e): Ensure regexps are unique in
tests so that dups aren’t removed in diagnostics.
(invalid-re-line-numbers): New test.
2020-09-07 19:49:33 -07:00
Jim Meyering
7ded8efd72 build: update gnulib to latest
* gnulib: Update submodule to latest.
* bootstrap.conf (gnulib_modules): Add explicit dependency on dirname-lgpl.
Before, we pulled this in via a dependency.
* bootstrap: Update from gnulib.
2020-08-23 03:42:04 -07:00
Jim Meyering
7798d03901 build: require autoconf-2.64
* configure.ac: Require autoconf-2.64, up from 2.63, to align with gnulib.
2020-08-23 03:42:04 -07:00
Paul Eggert
0435ebca64 Revert -L exit status change introduced in grep 3.2
Problems reported by Antonio Diaz Diaz in:
https://bugs.gnu.org/28105#29
* NEWS, doc/grep.texi (Exit Status), src/grep.c (usage):
Adjust documentation accordingly.
* src/grep.c (grepdesc, main): Go back to old behavior.
* tests/skip-read: Adjust tests accordingly.
2020-08-22 14:07:48 -07:00
Paul Eggert
afc59ea200 tests: fix permission issue in previous change 2020-01-20 10:23:18 -08:00
Paul Eggert
759745166c tests: work around GCC -fprofile-generate bug
* tests/triple-backref: Add a 10 s timeout to work around
what appears to be a GCC bug with -fprofile-generate.
Problem reported by Martin Liška, with diagnosis by
Andreas Schwab (Bug#21513).
2020-01-20 09:33:17 -08:00
Jim Meyering
23bf27015e maint: post-release administrivia
* NEWS: Add header line for next release.
* .prev-version: Record previous version.
* cfg.mk (old_NEWS_hash): Auto-update.
2020-01-02 14:12:56 -08:00
Jim Meyering
d1aaacee20 version 3.4
* NEWS: Record release date.
2020-01-02 14:12:56 -08:00
Jim Meyering
a9d19cf2fe build: update gnulib to latest, for mbrtowc-vs-Irix build fix 2020-01-02 14:12:56 -08:00
Paul Eggert
7bec6b13ce doc: mention glibc bug 24269
* doc/grep.texi (Known Bugs): Mention glibc bug 24269.
Merge formatting/URL changes from Gnulib regex.texi.
2020-01-02 14:11:32 -08:00
Paul Eggert
767c83fd30 doc: fix --exclude description in man page
Problem reported by Duncan Moore (Bug#37212).
* src/grep.c (usage): Fix incorrect statement about --exclude
and directories.  Standardize on “that match GLOB” instead
of “matching GLOB”.
2020-01-02 01:55:22 -08:00
Paul Eggert
83a50fa466 doc: fix missing “more” in man page
Problem reported by Philippe Schnoebelen (Bug#34078).
* doc/grep.in.1: Add missing “more”.
2020-01-02 01:10:42 -08:00
Paul Eggert
3ab01fbbb0 doc: add [:blank:] to man page
* doc/grep.in.1: Mention [:blank:] (Bug#33291).
2020-01-01 19:58:33 -08:00
Jim Meyering
5909bc45ad maint: update all copyright year number ranges
Run "make update-copyright" and then...
* gnulib: Update to latest with copyright year adjusted.
* tests/init.sh: Sync with gnulib to pick up copyright year.
* bootstrap: Likewise.
* doc/grep.in.1: Use "-" in copyright year ranges, not \en.
2020-01-01 09:12:07 -08:00
Jim Meyering
11d7dd39c9 tests: avoid unwarranted failure in a netbsd 8.1 VM
* tests/mb-non-UTF8-perf-Fw: Run twice, to avoid first-read penalty.
Reported by Nelson H.F. Beebe.
2019-12-31 09:40:37 -08:00
Jim Meyering
8fdc90cc3f build: update gnulib to latest (for localeinfo perf fix) 2019-12-30 23:34:37 -08:00
Jim Meyering
f5bd9963a2 maint: add syntax-check rule to prohibit "backreference" spelling
* cfg.mk (sc_prohibit_backref): New rule.
2019-12-30 14:31:04 -08:00
Paul Eggert
e5a216be4e maint: remove too-long line from AUTHORS
* AUTHORS: Remove URL that’s too long.
2019-12-30 11:35:11 -08:00
Paul Eggert
a90a9983d6 maint: update AUTHORS
* AUTHORS: Update to better reflect current authorship.
2019-12-30 11:32:54 -08:00
Jim Meyering
bd81a1a2cb avoid new syntax-check failures
* cfg.mk (old_NEWS_hash): Updating old news, we must also update this.
2019-12-30 10:54:08 -08:00
Paul Eggert
ab73e1c642 doc: don’t encourage back-references
* doc/grep.texi (Usage): Remove palindrome question.  Bondioni’s
RE makes grep issue a ‘grep: stack overflow’ diagnostic, and we
shouldn’t be encouraging fancy back-references anyway, due to all
the bugs in this area (Bug#26864).  Plus, the allusion to
“GNU extensions” doesn't seem to be correct here.
2019-12-30 10:48:48 -08:00
Paul Eggert
fdee61d847 doc: robustify some examples
Prompted by suggestions by Stephane Chazelas (Bug#38792#20).
* doc/grep.texi (Usage): Make examples more robust.
2019-12-30 01:57:16 -08:00
Paul Eggert
71635837d1 doc: fix bug# typo 2019-12-30 00:58:20 -08:00
Paul Eggert
bc5ac38040 doc: spell "back-reference" more consistently 2019-12-30 00:52:10 -08:00
Paul Eggert
eac1e4d50a doc: mention back-reference bugs
Inspired by Bug#26864.
* doc/grep.texi (Known Bugs): New section.
Mention back-reference issues.
2019-12-30 00:48:50 -08:00
Paul Eggert
a517df5bd5 doc: Add -- to more-complex example
Suggested by Stephane Chazelas (Bug#38792).
* doc/grep.in.1, doc/grep.texi: Add ‘--’ to recently-added example.
2019-12-29 18:58:24 -08:00
Paul Eggert
ba4202cce9 doc: improve subsection title (Bug#26132)
* doc/grep.in.1: Rename "Matcher Selection" to "Pattern Syntax".
2019-12-29 10:54:47 -08:00
Paul Eggert
14b37769f2 doc: fix typo in previous patch 2019-12-29 10:45:06 -08:00
Paul Eggert
fe630c9fef doc: document quoting better
Problem reported by Martin Simons (Bug#38792).
* doc/grep.texi: Fix quoting used in examples.  Say that patterns
should be quoted, use quoting more consistently in examples, and
give an example illustrating the difference between patterns and
globbing.  Don’t assume zgrep expertise in example.
* doc/grep.in.1: Likewise.  Also, reorder sections
to match GNU/Linux man-pages style.
2019-12-29 10:36:56 -08:00
Jim Meyering
972fc260b9 maint: tweak NEWS wording
* NEWS: Minor wording change.
2019-12-26 19:34:07 -08:00
Jim Meyering
3f11cfb642 build: update gnulib to latest; and sync tests/init.sh
* gnulib: update
* tests/init.sh: Sync from gnulib (this removes the LC_ALL=C setting).
2019-12-26 19:09:32 -08:00
Jim Meyering
8b4600ae3b tests: avoid spurious failure due to 1-second timeout
* tests/grep-dev-null-out: Use a 10-second timeout, rather than
a 1-second one.  This avoids false failure on slow systems.
Reported by Assaf Gordon in
https://lists.gnu.org/r/grep-devel/2019-12/msg00018.html
2019-12-26 18:58:44 -08:00
Paul Eggert
d25ca0bed7 build: update gnulib submodule to latest 2019-12-26 01:07:07 -08:00
Paul Eggert
4985a28875 maint: adjust surrogate-pair for 16-bit wchar_t
* tests/surrogate-pair: Adjust to match fixed behavior
on AIX 7.2, where wchar_t is 16 bits and cannot represent
the test case data.
2019-12-26 01:06:33 -08:00
Jim Meyering
d9795ffa68 tests: fix typo in name of test file
* tests/backslash-s-vs-invalid-multitype: Rename to...
* tests/backslash-s-vs-invalid-multibyte: ...this.
* tests/Makefile.am (TESTS): Reflect renaming.
2019-12-25 15:21:16 -08:00
Jim Meyering
e5c59d08ad tests: ensure we use require_timeout_ when needed
* cfg.mk (sc_timeout_prereq): New syntax-check rule.
2019-12-25 15:21:16 -08:00
Jim Meyering
bba28f72f7 tests: require timeout
* tests/mb-non-UTF8-perf-Fw: This test uses "timeout",
so must first call require_timeout_.
This avoids spurious test failure when running with
no timeout program. Reported by Bruno Haible in
https://lists.gnu.org/r/grep-devel/2019-12/msg00008.html
2019-12-25 15:21:16 -08:00
Paul Eggert
bf5a308349 tests: work around AIX 7.2 sh printf bug
AIX 7.2 /bin/sh’s printf command mishandles octal escapes
in multibyte locales: it treats them as characters, not bytes.
* tests/backslash-s-vs-invalid-multitype, tests/encoding-error:
Use the C locale when employing the printf command with an octal
escape that AIX 7.2 sh might mishandle.
* tests/init.sh (setup_): Use the C locale for tests.
This has the side benefit of making them more reproducible.
2019-12-25 14:40:55 -08:00
Jim Meyering
6b4ee2c0b1 maint: adjust new comments
* src/dfasearch.c (possible_backrefs_in_pattern): Remove a
duplicate "a", insert a "be" and a comma, and reformat.
2019-12-22 17:53:02 -08:00
Jim Meyering
5b1d1eac2c build: update gnulib to latest
* gnulib: Update submodule to latest.
* bootstrap: Copy from gnulib.
* tests/init.sh: Likewise.
2019-12-22 17:39:42 -08:00
Paul Eggert
c2ec762dbc grep: fix some bugs in pattern-grouping speedup
This fixes some bugs in the previous commit,
and should finish the fix for Bug#33249.
* NEWS: Mention fix for Bug#33249.
* src/dfasearch.c (possible_backrefs_in_pattern, regex_compile)
(GEAcompile): In new code, prefer ptrdiff_t to size_t when either
will do, since ptrdiff_t has better error checking.  At some point
we should adjust the old code too.
(possible_backrefs_in_pattern): Rename from
find_backref_in_pattern.  New arg BS_SAFE.  All uses changed.
Fix false negative if a multibyte character ends in a single
'\\' byte, followed by the two bytes '\\', '1'.
(regex_compile): Simplify.
(GEAcompile): Avoid quadratic behavior when reallocating growing
buffers.  Fix a couple of bugs in copying pattern data involving
backreferences.  Fix another bug in copying pattern metadata
involving backreferences, by removing the need to copy it.
2019-12-22 16:40:08 -08:00
Norihiro Tanaka
abb7f4f232 grep: grouping of a pattern with multiple lines
When grep uses regex, it splits a multi-line pattern at each newline
into fragments.  Compilation and execution run separately for each
fragment, which causes a slowdown.  With this change, fragments are
divided into groups according to whether they include back references.
A fragment with back references constitutes a group, and all fragments
that lack back references also constitute a group.

This change greatly speeds up the following case.

  $ seq -f '%040g' 0 9999 | sed '1s/$/\\(0\\)\\1/' >pat
  $ yes 00000000000000000000000000000000000000000x | head -10000 >in
  $ time -p env LC_ALL=C src/grep -f pat in

* src/dfasearch.c (find_backref_in_pattern, regex_compile):
New functions.
(GEAcompile): Use the new functions to group fragments
as mentioned above.
2019-12-22 16:40:08 -08:00
Paul Eggert
cf09252295 maint: add NEWS for Bug#34951 fix
* NEWS: Mention Bug#34951.
2019-12-19 19:37:16 -08:00
Norihiro Tanaka
c1c1774fa4 dfa: separate parse and compile phase
DFAMUST() must be called after parsing and before the token reordering
that was introduced in commit 5c7a0371823876cca7a1347fa09ca26bbbff0c98,
but both steps are executed in the compilation phase.

* lib/dfa.c (dfaparse): Change it to global function.
(dfacomp): If first argument is NULL, skip parse.
* lib/dfa.h (dfaparse): Add a prototype.
2019-12-19 19:37:16 -08:00
Paul Eggert
2adf15c362 build: update gnulib submodule to latest 2019-12-19 19:37:16 -08:00
Norihiro Tanaka
be9224d971 grep: speed up multiple word matching
grep uses its KWset matcher for multiple word matching, but that is
very slow when most of the parts matched to a pattern are not words.
So, if the first match to a pattern is not a word, use the grep matcher
to match for its line.

Note that when START_PTR is set, the grep matcher uses the regex matcher,
which is very slow to match words.  Therefore, we use the grep matcher
only when START_PTR is NULL.

* src/kwsearch.c (Fexecute): If an initial match is incomplete because
not on a word boundary, use the grep matcher to find a matching line.
2019-12-19 07:20:02 -08:00
Jim Meyering
0cd76e7aee maint: sort test names
* tests/Makefile.am (TESTS): Alphabetize the new addition,
mb-non-UTF8-perf-Fw to placate syntax-check's sc_sorted_tests.
2019-12-18 18:46:05 -08:00
Paul Eggert
8d8dc4c62e maint: adjust to recent Gnulib change
* po/POTFILES.in: Remove lib/xstrtol-error.c.
2019-12-18 00:08:21 -08:00
Paul Eggert
c9a6e4bf91 grep: do not match invalid UTF-8
Update Gnulib to latest.  Also:
* src/dfasearch.c (EGexecute): Use ptrdiff_t, not size_t,
to match new Gnulib API.
* tests/Makefile.am (TESTS): Add dfa-invalid-utf8.
* tests/dfa-invalid-utf8: New file.
2019-12-17 22:03:37 -08:00
Jim Meyering
fdd45db167 tests: add test that would have detected -Fw perf regression
* tests/mb-non-UTF8-perf-Fw: New file. Detect v3.3-22-g090a4db's
performance regression.
* tests/Makefile.am (TESTS): Add it.
2019-12-01 06:14:42 +08:00
Jim Meyering
8b994007af maint: fix test comment
* tests/mb-non-UTF8-word-boundary: Also correct "introduced-in"
version number in a comment here.
2019-11-30 13:58:09 +08:00
Jim Meyering
dccdd4c48e maint: correct NEWS blurb
* NEWS (Bug fixes): Correction: the -Fw bug was introduced
in 2.28, not in 3.0. Reported by Paul Eggert.
2019-11-26 05:40:34 +08:00
Norihiro Tanaka
449f1c5805 grep: improve grep -Fw performance in non-UTF8 multibyte locales
* src/searchutils.c (mb_goback): New parameter.  All callers changed.
* src/search.h (mb_goback): Update prototype.
* src/kwsearch.c (Fexecute): Use mb_goback's MBCLEN to detect a
word-boundary even more efficiently.
2019-11-17 07:15:35 -08:00
Norihiro Tanaka
cea97a8490 grep: fix performance regression with previous patch
* src/kwsearch.c (Fexecute): Avoid unnecessary back-up in non-UTF8
multibyte locales.
2019-11-17 07:15:21 -08:00
Jim Meyering
0172bf6825 maint: rename a variable: bol -> nl
* src/kwsearch.c (Fexecute): Change misleading name: s/bol/nl/
2019-11-16 10:52:33 -08:00
Jim Meyering
4cae7410f4 build: update gnulib to latest 2019-11-16 10:37:41 -08:00
Jim Meyering
0effe42bf4 maint: correct and clarify a comment
* src/kwsearch.c (Fexecute): Logic was reversed.
2019-11-16 10:37:25 -08:00
Jim Meyering
090a4dbe03 grep: avoid false -Fw match in non-UTF8 multibyte locales
For example, this command would erroneously print its input line:
  echo ab | LC_CTYPE=ja_JP.eucjp grep -Fw b
This arose when the "memrchr" search for a preceding newline failed:
in that case, MB_START was not adjusted and was initially the same
as BEG, so wordchar_prev mistakenly returned 0.
* src/kwsearch.c (Fexecute): Set MB_START also when there is no
preceding newline.
* NEWS (Bug fixes): Mention it.
* tests/mb-non-UTF8-word-boundary: New file. Test for the bug.
* tests/Makefile.am (TESTS): Add it.
Reported by NIDE, Naoyuki in https://bugs.gnu.org/38223.
2019-11-16 10:37:25 -08:00
Jim Meyering
a37a439a7f build: update gnulib to latest
* po/POTFILES.in: Add lib/argmatch.h.
2019-11-08 17:54:24 -08:00
Paul Eggert
6193ba1cb5 grep: new --no-ignore-case option
Suggested by Karl Berry and mostly implemented by Arnold Robbins
(Bug#37907).
* NEWS:
* doc/grep.in.1:
* doc/grep.texi (Matching Control):
* src/grep.c (usage):
Document the new option.
* src/grep.c (NO_IGNORE_CASE_OPTION): New constant.
(long_options, main): Support new option.
2019-11-05 15:33:21 -08:00
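
A minimal sketch of the new option's effect, assuming a grep of version 3.4 or later is on PATH. The two-line input is invented for illustration. A later --no-ignore-case overrides an earlier -i, so only the exact-case line is selected:

```shell
# Hypothetical input: -i alone would select both lines.
out=$(printf 'Foo\nfoo\n' | grep -i --no-ignore-case foo)
printf '%s\n' "$out"
```

This is mainly useful when -i comes from an alias or wrapper script and a single invocation needs case-sensitive matching.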
Paul Eggert
50312b72f5 grep: simplify previous patch
* src/grep.c (main): Use an int rather than an enum for a local
var; an enum is overkill here.
2019-11-05 15:02:23 -08:00
Paul Eggert
bacb70670e grep: further simplify out_file handling
* src/grep.c (print_filenames): Make this a local variable instead
of static.  Rename it to filename_option, to avoid confusion with
the print_filename function, and rename the enum values for the
same reason.  All uses changed.
(out_file): Now -1, 0, 1 to represent unknown, false, true.
All uses changed.
(single_command_line_arg): Remove.  This static variable’s
function is now accomplished by a local variable ‘num_operands’.
(grepdesc): Simplify adjustment of out_file accordingly.
(main): Initialize out_file to -1 if not known yet.
2019-11-05 12:43:26 -08:00
Zev Weiss
caf60ca732 grep: simplify out_file handling
* src/grep.c (print_filenames): New tristate enum (-H, -h, or
neither); supplants with_filenames and no_filenames.
(single_command_line_arg): New variable indicating if grep was run
with a single command-line argument.
(no_filenames): Remove variable.
(grepdirent): Don't twiddle out_file back and forth during recursion.
(grepdesc): Turn off out_file on 'grep -r foo nondirectory'.
(main): Replace with_filenames and no_filenames with print_filenames.
Enable out_file when both -r/-R and multiple arguments are given.
2019-11-05 12:43:26 -08:00
Paul Eggert
268cae478d grep: fix ‘grep -L ... >/dev/null’ bug
Problem reported by Adam Sampson (Bug#37716).
* NEWS: Mention this.
* src/grep.c (grepdesc): Don’t assume that stdout being /dev/null
means list_files == LISTFILES_NONE.
(main): Do not change list_files merely because stdout is /dev/null.
* tests/skip-read: Test for this bug.
2019-10-12 18:31:56 -07:00
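
One way to check the behavior this commit describes, as a sketch only: run the same 'grep -L' command with standard output sent to /dev/null and to a regular file, then compare the exit statuses. The scratch file and variable names are hypothetical. The status value itself differs between releases (see the NEWS entries for 3.4 and 3.5), so the sketch only compares the two runs:

```shell
# Hypothetical scratch file containing one line that matches 'x'.
f=$(mktemp)
printf 'x\n' > "$f"
# Same command, two different destinations for standard output.
grep -L x "$f" > /dev/null; st_null=$?
grep -L x "$f" > "$f.out"; st_file=$?
echo "status with /dev/null: $st_null, with a regular file: $st_file"
rm -f "$f" "$f.out"
```

With the fix applied, the two statuses should agree.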
Paul Eggert
cbf55dfc4c grep: tighten -i doc
* doc/grep.in.1:
* doc/grep.texi (Matching Control):
* src/grep.c (usage):
Make it clearer that -i affects patterns and data, but not
file names (Bug#37604).
2019-10-03 15:23:54 -07:00
Paul Eggert
7068550717 maint: fix “/src/grep: No such file or directory”
Problem reported by Jim Meyering in:
https://lists.gnu.org/r/grep-devel/2019-02/msg00000.html
* NEWS: Mention the change.
* configure.ac (fn_grep): Remove.  This old attempt to fix
<https://savannah.gnu.org/bugs/?31646> wasn’t working anyway,
since subprograms didn’t grok fn_grep.  People building on Solaris
will need a working grep, which is reasonably standard nowadays.
(GREP, EGREP): Do not override.  This way, we test the
newly-built grep only when running ‘make test’ and suchlike.
Instead, output a hopefully-helpful diagnostic if the
system 'grep' does not work.
2019-03-10 11:51:19 -07:00
Jim Meyering
3b232c7e2c tests: avoid false positive upon stack overflow
* tests/pcre-jitstack: Don't let a stack overflow evoke a false
failure.  This test is to ensure there is no internal PCRE error.
Reported by Andreas Schwab in http://bugs.gnu.org/34370
2019-02-18 10:51:52 -08:00
Jim Meyering
6861bd8698 build: avoid build failure with --enable-gcc-warnings
* src/kwset.c (bmexec_trans): Define with _GL_ATTRIBUTE_PURE,
per suggestion from recent gcc snapshot.
2019-02-16 12:51:41 -08:00
Paul Eggert
ab8c4b4c39 doc: clarify --exclude globbing
Problem reported by Paul Jackson.
* doc/grep.in.1:
* doc/grep.texi (File and Directory Selection):
Clarify how --exclude globbing works.
2019-02-03 22:42:58 -08:00
Paul Eggert
65779292d2 grep: parse --color arg independent of locale
This is a better fix for Bug#34285.
* bootstrap.conf (gnulib_modules): Add c-strcase.
* src/grep.c: Include c-strcase.h, not strings.h.
(main): Use c_strcasecmp, not strcasecmp.
2019-02-03 11:47:07 -08:00
Paul Eggert
1e877596ad grep: fix grep.c includes
* src/grep.c: Include strings.h; problem reported by David
Monniaux (Bug#34285).  Do not include fcntl.h, as system.h does
that for us.
2019-02-02 22:47:23 -08:00
Paul Eggert
0febe138d9 build: update gnulib submodule to latest 2019-02-02 22:47:23 -08:00
Jim Meyering
8df79c9fa7 build: ensure no VLA is used
Cause developer builds to fail for any use of a VLA.
VLAs (variable length arrays) limit portability.
* configure.ac (nw): Remove -Wvla from the list of disabled warnings,
thus enabling the warning when configured with --enable-gcc-warnings.
(GNULIB_NO_VLA): Define, disabling use of VLAs in gnulib.  This commit
is functionally equivalent to coreutils' v8.30-44-gd26dece5d.
2019-01-20 22:19:54 -08:00
Jim Meyering
06cec163de build: update gnulib to latest 2019-01-20 21:08:24 -08:00
Paul Eggert
479c498daf doc: --binary-files update in man page
* doc/grep.in.1: Adjust --binary-files description to match that
in doc/grep.texi.  When I updated the documentation in
2016-09-09T01:33:14!eggert@cs.ucla.edu I forgot to update the man
page accordingly (Bug#33898).
2019-01-20 19:40:46 -08:00
Paul Eggert
f777b3e613 grep: simplify pcresearch.c ifdefs
This fixes a warning if PCRE is not used (Bug#34054).
* configure.ac (USE_PCRE): New conditional.
* src/Makefile.am (grep_SOURCES) [!USE_PCRE]: Omit pcresearch.c.
* src/grep.c (matchers) [!HAVE_LIBPCRE]: Omit perl matcher.
(setmatcher) [!HAVE_LIBPCRE]: If helpful, mention
--disable-perl-regexp in diagnostic.
* src/pcresearch.c: Simplify by assuming HAVE_LIBPCRE.
2019-01-20 10:01:37 -08:00
Jim Meyering
1019e6e35a maint: update all copyright dates via "make update-copyright"
* gnulib: Also update submodule for its copyright updates.
2019-01-01 19:00:38 -08:00
Jim Meyering
c01ea2ad32 doc: fix the bug-introduced version in 3.3's announcement
* NEWS: Correct bug-introduced version (s/2.3/3.2/).
* cfg.mk (old_NEWS_hash): Updating old news, we must also update this.
2018-12-20 20:48:14 -08:00
Jim Meyering
1432836963 maint: post-release administrivia
* NEWS: Add header line for next release.
* .prev-version: Record previous version.
* cfg.mk (old_NEWS_hash): Auto-update.
2018-12-20 20:39:59 -08:00
154 changed files with 5766 additions and 4206 deletions

1
.gitignore vendored

@ -52,6 +52,7 @@
/tests/cspatfile
/tests/ere.script
/tests/get-mb-cur-max
/tests/init.sh
/tests/khadafy.out
/tests/patfile
/tests/spencer1.script

2
.gitmodules vendored

@ -1,3 +1,3 @@
[submodule "gnulib"]
path = gnulib
url = git://git.sv.gnu.org/gnulib.git
url = https://git.savannah.gnu.org/git/gnulib

View File

@ -1 +1 @@
3.2
3.12

41
AUTHORS
View File

@ -1,4 +1,4 @@
Copyright (C) 1992, 1997-2002, 2004-2018 Free Software Foundation, Inc.
Copyright (C) 1992, 1997-2002, 2004-2026 Free Software Foundation, Inc.
Copying and distribution of this file, with or without modification,
are permitted in any medium without royalty provided the copyright
@ -6,16 +6,20 @@
Mike Haertel wrote the main program and the dfa and kwset matchers.
Isamu Hasegawa wrote the POSIX regular expression matcher, which is
part of the GNU C Library and is distributed as part of GNU grep for
use on non-GNU systems. Ulrich Drepper, Paul Eggert, Paolo Bonzini,
Stanislav Brabec, Assaf Gordon, Jakub Jelinek, Jim Meyering, Arnold
Robbins, Andreas Schwab and Florian Weimer also contributed to this
matcher.
Arthur David Olson contributed the heuristics for finding fixed substrings
at the end of dfa.c.
Richard Stallman and Karl Berry wrote the regex backtracking matcher.
Henry Spencer wrote the original test suite from which grep's was derived.
Scott Anderson invented the Khadafy test.
David MacKenzie wrote the automatic configuration software use to
David MacKenzie wrote the automatic configuration software used to
produce the configure script.
Authors of the replacements for standard library routines are identified
@ -26,23 +30,26 @@ non-matching text before calling the regexp matcher was originally due
to James Woods. He also contributed some code to early versions of
GNU grep.
Mike Haertel would like to thank Andrew Hume for many fascinating discussions
of string searching issues over the years. Hume & Sunday's excellent
paper on fast string searching (AT&T Bell Laboratories CSTR #156)
describes some of the history of the subject, as well as providing
exhaustive performance analysis of various implementation alternatives.
Mike Haertel would like to thank Andrew Hume for many fascinating
discussions of string searching issues over the years. Hume and
Sunday's excellent paper on fast string searching describes some of
the history of the subject, as well as providing exhaustive
performance analysis of various implementation alternatives.
The inner loop of GNU grep is similar to Hume & Sunday's recommended
"Tuned Boyer Moore" inner loop.
More work was done on regex.[ch] by Ulrich Drepper and Arnold
Robbins. Regex is now part of GNU C library, see this package
for complete details and credits.
"Tuned Boyer Moore" inner loop (see the Hume & Sunday citation in
the grep manual's "Performance" chapter).
Arnold Robbins contributed to improve dfa.[ch]. In fact
it came straight from gawk-3.0.3 with small editing and fixes.
Many folks contributed. See THANKS; if I omitted someone please
send me email.
Norihiro Tanaka contributed many performance improvements and other
fixes, particularly to multi-byte matchers.
Paul Eggert contributed support for recursive grep, as well as several
performance improvements such as searching file holes efficiently.
Many other folks contributed. See THANKS; if someone is omitted
please file a bug report.
Alain Magloire maintained GNU grep until version 2.5e.

View File

@ -1407,7 +1407,7 @@
is put in different compiled structure patterns[]. The patterns
are given to dfacomp() and kwsmusts() as is.
(Ecompile): Likewised.
(Fcompile): Reverse to the old behaviour of compiling the enire
(Fcompile): Reverse to the old behaviour of compiling the entire
patterns in one shot.
(EGexecute): If falling to GNU regex for the matching, loop in the
array of compile patterns[] to find a match.
@ -1457,7 +1457,7 @@
(xrealloc): Removed using lib/xmalloc.c.
(xmalloc): Removed using lib/xmalloc.c
(main): Register with atexit() to check for error on stdout.
* configure.in: Check for atexit(), call jm_MALLOC, jm_RELLOC and
* configure.in: Check for atexit(), call jm_MALLOC, jm_REALLOC and
jm_PREREQ_ERROR.
* tests/bre.awk: Removed the hack to drain the buffer since we
always fclose(stdout) atexit.
@ -1541,7 +1541,7 @@
* src/exclude.h: New file.
* src/grep.c (main): Took the GNU tar code to handle
the option --include, --exclude, --exclude-from.
Files are check for a match, with exlude_filename ().
Files are check for a match, with exclude_filename ().
New option --exclude-from.
* src/savedir.c: Call exclude_filename() to check for
file pattern exclusion or inclusion.
@ -1592,7 +1592,7 @@
* m4/dosfile.m4 (AC_DOSFILE): Move AC_DEFINEs out of AC_CACHE_CHECK.
2001-02-17 Alain Malgoire
2001-02-17 Alain Magloire
* doc/grep.texi: Document the new options and the new behaviour
back-references are local. Use excerpt from Karl Berry regex
@ -1699,8 +1699,8 @@
(color): Rename color variable to color_option.
Removed 'always|never|auto' arguments, not necessary for grep.
(exclude_pattern): new variable, holder for the file pattern.
(include_pattern): new variable, hoder for the file pattern.
* src/savedir.c: Signature change, take two new argmuments.
(include_pattern): new variable, holder for the file pattern.
* src/savedir.c: Signature change, take two new arguments.
* doc/grep.texi: Document, new options.
* doc/grep.man: Document, new options.
@ -1712,7 +1712,7 @@
2001-02-09 Alain Magloire
Patch from Ulrich Drepper to provide hilighting.
Patch from Ulrich Drepper to provide highlighting.
* src/grep.c: New option --color.
(color): New static var.
@ -1722,7 +1722,7 @@
to find the offset of the matching string.
* src/savedir.c: Take advantage of _DIRENT_HAVE_TYPE if supported.
* src/search.c (EGexecute, Fexecute, Pexecute): Take a new argument
when doing exact match for the color hiligting.
when doing exact match for the color highlighting.
2000-09-01 Brian Youmans
@ -1792,7 +1792,7 @@
2000-06-02 Paul Eggert
Problen noted by Gerald Stoller <gerald_stoller@hotmail.com>
Problem noted by Gerald Stoller <gerald_stoller@hotmail.com>
* src/grep.c (main): POSIX says that -q overrides -l, which
in turn overrides the other output options. Fix grep to
@ -2208,7 +2208,7 @@
on pre-OpenVMS 7.x systems; general overhaul.
* src/getpagesize.h: Reinstate support for different pagesizes on
VAX and Alpha. Work around problem with DEC C compiler.
* src/vms_fab.c: Cast to some assigments; fixed typo argcp vs. argp.
* src/vms_fab.c: Cast to some assignments; fixed typo argcp vs. argp.
* src/vms_fab.h: Added new include files to avoid warnings about
undefined function prototypes.
Those patches were provided by Martin P.J. Zinser (zinser@decus.de).
@ -2670,7 +2670,7 @@
1999-03-16 Volker Borchert
* configure.in: Use case case ... esac for checking Visual C++.
* configure.in: Use case ... esac for checking Visual C++.
When ${CC} contains options it was not recognize.
1999-03-07 Paul Eggert
@ -2764,7 +2764,7 @@
1999-02-10 Alain Magloire
* bootstrap/{Makefile{try,am},REAMDE} : skeleton
* bootstrap/{Makefile{try,am},README} : skeleton
provided for system lacking the tools to autoconfigure.
* src/{e,f,}grepmat.c: added guard [HAVE_CONFIG_H]
@ -2858,7 +2858,7 @@
* doc/Makefile.am djgpp/Makefile.am m4/Makefile.am vms/Makefile.am:
New files.
* m4/progtest.m4: proctect '[]' from m4.
* m4/progtest.m4: protect '[]' from m4.
Noted by Eli Z.
* PATCHES-AC: New file, add the patch for autoconf in the dist.
@ -3333,7 +3333,7 @@
Suggested by Harald Hanche-Olsen.
* src/grep.c (main): '-f /dev/null' now specifies no patterns
and therfore matches nothing.
and therefore matches nothing.
Reported by Jorge Stolfi.
Patched by Paul Eggert.
@ -3368,7 +3368,7 @@
* src/grep.c: reverse back to greping directories,
One could skip the error message by defining
SKIP_DIR_ERROR. There is no clear way of doing
things, I hope to setle this on the next majore release
things, I hope to settle this on the next major release
Thanks Paul Eggert, Eli Zaretskii and gnits for the
exchange.
@ -3427,7 +3427,7 @@
(setmatcher) [HAVE_SETRLIMIT]: Set re_max_failures so that the
matcher won't ever overflow the stack.
(main) [__MSDOS__, _WIN32]: Handle backslashes and drive letters
in argv[0], remove the .exe suffix, and downcase the prgram name.
in argv[0], remove the .exe suffix, and downcase the program name.
[O_BINARY]: Pass additional DOS-specific options to getopt_long
and handle them. Call stat before attempting to open the file, in
case it is a directory (DOS will fail the open call for
@ -3497,7 +3497,7 @@
regex package. Change the way the tests were done to be more
conformant to automake.
* configure.in: added --disable-regex for folks with their own fuctions.
* configure.in: added --disable-regex for folks with their own functions.
* grep-20d : available for testing
@ -3551,7 +3551,7 @@
* check.sh, scriptgen.awk: fix grep paths.
* change the directory strucure: grep is now in src to comply with
* change the directory structure: grep is now in src to comply with
gettext.m4.
* grep.c version.c [VERSION]: got rid of version.c,
@ -3648,6 +3648,6 @@
* Version 2.0 released.
Copyright (C) 1998-2018 Free Software Foundation, Inc.
Copyright (C) 1998-2026 Free Software Foundation, Inc.
Copying and distribution of this file, with or without modification,
are permitted provided the copyright notice and this notice are preserved.

View File

@ -12,7 +12,7 @@ Use the latest upstream sources
Base any changes you make on the latest upstream sources.
You can get a copy of the latest with this command:
git clone git://git.sv.gnu.org/grep
git clone https://git.savannah.gnu.org/git/grep
That downloads the entire repository, including revision control history.
Once downloaded, you can get incremental updates by running one of
@ -83,7 +83,7 @@ Make your changes on a private "topic" branch
=============================================
So you checked out grep like this:
git clone git://git.sv.gnu.org/grep
git clone https://git.savannah.gnu.org/git/grep
Now, cd into the grep/ directory and run:
@ -468,7 +468,7 @@ you'd use doc/Copyright/request-assign.future:
https://www.gnu.org/software/gnulib/Copyright/request-assign.future
You may make assignments for up to four projects at a time.
[
In case you're wondering why we bother with all of this, read this:
https://www.gnu.org/licenses/why-assign.html
@ -597,7 +597,7 @@ Then just open the index.html file (in the generated lcov-html directory)
in your favorite web browser.
========================================================================
Copyright (C) 2009-2018 Free Software Foundation, Inc.
Copyright (C) 2009-2026 Free Software Foundation, Inc.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or

View File

@ -1,6 +1,6 @@
# Process this file with automake to create Makefile.in
#
# Copyright 1997-1998, 2005-2018 Free Software Foundation, Inc.
# Copyright 1997-1998, 2005-2026 Free Software Foundation, Inc.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
@ -66,13 +66,10 @@ gen-ChangeLog:
# current locale considers to be equal.
ASSORT = LC_ALL=C sort
# Extract all lines up to the first one starting with "##".
prologue = perl -ne '/^\#\#/ and exit; print' $(srcdir)/THANKS.in
THANKS: THANKS.in Makefile.am .mailmap thanks-gen
$(AM_V_GEN)rm -f $@-t $@; \
{ \
$(prologue); echo; \
perl -ne '/^\#\#/ and exit; print' $(srcdir)/THANKS.in; echo; \
{ perl -ne '/^$$/.../^$$/ and !/^$$/ and s/ +/\0/ and print' \
$(srcdir)/THANKS.in; \
git log --pretty=format:'%aN%x00%aE' \

283
NEWS

@ -1,5 +1,274 @@
GNU grep NEWS -*- outline -*-
* Noteworthy changes in release ?.? (????-??-??) [?]
* Noteworthy changes in release 3.12 (2025-04-10) [stable]
** Bug fixes
Searching a directory with at least 100,000 entries no longer fails
with "Operation not supported" and exit status 2. Now, this prints 1
and no diagnostic, as expected:
$ mkdir t && cd t && seq 100000|xargs touch && grep -r x .; echo $?
1
[bug introduced in grep 3.11]
-mN where 1 < N no longer mistakenly lseeks to end of input merely
because standard output is /dev/null.
** Changes in behavior
The --unix-byte-offsets (-u) option is gone. In grep-3.7 (2021-08-14)
it became a warning-only no-op. Before then, it was a Windows-only no-op.
On Windows platforms and on AIX in 32-bit mode, grep in some cases
now supports Unicode characters outside the Basic Multilingual Plane.
* Noteworthy changes in release 3.11 (2023-05-13) [stable]
** Bug fixes
With -P, patterns like [\d] now work again. Fixing this has caused
grep to revert to the behavior of grep 3.8, in that patterns like \w
and \b go back to using ASCII rather than Unicode interpretations.
However, future versions of GNU grep and/or PCRE2 are likely to fix
this and change the behavior of \w and \b back to Unicode again,
without breaking [\d] as 3.10 did.
[bug introduced in grep 3.10]
grep no longer fails on files dated after the year 2038,
when running on 32-bit x86 and ARM hosts using glibc 2.34+.
[bug introduced in grep 3.9]
grep -P no longer fails to match patterns using negated classes
like \D or \W when linked with PCRE2 10.34 or newer.
[bug introduced in grep 3.8]
** Changes in behavior
grep --version now prints a line describing the version of PCRE2 it uses.
For example, it prints this when built with the very latest from git:
grep -P uses PCRE2 10.43-DEV 2023-04-14
or this with what's currently available in Fedora 37:
grep -P uses PCRE2 10.40 2022-04-14
Previous versions of grep wouldn't respect the user-provided settings for
PCRE_CFLAGS and PCRE_LIBS when building if a libpcre2-8 pkg-config module
was found.
* Noteworthy changes in release 3.10 (2023-03-22) [stable]
** Bug fixes
With -P, \d now matches only ASCII digits, regardless of PCRE
options/modes. The changes in grep-3.9 to make \b and \w work
properly had the undesirable side effect of making \d also match
e.g., the Arabic digits: ٠١٢٣٤٥٦٧٨٩. With grep-3.9, -P '\d+'
would match that ten-digit (20-byte) string. Now, to match such
a digit, you would use \p{Nd}. Similarly, \D is now mapped to [^0-9].
[bug introduced in grep 3.9]
* Noteworthy changes in release 3.9 (2023-03-05) [stable]
** Bug fixes
With -P, some non-ASCII UTF8 characters were not recognized as
word-constituent due to our omission of the PCRE2_UCP flag. E.g.,
given f(){ echo Perú|LC_ALL=en_US.UTF-8 grep -Po "$1"; } and
this command, echo $(f 'r\w'):$(f '.\b'), before it would print ":r".
After the fix, it prints the correct results: "rú:ú".
When given multiple patterns the last of which has a back-reference,
grep no longer mistakenly matches lines in some cases.
[Bug#36148#13 introduced in grep 3.4]
* Noteworthy changes in release 3.8 (2022-09-02) [stable]
** Changes in behavior
The -P option is now based on PCRE2 instead of the older PCRE,
thanks to code contributed by Carlo Arenas.
The egrep and fgrep commands, which have been deprecated since
release 2.5.3 (2007), now warn that they are obsolescent and should
be replaced by grep -E and grep -F.
The confusing GREP_COLOR environment variable is now obsolescent.
Instead of GREP_COLOR='xxx', use GREP_COLORS='mt=xxx'. grep now
warns if GREP_COLOR is used and is not overridden by GREP_COLORS.
Also, grep now treats GREP_COLOR like GREP_COLORS by silently
ignoring it if it attempts to inject ANSI terminal escapes.
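
A short sketch of the replacement spelling, assuming a grep that supports GREP_COLORS. The SGR value '01;32' (bold green) is an arbitrary illustrative choice, and cat -v is used only to make the escape sequences visible:

```shell
# Instead of GREP_COLOR='01;32', set the 'mt' capability in GREP_COLORS.
out=$(echo foo | GREP_COLORS='mt=01;32' grep --color=always foo)
# Show the escape sequences in printable form.
printf '%s\n' "$out" | cat -v
```

The chosen SGR value should appear in the escape sequence that precedes the matched text.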
Regular expressions with stray backslashes now cause warnings, as
their unspecified behavior can lead to unexpected results.
For example, '\a' and 'a' are not always equivalent
<https://bugs.gnu.org/39678>. Similarly, regular expressions or
subexpressions that start with a repetition operator now also cause
warnings due to their unspecified behavior; for example, *a(+b|{1}c)
now has three reasons to warn. The warnings are intended as a
transition aid; they are likely to be errors in future releases.
Regular expressions like [:space:] are now errors even if
POSIXLY_CORRECT is set, since POSIX now allows the GNU behavior.
** Bug fixes
In locales using UTF-8 encoding, the regular expression '.' no
longer sometimes fails to match Unicode characters U+D400 through
U+D7FF (some Hangul Syllables, and Hangul Jamo Extended-B) and
Unicode characters U+108000 through U+10FFFF (half of Supplemental
Private Use Area plane B).
[bug introduced in grep 3.4]
The -s option no longer suppresses "binary file matches" messages.
[Bug#51860 introduced in grep 3.5]
** Documentation improvements
The manual now covers unspecified behavior in patterns like \x, (+),
and range expressions outside the POSIX locale.
* Noteworthy changes in release 3.7 (2021-08-14) [stable]
** Changes in behavior
Use of the --unix-byte-offsets (-u) option now evokes a warning.
Since 3.1, this Windows-only option has had no effect.
** Bug fixes
Preprocessing N patterns would take at least O(N^2) time when too many
patterns hashed to too few buckets. This now takes seconds, not days:
: | grep -Ff <(seq 6400000 | tr 0-9 A-J)
[Bug#44754 introduced in grep 3.5]
* Noteworthy changes in release 3.6 (2020-11-08) [stable]
** Changes in behavior
The GREP_OPTIONS environment variable no longer affects grep's behavior.
The variable was declared obsolescent in grep 2.21 (2014), and since
then any use had caused grep to issue a diagnostic.
** Bug fixes
grep's DFA matcher performed an invalid regex transformation
that would convert an ERE like a+a+a+ to a+a+, which would make
grep a+a+a+ mistakenly match "aa".
[Bug#44351 introduced in grep 3.2]
grep -P now reports the troublesome input filename upon PCRE execution
failure. Before, searching many files for something rare might fail with
just "exceeded PCRE's backtracking limit". Now, it also reports which file
triggered the failure.
* Noteworthy changes in release 3.5 (2020-09-27) [stable]
** Changes in behavior
The message that a binary file matches is now sent to standard error
and the message has been reworded from "Binary file FOO matches" to
"grep: FOO: binary file matches", to avoid confusion with ordinary
output or when file names contain spaces and the like, and to be
more consistent with other diagnostics. For example, commands
like 'grep PATTERN FILE | wc' no longer add 1 to the count of
matching text lines due to the presence of the message. Like other
stderr messages, the message is now omitted if the --no-messages
(-s) option is given.
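
The effect on pipelines can be sketched as follows, assuming grep 3.5 or later. The scratch file is hypothetical, and its NUL byte is what makes grep treat it as binary. Because the diagnostic now goes to standard error, no line reaches the pipe:

```shell
# Hypothetical scratch file; the NUL byte makes grep treat it as binary.
f=$(mktemp)
printf 'a\0b\n' > "$f"
# Discard stderr, then count the lines that reach standard output.
count=$(grep a "$f" 2>/dev/null | wc -l)
echo "lines on stdout: $count"
rm -f "$f"
```

With an older grep, the "Binary file ... matches" line would be written to standard output and counted.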
Two other stderr messages now use the typical form too. They are
now "grep: FOO: warning: recursive directory loop" and "grep: FOO:
input file is also the output".
The --files-without-match (-L) option has reverted to its behavior
in grep 3.1 and earlier. That is, grep -L again succeeds when a
line is selected, not when a file is listed. The behavior in grep
3.2 through 3.4 was causing compatibility problems.
** Bug fixes
grep -I no longer issues a spurious "Binary file FOO matches" line.
[Bug#33552 introduced in grep 2.23]
In UTF-8 locales, grep -w no longer ignores a multibyte word
constituent just before what would otherwise be a word match.
[Bug#43225 introduced in grep 2.28]
grep -i no longer mishandles ASCII characters that match multibyte
characters. For example, 'LC_ALL=tr_TR.utf8 grep -i i' no longer
dumps core merely because 'i' matches 'İ' (U+0130 LATIN CAPITAL
LETTER I WITH DOT ABOVE) in Turkish when ignoring case.
[Bug#43577 introduced partly in grep 2.28 and partly in grep 3.4]
A performance regression with -E and many patterns has been mostly fixed.
"Mostly" as there is a performance tradeoff between Bug#22357 and Bug#40634.
[Bug#40634 introduced in grep 2.28]
A performance regression with many duplicate patterns has been fixed.
[Bug#43040 introduced in grep 3.4]
An N^2 RSS performance regression with many patterns has been fixed
in common cases (no backref, and no use of -o or --color).
With only 80,000 lines of /usr/share/dict/linux.words, the following
would use 100GB of RSS and take 3 minutes. With the fix, it used less
than 400MB and took less than one second:
head -80000 /usr/share/dict/linux.words > w; grep -vf w w
[Bug#43527 introduced in grep 3.4]
** Build-related
"make dist" builds .tar.gz files again, as they are still used in
some barebones builds.
* Noteworthy changes in release 3.4 (2020-01-02) [stable]
** New features
The new --no-ignore-case option causes grep to observe case
distinctions, overriding any previous -i (--ignore-case) option.
** Bug fixes
'.' no longer matches some invalid byte sequences in UTF-8 locales.
[bug introduced in grep 2.7]
grep -Fw can no longer false match in non-UTF-8 multibyte locales.
For example, this command would erroneously print its input line:
echo ab | LC_CTYPE=ja_JP.eucjp grep -Fw b
[Bug#38223 introduced in grep 2.28]
The exit status of 'grep -L' is no longer incorrect when standard
output is /dev/null.
[Bug#37716 introduced in grep 3.2]
A performance bug has been fixed when grep is given many patterns,
each with no back-reference.
[Bug#33249 introduced in grep 2.5]
A performance bug has been fixed for patterns like '01.2' that
cause grep to reorder tokens internally.
[Bug#34951 introduced in grep 3.2]
** Build-related
The build procedure no longer relies on any already-built src/grep
that might be absent or broken. Instead, it uses the system 'grep'
to bootstrap, and uses src/grep only to test the build. On Solaris
/usr/bin/grep is broken, but you can install GNU or XPG4 'grep' from
the standard Solaris distribution before building GNU Grep yourself.
[bug introduced in grep 2.8]
* Noteworthy changes in release 3.3 (2018-12-20) [stable]
** Bug fixes
@ -8,9 +277,9 @@ GNU grep NEWS -*- outline -*-
the following would print nothing (it should print the input line):
echo 123-x|LC_ALL=C grep '.\bx'
Using a multibyte locale, using certain regexp constructs (some ranges,
backreferences), or forcing use of the PCRE matcher via --perl-regexp (-P)
back-references), or forcing use of the PCRE matcher via --perl-regexp (-P)
would avoid the bug.
[bug introduced in grep 2.3]
[bug introduced in grep 3.2]
* Noteworthy changes in release 3.2 (2018-12-20) [stable]
@ -201,7 +470,7 @@ GNU grep NEWS -*- outline -*-
grep -z would match strings it should not. To trigger the bug, you'd
have to use a regular expression including an anchor (^ or $) and a
feature like a range or a backreference, causing grep to forego its DFA
feature like a range or a back-reference, causing grep to forego its DFA
matcher and resort to using re_search. With a multibyte locale, that
matcher could mistakenly match a string containing a newline.
For example, this command:
@ -434,7 +703,7 @@ GNU grep NEWS -*- outline -*-
Previously it was unreliable, and sometimes crashed or looped.
[bug introduced in grep-2.16]
grep -P now works with -w and -x and backreferences. Before,
grep -P now works with -w and -x and back-references. Before,
echo aa|grep -Pw '(.)\1' would fail to match, yet
echo aa|grep -Pw '(.)\2' would match.
@ -770,7 +1039,7 @@ GNU grep NEWS -*- outline -*-
X{0,0} is implemented correctly. It used to be a synonym of X{0,1}.
[bug present since "the beginning"]
In multibyte locales, regular expressions including backreferences
In multibyte locales, regular expressions including back-references
no longer exhibit quadratic complexity (i.e., they are orders
of magnitude faster). [bug present since multi-byte character set
support was introduced in 2.5.2]
@ -928,7 +1197,7 @@ Version 2.5
- The new option --line-buffered calls fflush on every line. There is a
noticeable slowdown when forcing line buffering.
- Back references are now local to the regex.
- Back-references are now local to the regex.
grep -e '\(a\)\1' -e '\(b\)\1'
The last backref \1 in the second expression refers to \(b\)
@ -1142,7 +1411,7 @@ necessary to track the evolution of the regex package, and since
I was changing it anyway I decided to do a general cleanup.
========================================================================
Copyright (C) 1992, 1997-2002, 2004-2018 Free Software Foundation, Inc.
Copyright (C) 1992, 1997-2002, 2004-2026 Free Software Foundation, Inc.
Copying and distribution of this file, with or without modification,
are permitted in any medium without royalty provided the copyright

README

@ -1,4 +1,4 @@
Copyright (C) 1992, 1997-2002, 2004-2018 Free Software Foundation, Inc.
Copyright (C) 1992, 1997-2002, 2004-2026 Free Software Foundation, Inc.
Copying and distribution of this file, with or without modification,
are permitted in any medium without royalty provided the copyright
@ -12,13 +12,13 @@ GNU grep is provided "as is" with no warranty. The exact terms
under which you may use and (re)distribute this program are detailed
in the GNU General Public License, in the file COPYING.
GNU grep is based on a fast lazy-state deterministic matcher (about
twice as fast as stock Unix egrep) hybridized with a Boyer-Moore-Gosper
search for a fixed string that eliminates impossible text from being
considered by the full regexp matcher without necessarily having to
look at every character. The result is typically many times faster
than Unix grep or egrep. (Regular expressions containing backreferencing
will run more slowly, however.)
GNU grep is based on a fast lazy-state deterministic matcher
hybridized with Boyer-Moore and Aho-Corasick searches for fixed
strings that eliminate impossible text from being considered by the
full regexp matcher without necessarily having to look at every
character. The result is typically many times faster than traditional
implementations. (Regular expressions containing back-references will
run more slowly, however.)
See the files AUTHORS and THANKS for a list of authors and other contributors.


@ -1,4 +1,4 @@
Copyright (C) 1992, 1997-2002, 2004-2018 Free Software Foundation, Inc.
Copyright (C) 1992, 1997-2002, 2004-2026 Free Software Foundation, Inc.
Copying and distribution of this file, with or without modification,
are permitted in any medium without royalty provided the copyright


@ -1,35 +1,47 @@
-*- outline -*-
Building from a Git repository -*- outline -*-
These notes intend to help people working on the checked-out sources.
These requirements do not apply when building from a distribution tarball.
If this package has a file HACKING, please also read that file for
more detailed contribution guidelines.
* Requirements
We've opted to keep only the highest-level sources in the GIT repository.
This eases our maintenance burden, (fewer merges etc.), but imposes more
We've opted to keep only the highest-level sources in the Git repository.
This eases our maintenance burden (fewer merges etc.), but imposes more
requirements on anyone wishing to build from the just-checked-out sources.
Note the requirements to build the released archive are much less and
are just the requirements of the standard ./configure && make procedure.
(The requirements to build from a release are much less and are just
the requirements of the standard './configure && make' procedure.)
Specific development tools and versions will be checked for and listed by
the bootstrap script. See README-prereq for specific notes on obtaining
these prerequisite tools.
Valgrind <http://valgrind.org/> is also highly recommended, if
Valgrind supports your architecture. See also README-valgrind.
Valgrind supports your architecture. See also README-valgrind
(if present).
While building from a just-cloned source tree may require installing a
few prerequisites, later, a plain 'git pull && make' should be sufficient.
few prerequisites, later, a plain 'git pull && make' typically suffices.
* First GIT checkout
* First Git checkout
You can get a copy of the source repository like this:
$ git clone git://git.sv.gnu.org/grep
$ cd grep
$ git clone https://git.savannah.gnu.org/git/<packagename>
$ cd <packagename>
As an optional step, if you already have a copy of the gnulib git
repository on your hard drive, then you can use it as a reference to
reduce download time and disk space requirements:
where '<packagename>' stands for 'coreutils' or whatever other package
you are building.
To use the most-recent Gnulib (as opposed to the Gnulib version that
the package last synchronized to), do this next:
$ git submodule foreach git pull origin master
$ git commit -m 'build: update gnulib submodule to latest' gnulib
As an optional step, if you already have a copy of the Gnulib Git
repository, then you can use it as a reference to reduce download
time and file system space requirements:
$ export GNULIB_SRCDIR=/path/to/gnulib
@ -38,20 +50,14 @@ which are extracted from other source packages:
$ ./bootstrap
To use the most-recent gnulib (as opposed to the gnulib version that
the package last synchronized to), do this next:
$ git submodule foreach git pull origin master
$ git commit -m 'build: update gnulib submodule to latest' gnulib
And there you are! Just
$ ./configure --quiet #[--enable-gcc-warnings] [*]
$ ./configure --quiet #[--disable-gcc-warnings] [*]
$ make
$ make check
At this point, there should be no difference between your local copy,
and the GIT master copy:
and the Git master copy:
$ git diff
@ -59,15 +65,43 @@ should output no difference.
Enjoy!
[*] The --enable-gcc-warnings option is useful only with glibc
and with a very recent version of gcc. You'll probably also have
to use recent system headers. If you configure with this option,
and spot a problem, please be sure to send the report to the bug
reporting address of this package, and not to that of gnulib, even
if the problem seems to originate in a gnulib-provided file.
[*] By default GCC warnings are enabled when building from Git.
If you get warnings with recent GCC and Glibc with default
configure-time options, please report the warnings to the bug
reporting address of this package instead of to bug-gnulib,
even if the problem seems to originate in a Gnulib-provided file.
If you get warnings with other configurations, you can run
'./configure --disable-gcc-warnings' or 'make WERROR_CFLAGS='
to build quietly or verbosely, respectively.
-----
Copyright (C) 2002-2018 Free Software Foundation, Inc.
* Submitting patches
If you develop a fix or a new feature, please send it to the
appropriate bug-reporting address as reported by the --help option of
each program. One way to do this is to use vc-dwim
<https://www.gnu.org/software/vc-dwim/>, as follows.
Run the command "vc-dwim --initialize" from the top-level directory
of this package's git-cloned hierarchy.
Edit the (empty) ChangeLog file that this command creates, creating a
properly-formatted entry according to the GNU coding standards
<https://www.gnu.org/prep/standards/html_node/Change-Logs.html>.
Make your changes.
Run the command "vc-dwim" and make sure its output (the diff of all
your changes) looks good.
Run "vc-dwim --commit".
Run the command "git format-patch --stdout -1", and email its output
in, using the output's subject line.
-----
Copyright (C) 2002-2026 Free Software Foundation, Inc.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by


@ -1,62 +1,41 @@
This gives some notes on obtaining the tools required for development.
I.E. the tools checked for by the bootstrap script and include:
These tools can be used by the 'bootstrap' and 'configure' scripts,
as well as by 'make'. They include:
- Autoconf <https://www.gnu.org/software/autoconf/>
- Automake <https://www.gnu.org/software/automake/>
- Bison <https://www.gnu.org/software/bison/>
- Gettext <https://www.gnu.org/software/gettext/>
- Git <https://git-scm.com/>
- Gperf <https://www.gnu.org/software/gperf/>
- Gzip <https://www.gnu.org/software/gzip/>
- Help2man <https://www.gnu.org/software/help2man/>
- M4 <https://www.gnu.org/software/m4/>
- Make <https://www.gnu.org/software/make/>
- Perl <https://www.cpan.org/>
- Pkg-config <https://www.freedesktop.org/wiki/Software/pkg-config/>
- Rsync <https://rsync.samba.org/>
- Tar <https://www.gnu.org/software/tar/>
- Texinfo <https://www.gnu.org/software/texinfo/>
- Wget <https://www.gnu.org/software/wget/>
- XZ Utils <https://tukaani.org/xz/>
Note please try to install/build official packages for your system.
If these programs are not available use the following instructions
to build them and install the results into a directory that you will
then use when building this package.
It is generally better to use official packages for your system.
If a package is not officially available you can build it from source
and install it into a directory that you can then use to build this
package. If some packages are available but are too old, install the
too-old versions first as they may be needed to build newer versions.
Even if the official version of a package for your system is too old,
please install it, as it may be required to build the newer versions.
The examples below install into $HOME/grep/deps/, so if you are
going to follow these instructions, first ensure that your $PATH is
set correctly by running this command:
Here is an example of how to build a program from source. This
example is for Autoconf; a similar approach should work for the other
developer prerequisites. This example assumes Autoconf 2.71; it
should be OK to use a later version of Autoconf, if available.
prefix=$HOME/grep/deps
prefix=$HOME/prefix # (or wherever else you choose)
export PATH=$prefix/bin:$PATH
* autoconf *
# Note Autoconf 2.62 or newer is needed to build automake-1.11.1
git clone --depth=1 git://git.sv.gnu.org/autoconf.git
git checkout v2.62
autoreconf -vi
wget https://ftp.gnu.org/pub/gnu/autoconf/autoconf-2.71.tar.gz
gzip -d <autoconf-2.71.tar.gz | tar xf -
cd autoconf-2.71
./configure --prefix=$prefix
make install
* automake *
# Note help2man is required to build automake fully
git clone git://git.sv.gnu.org/automake.git
cd automake
git checkout v1.11.1
./bootstrap
./configure --prefix=$prefix
make install
This package uses XZ utils (successor to LZMA) to create
a compressed distribution tarball. Using this feature of Automake
requires version 1.10a or newer, as well as the xz program itself.
* xz *
git clone git://ctrl.tukaani.org/xz.git
cd xz
./autogen.sh
./configure --prefix=$prefix
make install
Now you can build this package as described in README-hacking.
Once the prerequisites are installed, you can build this package as
described in README-hacking.


@ -13,6 +13,7 @@ end of e.g., grep --help).
Akim Demaille akim@epita.fr
Andreas Schwab schwab@suse.de
Andreas Ley andy@rz.uni-karlsruhe.de
Anton Samokat samokat700@gmail.com
Bastiaan "Darquan" Stougie darquan@zonnet.nl
Ben Elliston bje@cygnus.com
Bernd Strieder strieder@student.uni-kl.de
@ -28,6 +29,7 @@ David J MacKenzie djm@catapult.va.pubnix.com
David O'Brien obrien@freebsd.org
'Drake' Daham Wang drakewang@gmail.com
Egmont Koblinger egmont@gmail.com
Emanuele Torre torreemanuele6@gmail.com
Fernando Basso fernandobasso.br@gmail.com
Florian La Roche laroche@redhat.com
François Pinard pinard@iro.umontreal.ca
@ -35,6 +37,7 @@ Gerald Stoller gerald_stoller@hotmail.com
Grant McDorman grant@isgtec.com
Greg Boyd gboyd.ccsf@gmail.com
Greg Louis glouis@dynamicro.on.ca
Gro-Tsen https://twitter.com/gro_tsen
Guglielmo 'bond' Bondioni g.bondioni@libero.it
H. Merijn Brand h.m.brand@hccnet.nl
Harald Hanche-Olsen hanche@math.ntnu.no
@ -50,9 +53,11 @@ Joel N. Weber II devnull@gnu.org
John Hughes john@nitelite.calvacom.fr
Jorge Stolfi stolfi@dcc.unicamp.br
Karl Heuer kwzh@gnu.org
Karl Pettersson karl.pettersson@klpn.se
Kaveh R. Ghazi ghazi@caip.rutgers.edu
Kazuro Furukawa furukawa@apricot.kek.jp
Keith Bostic bostic@bsdi.com
Koen Claessen koen@chalmers.se
Krishna Sethuraman krishna@sgihub.corp.sgi.com
Kurt D Schwehr kdschweh@insci14.ucsd.edu
Ludovic Courtès ludo@gnu.org
@ -77,6 +82,7 @@ Rainer Orth ro@cebitec.uni-bielefeld.de
Roland Roberts rroberts@muller.com
Ruslan Ermilov ru@freebsd.org
Santiago Vila sanvila@unex.es
Sebastian Carlos sebaaa1754@gmail.com
Shannon Hill hill@synnet.com
Sotiris Vassilopoulos Sotiris.Vassilopoulos@betatech.gr
Standish Parsley adsspamtrap01@yahoo.com

TODO

@ -1,6 +1,6 @@
Things to do for GNU grep
Copyright (C) 1992, 1997-2002, 2004-2018 Free Software Foundation, Inc.
Copyright (C) 1992, 1997-2002, 2004-2026 Free Software Foundation, Inc.
Copying and distribution of this file, with or without modification,
are permitted in any medium without royalty provided the copyright
@ -31,13 +31,13 @@ GNU grep originally did 32-bit arithmetic. Although it has moved to
64-bit on 64-bit platforms by using types like ptrdiff_t and size_t,
this conversion has not been entirely systematic and should be checked.
Lazy dynamic linking of libpcre. See Debians 03-397262-dlopen-pcre.patch.
Lazy dynamic linking of the PCRE library.
Check FreeBSD's integration of zgrep (-Z) and bzgrep (-J) in one
binary. Is there a possibility of doing even better by automatically
checking the magic of binary files ourselves (0x1F 0x8B for gzip, 0x1F
0x9D for compress, and 0x42 0x5A 0x68 for bzip2)? Once what to do with
libpcre is decided, do the same for libz and libbz2.
the PCRE library is decided, do the same for libz and libbz2.
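The magic-number check that this TODO item proposes could look roughly
like the sketch below. It is an illustration only, not code from grep:
the function name detect_compression is hypothetical, and only the byte
values come from the text above.

```shell
# Print the compression format suggested by the first bytes of file $1.
detect_compression() {
  # Dump up to three leading bytes as hex and strip all whitespace,
  # giving a string such as "1f8b08".
  magic=$(od -An -tx1 -N3 < "$1" | tr -d '[:space:]')
  case $magic in
    1f8b*)  echo gzip ;;      # 0x1F 0x8B
    1f9d*)  echo compress ;;  # 0x1F 0x9D
    425a68) echo bzip2 ;;     # 0x42 0x5A 0x68, i.e. "BZh"
    *)      echo none ;;
  esac
}
```

A caller could then dispatch on the printed word, for example to decide
whether to pipe the file through a decompressor before searching it.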
===================

bootstrap

File diff suppressed because it is too large


@ -1,6 +1,6 @@
# Bootstrap configuration.
# Copyright (C) 2006-2018 Free Software Foundation, Inc.
# Copyright (C) 2006-2026 Free Software Foundation, Inc.
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
@ -17,25 +17,31 @@
avoided_gnulib_modules='
--avoid=lock-tests
--avoid=mbuiter
--avoid=mbuiterf
--avoid=mbrlen-tests
--avoid=mbrtowc-tests
--avoid=update-copyright-tests
'
# gnulib modules used by this package.
gnulib_modules='
alloca
announce-gen
argmatch
assert-h
c-ctype
c-stack
c-strcasecmp
c32isalnum
c32rtomb
closeout
configmake
dfa
dirname-lgpl
do-release-commit-and-tag
error
exclude
fcntl-h
fdl
fnmatch
fstatat
fts
@ -47,58 +53,59 @@ git-version-gen
gitlog-to-changelog
gnu-web-doc-update
gnupload
hash
idx
ignore-value
intprops
inttypes
inttypes-h
isatty
isblank
iswctype
kwset
largefile
locale
locale-h
lseek
maintainer-makefile
malloc-gnu
manywarnings
mbrlen
mbrtowc
mbrtoc32-regular
mbszero
mcel-prefer
memchr
memchr2
mempcpy
minmax
nullptr
obstack
openat-safer
perl
propername
quote
rawmemchr
readme-release
realloc-gnu
realloc-posix
regex
safe-read
same-inode
ssize_t
stddef
stdlib
stdckdint-h
stddef-h
stdlib-h
stpcpy
strerror
string
string-h
strstr
strtoull
strtoumax
sys_stat
unistd
sys_stat-h
unistd-h
unlocked-io
update-copyright
useless-if-before-free
verify
version-etc-fsf
wchar
wcrtomb
wctob
wctype-h
wchar-single
windows-stat-inodes
xalloc
xbinary-io
xstrtoimax
year2038
'
gnulib_name=libgreputils
@ -126,13 +133,16 @@ gnulib_tool_option_extras="--tests-base=gnulib-tests --with-tests --symlink\
buildreq="\
autoconf 2.62
automake 1.11.1
autopoint -
autopoint 0.19.2
gettext -
git 1.4.4
gzip -
m4 -
makeinfo -
rsync -
tar -
texi2pdf 6.1
wget -
xz -
"
bootstrap_post_import_hook ()
@ -140,22 +150,27 @@ bootstrap_post_import_hook ()
# Automake requires that ChangeLog exist.
touch ChangeLog || return 1
# Copy tests/init.sh from Gnulib.
$gnulib_tool --copy-file tests/init.sh
# Copy pkg-config's pkg.m4 so that our downstream users don't need to.
local ac_dir=`aclocal --print-ac-dir`
test -s "$ac_dir/dirlist" && ac_dir=$ac_dir:`tr '\n' : < "$ac_dir/dirlist"`
oIFS=$IFS
IFS=:
local found=false
for dir in \
$ACLOCAL_PATH $ac_dir /usr/share/aclocal ''
do
IFS=$oIFS
if test -n "$dir" && test -r "$dir/pkg.m4"; then
cp "$dir/pkg.m4" m4/pkg.m4
return
found=:
break
fi
done
IFS=$oIFS
die 'Cannot find pkg.m4; perhaps you need to install pkg-config'
$found || die 'Cannot find pkg.m4; perhaps you need to install pkg-config'
}
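The change in this hook replaces an early 'return' with a 'found' flag,
so that IFS is restored and 'die' runs only when no directory held
pkg.m4. The same search pattern is sketched below in isolation. The
function name find_in_dirs and its arguments are hypothetical and are
not part of the bootstrap script.

```shell
# Print the first directory in the colon-separated list $1 that contains
# a readable file named $2. Return nonzero if there is none.
find_in_dirs() {
  list=$1 name=$2 found=false
  oIFS=$IFS
  IFS=:                       # split $list on colons for the loop below
  for dir in $list; do
    IFS=$oIFS                 # restore IFS for the loop body
    if test -n "$dir" && test -r "$dir/$name"; then
      printf '%s\n' "$dir"
      found=:
      break
    fi
  done
  IFS=$oIFS                   # also restore when the list was empty
  $found                      # runs ':' (success) or 'false' (failure)
}
```

For example, 'find_in_dirs "$ACLOCAL_PATH:/usr/share/aclocal" pkg.m4'
would print the first matching directory, if any.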
bootstrap_epilogue()

cfg.mk

@ -1,5 +1,5 @@
# Customize maint.mk -*- makefile -*-
# Copyright (C) 2009-2018 Free Software Foundation, Inc.
# Copyright (C) 2009-2026 Free Software Foundation, Inc.
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
@ -30,7 +30,9 @@ url_dir_list = https://ftp.gnu.org/gnu/$(PACKAGE)
# Tests not to run as part of "make distcheck".
local-checks-to-skip = \
sc_texinfo_acronym
sc_indent \
sc_texinfo_acronym \
sc_unportable_grep_q
# Tools used to bootstrap this package, used for "announcement".
bootstrap-tools = autoconf,automake,gnulib
@ -40,7 +42,14 @@ announcement_Cc_ = $(translation_project_), $(PACKAGE)-devel@gnu.org
# The tight_scope test gets confused about inline functions.
# like 'to_uchar'.
_gl_TS_unmarked_extern_functions = main usage mb_clen to_uchar dfaerror dfawarn
_gl_TS_unmarked_extern_functions = \
main usage mb_clen to_uchar dfaerror dfawarn imbrlen
# Write base64-encoded (not hex) checksums into the announcement.
announce_gen_args = --cksum-checksums
# Add an exemption for sc_makefile_at_at_check.
_makefile_at_at_check_exceptions = ' && !/MAKEINFO/'
# Now that we have better tests, make this the default.
export VERBOSE = yes
@ -65,7 +74,13 @@ export VERBOSE = yes
# 1127556 9e
export XZ_OPT = -6e
old_NEWS_hash = 7623f45d6e457629257ff9a9f8237673
old_NEWS_hash = 3713245f672c3a9d1b455d6cc410c9ec
# We prefer to spell it back-reference, as POSIX does.
sc_prohibit_backref:
@prohibit=back''reference \
halt='spell it "back-reference"' \
$(_sc_search_regexp)
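The empty pair of quotes in the rule above is presumably there so that
cfg.mk does not itself contain the prohibited spelling. The shell
removes the quotes when it joins the word, as this small sketch shows.
The variable name pattern is chosen only for the illustration.

```shell
# The shell concatenates 'back', the empty string '' and 'reference'
# into a single word before the assignment takes place.
pattern=back''reference
printf '%s\n' "$pattern"
```

The printed word has no quotes in it, so the search still looks for
the unhyphenated spelling.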
# Many m4 macros names once began with 'jm_'.
# Make sure that none are inadvertently reintroduced.
@ -89,6 +104,7 @@ LINE_LEN_MAX = 80
FILTER_LONG_LINES = \
/^[^:]*\.diff:[^:]*:@@ / d; \
\|^[^:]*TODO:| d; \
\|^[^:]*doc/fdl.texi:| d; \
\|^[^:]*man/help2man:| d; \
\|^[^:]*tests/misc/sha[0-9]*sum.*\.pl[-:]| d; \
\|^[^:]*tests/pr/|{ \|^[^:]*tests/pr/pr-tests:| !d; };
@ -159,3 +175,18 @@ exclude_file_name_regexp--sc_prohibit_tab_based_indentation = \
exclude_file_name_regexp--sc_prohibit_doubled_word = ^tests/count-newline$$
exclude_file_name_regexp--sc_long_lines = ^tests/.*$$
# If a test uses timeout, it must also use require_timeout_.
# Grandfather-exempt the fedora test, since it ensures timeout works
# as expected before using it.
sc_timeout_prereq:
@$(VC_LIST_EXCEPT) \
| grep '^tests/' \
| grep -v '^tests/fedora$$' \
| xargs grep -lw timeout \
| xargs grep -FLw require_timeout_ \
| $(GREP) . \
&& { echo '$(ME): timeout without use of require_timeout_' \
1>&2; exit 1; } || :
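The idea behind sc_timeout_prereq can be tried outside the makefile
with the sketch below. It swaps the xargs pipeline for a plain loop,
which avoids running grep with no file arguments when nothing matches.
The function name timeout_without_prereq is made up for this example.

```shell
# List regular files directly under directory $1 that use the word
# "timeout" but never mention "require_timeout_".
timeout_without_prereq() {
  for f in "$1"/*; do
    test -f "$f" || continue
    # -w keeps "timeout" from matching inside "require_timeout_",
    # because "_" counts as a word character.
    if grep -qw timeout "$f" && ! grep -qFw require_timeout_ "$f"; then
      printf '%s\n' "$f"
    fi
  done
}
```

An empty result means every file that uses timeout also declares the
prerequisite.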
codespell_ignore_words_list = clen,allo,Nd,abd,alph,debbugs,wee,UE,ois,creche


@ -1,7 +1,7 @@
dnl
dnl autoconf input file for GNU grep
dnl
dnl Copyright (C) 1997-2006, 2009-2018 Free Software Foundation, Inc.
dnl Copyright (C) 1997-2006, 2009-2026 Free Software Foundation, Inc.
dnl
dnl This file is part of GNU grep.
dnl
@ -22,54 +22,23 @@ AC_INIT([GNU grep],
m4_esyscmd([build-aux/git-version-gen .tarball-version]),
[bug-grep@gnu.org])
# Set the GREP and EGREP variables to a dummy replacement for the 'grep'
# command, so that AC_PROG_GREP and AC_PROG_EGREP don't fail when no good
# 'grep' program is found. This makes it possible to build GNU grep on a
# Solaris machine that has only /usr/bin/grep and no /usr/xpg4/bin/grep.
# This function supports only restricted arguments:
# - No file names as arguments, process only standard input.
# - Only literal strings without backslashes, no regular expressions.
# - The only options are -e and -E (and -Ee).
# This function also does not support long lines beyond what the shell
# supports), and backslash-processes the input.
fn_grep () {
test "$1" = -E && shift
case $@%:@:$1 in
0:*) AC_MSG_ERROR([fn_grep: expected pattern]) ;;
1:-*) AC_MSG_ERROR([fn_grep: invalid command line]) ;;
1:*) pattern=$1 ;;
2:--|2:-e|2:-Ee) pattern=$2 ;;
*) AC_MSG_ERROR([fn_grep: invalid command line]) ;;
esac
if test -n "$GREP" || test -n "$EGREP"; then
AC_MSG_ERROR(
[no working 'grep' found
A working 'grep' command is needed to build GNU Grep.
This 'grep' should support -e and long lines.
On Solaris 10, install the package SUNWggrp or SUNWxcu4.
On Solaris 11, install the package text/gnu-grep or system/xopen/xcu4.])
fi
case $pattern in
[*['].^$\*[']*]) dnl The outer brackets are for M4.
AC_MSG_ERROR([fn_grep: regular expressions not supported]) ;;
esac
rc=1
while read line; do
case $line in
*$pattern*)
rc=0
AS_ECHO([$line]) ;;
esac
done
return $rc
}
test -n "$GREP" || GREP=fn_grep
test -n "$EGREP" || EGREP=fn_grep
ac_cv_path_EGREP=$EGREP
AC_CONFIG_AUX_DIR(build-aux)
AC_CONFIG_SRCDIR(src/grep.c)
AC_CONFIG_AUX_DIR([build-aux])
AC_CONFIG_SRCDIR([src/grep.c])
AC_DEFINE([GREP], 1, [We are building grep])
AC_PREREQ([2.63])
AC_PREREQ([2.64])
AC_CONFIG_MACRO_DIRS([m4])
dnl Automake stuff.
AM_INIT_AUTOMAKE([1.11 no-dist-gzip dist-xz color-tests parallel-tests
AM_INIT_AUTOMAKE([1.11 dist-xz color-tests parallel-tests
subdir-objects])
AM_SILENT_RULES([yes]) # make --enable-silent-rules the default.
@ -82,74 +51,94 @@ AC_PROG_INSTALL
AC_PROG_CC
gl_EARLY
AC_PROG_RANLIB
PKG_PROG_PKG_CONFIG([0.9.0])
PKG_PROG_PKG_CONFIG([0.9.0], [PKG_CONFIG=false])
# grep never invokes mbrtowc or mbrlen on empty input,
# so don't worry about this common bug,
# as working around it would merely slow grep down.
gl_cv_func_mbrtowc_empty_input='assume yes'
gl_cv_func_mbrlen_empty_input='assume yes'
dnl Checks for typedefs, structures, and compiler characteristics.
AC_TYPE_SIZE_T
AC_C_CONST
gl_INIT
# Ensure VLAs are not used.
# Note -Wvla is implicitly added by gl_MANYWARN_ALL_GCC
AC_DEFINE([GNULIB_NO_VLA], [1], [Define to 1 to disable use of VLAs])
# The test suite needs to know if we have a working perl.
# FIXME: this is suboptimal. Ideally, we would be able to call gl_PERL
# with an ACTION-IF-NOT-FOUND argument ...
cu_have_perl=yes
case $PERL in *"/missing "*) cu_have_perl=no;; esac
AM_CONDITIONAL([HAVE_PERL], [test $cu_have_perl = yes])
AM_CONDITIONAL([HAVE_PERL], [test "$gl_cv_prog_perl" != no])
# gl_GCC_VERSION_IFELSE([major], [minor], [run-if-found], [run-if-not-found])
# ------------------------------------------------
# If $CPP is gcc-MAJOR.MINOR or newer, then run RUN-IF-FOUND.
# Otherwise, run RUN-IF-NOT-FOUND.
AC_DEFUN([gl_GCC_VERSION_IFELSE],
[AC_PREPROC_IFELSE(
[AC_LANG_PROGRAM(
[[
#if ($1) < __GNUC__ || (($1) == __GNUC__ && ($2) <= __GNUC_MINOR__)
/* ok */
#else
# error "your version of gcc is older than $1.$2"
#endif
]]),
], [$3], [$4])
]
)
AC_ARG_ENABLE([gcc-warnings],
[AS_HELP_STRING([--enable-gcc-warnings],
[turn on lots of GCC warnings (for developers)])],
[AS_HELP_STRING([--enable-gcc-warnings@<:@=TYPE@:>@],
[control generation of GCC warnings. The TYPE 'no' disables
warnings (default for non-developer builds); 'yes' generates
cheap warnings if available (default for developer builds);
'expensive' in addition generates expensive-to-compute warnings
if available.])],
[case $enableval in
yes|no) ;;
no|yes|expensive) ;;
*) AC_MSG_ERROR([bad value $enableval for gcc-warnings option]) ;;
esac
gl_gcc_warnings=$enableval],
[gl_gcc_warnings=no
if test "$GCC" = yes && test -d "$srcdir"/.git; then
AC_COMPILE_IFELSE(
[AC_LANG_PROGRAM([[
#if ! (6 < __GNUC__ + (2 <= __GNUC_MINOR__))
#error "--enable-gcc-warnings defaults to 'no' on older GCC"
#endif
]])],
[gl_gcc_warnings=yes])
fi]
[
# GCC provides fine-grained control over diagnostics which
# is used in gnulib for example to suppress warnings from
# certain sections of code. So if this is available and
# we're running from a git repo, then auto enable the warnings.
gl_gcc_warnings=no
gl_GCC_VERSION_IFELSE([4], [6],
[test -d "$srcdir"/.git \
&& ! test -f "$srcdir"/.tarball-version \
&& gl_gcc_warnings=yes])]
)
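The preprocessor condition '6 < __GNUC__ + (2 <= __GNUC_MINOR__)' in
the new default holds for GCC 6.2 and later. The sketch below repeats
the same arithmetic in the shell so it can be tried on a few version
numbers. The function name gcc_warnings_default is hypothetical, and
the sketch ignores the other conditions (GCC in use, a .git directory
present).

```shell
# Print "yes" if a compiler reporting major version $1 and minor
# version $2 passes the version test, "no" otherwise.
gcc_warnings_default() {
  major=$1 minor=$2
  # (2 <= minor) evaluates to 1 or 0, so major 6 needs minor >= 2.
  if [ $(( 6 < $major + (2 <= $minor) )) -eq 1 ]; then
    echo yes
  else
    echo no
  fi
}
```

For instance, 'gcc_warnings_default 6 1' prints "no" while
'gcc_warnings_default 6 2' prints "yes".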
if test "$gl_gcc_warnings" = yes; then
if test $gl_gcc_warnings != no; then
gl_WARN_ADD([-Werror], [WERROR_CFLAGS])
AC_SUBST([WERROR_CFLAGS])
nw=
ew=
AS_IF([test $gl_gcc_warnings != expensive],
[# -fanalyzer and related options slow GCC considerably.
ew="$ew -fanalyzer -Wno-analyzer-double-free -Wno-analyzer-malloc-leak"
ew="$ew -Wno-analyzer-null-dereference -Wno-analyzer-use-after-free"])
nw=$ew
# This, $nw, is the list of warnings we disable.
nw="$nw -Wdeclaration-after-statement" # too useful to forbid
nw="$nw -Waggregate-return" # anachronistic
nw="$nw -Wlong-long" # C90 is anachronistic (lib/gethrxtime.h)
nw="$nw -Wc++-compat" # We don't care about C++ compilers
nw="$nw -Wundef" # Warns on '#if GNULIB_FOO' etc in gnulib
nw="$nw -Wvla" # suppress a warning in regexec.h
nw="$nw -Winline" # suppress warnings from streq.h's streq5
nw="$nw -Wsystem-headers" # Don't let system headers trigger warnings
nw="$nw -Wpadded" # Our structs are not padded
nw="$nw -Wvla" # warnings in gettext.h
nw="$nw -Wstack-protector" # generates false alarms for useful code
nw="$nw -Wswitch-default" # Too many warnings for now
nw="$nw -Wunsafe-loop-optimizations" # OK to suppress unsafe optimizations
nw="$nw -Winline" # streq.h's streq4, streq6 and strcaseeq6
nw="$nw -Wstrict-overflow" # regexec.c
gl_MANYWARN_ALL_GCC([ws])
gl_MANYWARN_COMPLEMENT([ws], [$ws], [$nw])
for w in $ws; do
gl_WARN_ADD([$w])
done
gl_WARN_ADD([-Wtrailing-whitespace]) # This project's coding style
gl_WARN_ADD([-Wno-missing-field-initializers]) # We need this one
gl_WARN_ADD([-Wno-sign-compare]) # Too many warnings for now
gl_WARN_ADD([-Wno-unused-parameter]) # Too many warnings for now
gl_WARN_ADD([-Wno-cast-function-type]) # sig-handler.h's sa_handler_t cast
gl_WARN_ADD([-Wno-deprecated-declarations]) # clang complains about sprintf
# In spite of excluding -Wlogical-op above, it is enabled, as of
# gcc 4.5.0 20090517, and it provokes warnings in cat.c, dd.c, truncate.c
@ -175,6 +164,32 @@ if test "$gl_gcc_warnings" = yes; then
gl_WARN_ADD([-Wno-format-nonliteral])
gl_MANYWARN_COMPLEMENT([GNULIB_WARN_CFLAGS], [$WARN_CFLAGS], [$nw])
AC_SUBST([GNULIB_WARN_CFLAGS])
# For gnulib-tests, the set is slightly smaller still.
# It's not worth being this picky about test programs.
nw=
nw="$nw -Wformat-truncation=2" # False alarm in strerror_r.c
nw="$nw -Wmissing-declarations"
nw="$nw -Wmissing-prototypes"
nw="$nw -Wmissing-variable-declarations"
nw="$nw -Wnull-dereference"
nw="$nw -Wold-style-definition"
nw="$nw -Wstrict-prototypes"
nw="$nw -Wsuggest-attribute=cold"
nw="$nw -Wsuggest-attribute=const"
nw="$nw -Wsuggest-attribute=format"
nw="$nw -Wsuggest-attribute=pure"
# Disable to avoid warnings in e.g., test-intprops.c and test-limits-h.c
# due to overlong expansions like this:
# test-intprops.c:147:5: error: string literal of length 9531 exceeds \
# maximum length 4095 that ISO C99 compilers are required to support
nw="$nw -Woverlength-strings"
gl_MANYWARN_COMPLEMENT([GNULIB_TEST_WARN_CFLAGS],
[$GNULIB_WARN_CFLAGS], [$nw])
gl_WARN_ADD([-Wno-return-type], [GNULIB_TEST_WARN_CFLAGS])
AC_SUBST([GNULIB_TEST_WARN_CFLAGS])
fi
# By default, argmatch should fail calling usage (EXIT_FAILURE).
@ -183,14 +198,7 @@ AC_DEFINE([ARGMATCH_DIE], [usage (EXIT_FAILURE)],
AC_DEFINE([ARGMATCH_DIE_DECL], [void usage (int _e)],
[Define to the declaration of the xargmatch failure function.])
dnl Checks for header files.
AC_HEADER_STDC
AC_HEADER_DIRENT
dnl Checks for functions.
AC_FUNC_CLOSEDIR_VOID
AC_CHECK_FUNCS_ONCE(isascii setlocale)
AC_CHECK_FUNCS_ONCE([setlocale])
dnl I18N feature
AM_GNU_GETTEXT_VERSION([0.18.2])
@@ -203,9 +211,12 @@ dnl then the installer should configure --with-included-regex.
AM_CONDITIONAL([USE_INCLUDED_REGEX], [test "$ac_use_included_regex" = yes])
if test "$ac_use_included_regex" = no; then
AC_MSG_WARN([Included lib/regex.c not used])
else
AC_DEFINE([USE_INCLUDED_REGEX], 1, [building with included regex code])
fi
gl_FUNC_PCRE
AM_CONDITIONAL([USE_PCRE], [test $use_pcre = yes])
case $host_os in
mingw*) suffix=w32 ;;
@@ -223,6 +234,4 @@ AC_CONFIG_FILES([
doc/Makefile
gnulib-tests/Makefile
])
GREP="$ac_abs_top_builddir/src/grep"
EGREP="$ac_abs_top_builddir/src/grep -E"
AC_OUTPUT

3
doc/.gitignore

@@ -1,6 +1,3 @@
/egrep.1
/fdl.texi
/fgrep.1
/gendocs_template
/gendocs_template_min
/grep.info*


@@ -1,7 +1,7 @@
# Process this file with automake to create Makefile.in
# Makefile.am for grep/doc.
#
# Copyright 2008-2018 Free Software Foundation, Inc.
# Copyright 2008-2026 Free Software Foundation, Inc.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
@@ -16,23 +16,20 @@
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <https://www.gnu.org/licenses/>.
# The customization variable CHECK_NORMAL_MENU_STRUCTURE is necessary with
# makeinfo versions ≥ 6.8.
MAKEINFO = @MAKEINFO@ -c CHECK_NORMAL_MENU_STRUCTURE=1
info_TEXINFOS = grep.texi
grep_TEXINFOS = fdl.texi
man_MANS = grep.1 fgrep.1 egrep.1
man_MANS = grep.1
EXTRA_DIST = grep.in.1
CLEANFILES = grep.1 egrep.1 fgrep.1
CLEANFILES = grep.1
grep.1: grep.in.1
$(AM_V_GEN)rm -f $@-t $@
$(AM_V_at)sed 's/@''VERSION@/$(VERSION)/' $(srcdir)/grep.in.1 > $@-t
$(AM_V_at)chmod a=r $@-t
$(AM_V_at)mv -f $@-t $@
egrep.1 fgrep.1: Makefile.am
$(AM_V_GEN)rm -f $@-t $@
$(AM_V_at)inst=`echo grep | sed '$(transform)'`.1 \
&& echo ".so man1/$$inst" > $@-t
$(AM_V_at)chmod a=r $@-t
$(AM_V_at)mv -f $@-t $@
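Review note: the retained grep.1 rule writes its output to a temporary file named $@-t and renames it only once it is complete, so an interrupted build does not leave a partial grep.1 behind. The same steps can be sketched as plain shell; the temporary directory, the one-line template, and the version string 9.9-example are made-up values. In Makefile.am the placeholder is spelled '@''VERSION@', which appears intended to keep configure from substituting it inside the Makefile itself; the shell joins the two quoted pieces back into a single token.

```shell
# Sketch of the grep.1 rule outside make; all paths and values are illustrative.
tmpdir=$(mktemp -d)
printf '.TH GREP 1 "GNU grep @VERSION@"\n' > "$tmpdir/grep.in.1"
VERSION=9.9-example
out=$tmpdir/grep.1

rm -f "$out-t" "$out"                                 # clear stale outputs
sed "s/@VERSION@/$VERSION/" "$tmpdir/grep.in.1" > "$out-t"  # fill in the version
chmod a=r "$out-t"                                    # generated file is read-only
mv -f "$out-t" "$out"                                 # publish with one rename

generated=$(cat "$out")
printf '%s\n' "$generated"
rm -rf "$tmpdir"
```

Making the result read-only is a small guard against editing the generated page instead of grep.in.1.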

506
doc/fdl.texi Normal file

@@ -0,0 +1,506 @@
@c The GNU Free Documentation License.
@center Version 1.3, 3 November 2008
@c This file is intended to be included within another document,
@c hence no sectioning command or @node.
@display
Copyright @copyright{} 2000--2002, 2007--2008, 2023--2026 Free Software
Foundation, Inc.
@uref{https://fsf.org/}
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
@end display
@enumerate 0
@item
PREAMBLE
The purpose of this License is to make a manual, textbook, or other
functional and useful document @dfn{free} in the sense of freedom: to
assure everyone the effective freedom to copy and redistribute it,
with or without modifying it, either commercially or noncommercially.
Secondarily, this License preserves for the author and publisher a way
to get credit for their work, while not being considered responsible
for modifications made by others.
This License is a kind of ``copyleft'', which means that derivative
works of the document must themselves be free in the same sense. It
complements the GNU General Public License, which is a copyleft
license designed for free software.
We have designed this License in order to use it for manuals for free
software, because free software needs free documentation: a free
program should come with manuals providing the same freedoms that the
software does. But this License is not limited to software manuals;
it can be used for any textual work, regardless of subject matter or
whether it is published as a printed book. We recommend this License
principally for works whose purpose is instruction or reference.
@item
APPLICABILITY AND DEFINITIONS
This License applies to any manual or other work, in any medium, that
contains a notice placed by the copyright holder saying it can be
distributed under the terms of this License. Such a notice grants a
world-wide, royalty-free license, unlimited in duration, to use that
work under the conditions stated herein. The ``Document'', below,
refers to any such manual or work. Any member of the public is a
licensee, and is addressed as ``you''. You accept the license if you
copy, modify or distribute the work in a way requiring permission
under copyright law.
A ``Modified Version'' of the Document means any work containing the
Document or a portion of it, either copied verbatim, or with
modifications and/or translated into another language.
A ``Secondary Section'' is a named appendix or a front-matter section
of the Document that deals exclusively with the relationship of the
publishers or authors of the Document to the Document's overall
subject (or to related matters) and contains nothing that could fall
directly within that overall subject. (Thus, if the Document is in
part a textbook of mathematics, a Secondary Section may not explain
any mathematics.) The relationship could be a matter of historical
connection with the subject or with related matters, or of legal,
commercial, philosophical, ethical or political position regarding
them.
The ``Invariant Sections'' are certain Secondary Sections whose titles
are designated, as being those of Invariant Sections, in the notice
that says that the Document is released under this License. If a
section does not fit the above definition of Secondary then it is not
allowed to be designated as Invariant. The Document may contain zero
Invariant Sections. If the Document does not identify any Invariant
Sections then there are none.
The ``Cover Texts'' are certain short passages of text that are listed,
as Front-Cover Texts or Back-Cover Texts, in the notice that says that
the Document is released under this License. A Front-Cover Text may
be at most 5 words, and a Back-Cover Text may be at most 25 words.
A ``Transparent'' copy of the Document means a machine-readable copy,
represented in a format whose specification is available to the
general public, that is suitable for revising the document
straightforwardly with generic text editors or (for images composed of
pixels) generic paint programs or (for drawings) some widely available
drawing editor, and that is suitable for input to text formatters or
for automatic translation to a variety of formats suitable for input
to text formatters. A copy made in an otherwise Transparent file
format whose markup, or absence of markup, has been arranged to thwart
or discourage subsequent modification by readers is not Transparent.
An image format is not Transparent if used for any substantial amount
of text. A copy that is not ``Transparent'' is called ``Opaque''.
Examples of suitable formats for Transparent copies include plain
ASCII without markup, Texinfo input format, La@TeX{} input
format, SGML or XML using a publicly available
DTD, and standard-conforming simple HTML,
PostScript or PDF designed for human modification. Examples
of transparent image formats include PNG, XCF and
JPG@. Opaque formats include proprietary formats that can be
read and edited only by proprietary word processors, SGML or
XML for which the DTD and/or processing tools are
not generally available, and the machine-generated HTML,
PostScript or PDF produced by some word processors for
output purposes only.
The ``Title Page'' means, for a printed book, the title page itself,
plus such following pages as are needed to hold, legibly, the material
this License requires to appear in the title page. For works in
formats which do not have any title page as such, ``Title Page'' means
the text near the most prominent appearance of the work's title,
preceding the beginning of the body of the text.
The ``publisher'' means any person or entity that distributes copies
of the Document to the public.
A section ``Entitled XYZ'' means a named subunit of the Document whose
title either is precisely XYZ or contains XYZ in parentheses following
text that translates XYZ in another language. (Here XYZ stands for a
specific section name mentioned below, such as ``Acknowledgements'',
``Dedications'', ``Endorsements'', or ``History''.) To ``Preserve the Title''
of such a section when you modify the Document means that it remains a
section ``Entitled XYZ'' according to this definition.
The Document may include Warranty Disclaimers next to the notice which
states that this License applies to the Document. These Warranty
Disclaimers are considered to be included by reference in this
License, but only as regards disclaiming warranties: any other
implication that these Warranty Disclaimers may have is void and has
no effect on the meaning of this License.
@item
VERBATIM COPYING
You may copy and distribute the Document in any medium, either
commercially or noncommercially, provided that this License, the
copyright notices, and the license notice saying this License applies
to the Document are reproduced in all copies, and that you add no other
conditions whatsoever to those of this License. You may not use
technical measures to obstruct or control the reading or further
copying of the copies you make or distribute. However, you may accept
compensation in exchange for copies. If you distribute a large enough
number of copies you must also follow the conditions in section 3.
You may also lend copies, under the same conditions stated above, and
you may publicly display copies.
@item
COPYING IN QUANTITY
If you publish printed copies (or copies in media that commonly have
printed covers) of the Document, numbering more than 100, and the
Document's license notice requires Cover Texts, you must enclose the
copies in covers that carry, clearly and legibly, all these Cover
Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on
the back cover. Both covers must also clearly and legibly identify
you as the publisher of these copies. The front cover must present
the full title with all words of the title equally prominent and
visible. You may add other material on the covers in addition.
Copying with changes limited to the covers, as long as they preserve
the title of the Document and satisfy these conditions, can be treated
as verbatim copying in other respects.
If the required texts for either cover are too voluminous to fit
legibly, you should put the first ones listed (as many as fit
reasonably) on the actual cover, and continue the rest onto adjacent
pages.
If you publish or distribute Opaque copies of the Document numbering
more than 100, you must either include a machine-readable Transparent
copy along with each Opaque copy, or state in or with each Opaque copy
a computer-network location from which the general network-using
public has access to download using public-standard network protocols
a complete Transparent copy of the Document, free of added material.
If you use the latter option, you must take reasonably prudent steps,
when you begin distribution of Opaque copies in quantity, to ensure
that this Transparent copy will remain thus accessible at the stated
location until at least one year after the last time you distribute an
Opaque copy (directly or through your agents or retailers) of that
edition to the public.
It is requested, but not required, that you contact the authors of the
Document well before redistributing any large number of copies, to give
them a chance to provide you with an updated version of the Document.
@item
MODIFICATIONS
You may copy and distribute a Modified Version of the Document under
the conditions of sections 2 and 3 above, provided that you release
the Modified Version under precisely this License, with the Modified
Version filling the role of the Document, thus licensing distribution
and modification of the Modified Version to whoever possesses a copy
of it. In addition, you must do these things in the Modified Version:
@enumerate A
@item
Use in the Title Page (and on the covers, if any) a title distinct
from that of the Document, and from those of previous versions
(which should, if there were any, be listed in the History section
of the Document). You may use the same title as a previous version
if the original publisher of that version gives permission.
@item
List on the Title Page, as authors, one or more persons or entities
responsible for authorship of the modifications in the Modified
Version, together with at least five of the principal authors of the
Document (all of its principal authors, if it has fewer than five),
unless they release you from this requirement.
@item
State on the Title page the name of the publisher of the
Modified Version, as the publisher.
@item
Preserve all the copyright notices of the Document.
@item
Add an appropriate copyright notice for your modifications
adjacent to the other copyright notices.
@item
Include, immediately after the copyright notices, a license notice
giving the public permission to use the Modified Version under the
terms of this License, in the form shown in the Addendum below.
@item
Preserve in that license notice the full lists of Invariant Sections
and required Cover Texts given in the Document's license notice.
@item
Include an unaltered copy of this License.
@item
Preserve the section Entitled ``History'', Preserve its Title, and add
to it an item stating at least the title, year, new authors, and
publisher of the Modified Version as given on the Title Page. If
there is no section Entitled ``History'' in the Document, create one
stating the title, year, authors, and publisher of the Document as
given on its Title Page, then add an item describing the Modified
Version as stated in the previous sentence.
@item
Preserve the network location, if any, given in the Document for
public access to a Transparent copy of the Document, and likewise
the network locations given in the Document for previous versions
it was based on. These may be placed in the ``History'' section.
You may omit a network location for a work that was published at
least four years before the Document itself, or if the original
publisher of the version it refers to gives permission.
@item
For any section Entitled ``Acknowledgements'' or ``Dedications'', Preserve
the Title of the section, and preserve in the section all the
substance and tone of each of the contributor acknowledgements and/or
dedications given therein.
@item
Preserve all the Invariant Sections of the Document,
unaltered in their text and in their titles. Section numbers
or the equivalent are not considered part of the section titles.
@item
Delete any section Entitled ``Endorsements''. Such a section
may not be included in the Modified Version.
@item
Do not retitle any existing section to be Entitled ``Endorsements'' or
to conflict in title with any Invariant Section.
@item
Preserve any Warranty Disclaimers.
@end enumerate
If the Modified Version includes new front-matter sections or
appendices that qualify as Secondary Sections and contain no material
copied from the Document, you may at your option designate some or all
of these sections as invariant. To do this, add their titles to the
list of Invariant Sections in the Modified Version's license notice.
These titles must be distinct from any other section titles.
You may add a section Entitled ``Endorsements'', provided it contains
nothing but endorsements of your Modified Version by various
parties---for example, statements of peer review or that the text has
been approved by an organization as the authoritative definition of a
standard.
You may add a passage of up to five words as a Front-Cover Text, and a
passage of up to 25 words as a Back-Cover Text, to the end of the list
of Cover Texts in the Modified Version. Only one passage of
Front-Cover Text and one of Back-Cover Text may be added by (or
through arrangements made by) any one entity. If the Document already
includes a cover text for the same cover, previously added by you or
by arrangement made by the same entity you are acting on behalf of,
you may not add another; but you may replace the old one, on explicit
permission from the previous publisher that added the old one.
The author(s) and publisher(s) of the Document do not by this License
give permission to use their names for publicity for or to assert or
imply endorsement of any Modified Version.
@item
COMBINING DOCUMENTS
You may combine the Document with other documents released under this
License, under the terms defined in section 4 above for modified
versions, provided that you include in the combination all of the
Invariant Sections of all of the original documents, unmodified, and
list them all as Invariant Sections of your combined work in its
license notice, and that you preserve all their Warranty Disclaimers.
The combined work need only contain one copy of this License, and
multiple identical Invariant Sections may be replaced with a single
copy. If there are multiple Invariant Sections with the same name but
different contents, make the title of each such section unique by
adding at the end of it, in parentheses, the name of the original
author or publisher of that section if known, or else a unique number.
Make the same adjustment to the section titles in the list of
Invariant Sections in the license notice of the combined work.
In the combination, you must combine any sections Entitled ``History''
in the various original documents, forming one section Entitled
``History''; likewise combine any sections Entitled ``Acknowledgements'',
and any sections Entitled ``Dedications''. You must delete all
sections Entitled ``Endorsements.''
@item
COLLECTIONS OF DOCUMENTS
You may make a collection consisting of the Document and other documents
released under this License, and replace the individual copies of this
License in the various documents with a single copy that is included in
the collection, provided that you follow the rules of this License for
verbatim copying of each of the documents in all other respects.
You may extract a single document from such a collection, and distribute
it individually under this License, provided you insert a copy of this
License into the extracted document, and follow this License in all
other respects regarding verbatim copying of that document.
@item
AGGREGATION WITH INDEPENDENT WORKS
A compilation of the Document or its derivatives with other separate
and independent documents or works, in or on a volume of a storage or
distribution medium, is called an ``aggregate'' if the copyright
resulting from the compilation is not used to limit the legal rights
of the compilation's users beyond what the individual works permit.
When the Document is included in an aggregate, this License does not
apply to the other works in the aggregate which are not themselves
derivative works of the Document.
If the Cover Text requirement of section 3 is applicable to these
copies of the Document, then if the Document is less than one half of
the entire aggregate, the Document's Cover Texts may be placed on
covers that bracket the Document within the aggregate, or the
electronic equivalent of covers if the Document is in electronic form.
Otherwise they must appear on printed covers that bracket the whole
aggregate.
@item
TRANSLATION
Translation is considered a kind of modification, so you may
distribute translations of the Document under the terms of section 4.
Replacing Invariant Sections with translations requires special
permission from their copyright holders, but you may include
translations of some or all Invariant Sections in addition to the
original versions of these Invariant Sections. You may include a
translation of this License, and all the license notices in the
Document, and any Warranty Disclaimers, provided that you also include
the original English version of this License and the original versions
of those notices and disclaimers. In case of a disagreement between
the translation and the original version of this License or a notice
or disclaimer, the original version will prevail.
If a section in the Document is Entitled ``Acknowledgements'',
``Dedications'', or ``History'', the requirement (section 4) to Preserve
its Title (section 1) will typically require changing the actual
title.
@item
TERMINATION
You may not copy, modify, sublicense, or distribute the Document
except as expressly provided under this License. Any attempt
otherwise to copy, modify, sublicense, or distribute it is void, and
will automatically terminate your rights under this License.
However, if you cease all violation of this License, then your license
from a particular copyright holder is reinstated (a) provisionally,
unless and until the copyright holder explicitly and finally
terminates your license, and (b) permanently, if the copyright holder
fails to notify you of the violation by some reasonable means prior to
60 days after the cessation.
Moreover, your license from a particular copyright holder is
reinstated permanently if the copyright holder notifies you of the
violation by some reasonable means, this is the first time you have
received notice of violation of this License (for any work) from that
copyright holder, and you cure the violation prior to 30 days after
your receipt of the notice.
Termination of your rights under this section does not terminate the
licenses of parties who have received copies or rights from you under
this License. If your rights have been terminated and not permanently
reinstated, receipt of a copy of some or all of the same material does
not give you any rights to use it.
@item
FUTURE REVISIONS OF THIS LICENSE
The Free Software Foundation may publish new, revised versions
of the GNU Free Documentation License from time to time. Such new
versions will be similar in spirit to the present version, but may
differ in detail to address new problems or concerns. See
@uref{https://www.gnu.org/licenses/}.
Each version of the License is given a distinguishing version number.
If the Document specifies that a particular numbered version of this
License ``or any later version'' applies to it, you have the option of
following the terms and conditions either of that specified version or
of any later version that has been published (not as a draft) by the
Free Software Foundation. If the Document does not specify a version
number of this License, you may choose any version ever published (not
as a draft) by the Free Software Foundation. If the Document
specifies that a proxy can decide which future versions of this
License can be used, that proxy's public statement of acceptance of a
version permanently authorizes you to choose that version for the
Document.
@item
RELICENSING
``Massive Multiauthor Collaboration Site'' (or ``MMC Site'') means any
World Wide Web server that publishes copyrightable works and also
provides prominent facilities for anybody to edit those works. A
public wiki that anybody can edit is an example of such a server. A
``Massive Multiauthor Collaboration'' (or ``MMC'') contained in the
site means any set of copyrightable works thus published on the MMC
site.
``CC-BY-SA'' means the Creative Commons Attribution-Share Alike 3.0
license published by Creative Commons Corporation, a not-for-profit
corporation with a principal place of business in San Francisco,
California, as well as future copyleft versions of that license
published by that same organization.
``Incorporate'' means to publish or republish a Document, in whole or
in part, as part of another Document.
An MMC is ``eligible for relicensing'' if it is licensed under this
License, and if all works that were first published under this License
somewhere other than this MMC, and subsequently incorporated in whole
or in part into the MMC, (1) had no cover texts or invariant sections,
and (2) were thus incorporated prior to November 1, 2008.
The operator of an MMC Site may republish an MMC contained in the site
under CC-BY-SA on the same site at any time before August 1, 2009,
provided the MMC is eligible for relicensing.
@end enumerate
@page
@heading ADDENDUM: How to use this License for your documents
To use this License in a document you have written, include a copy of
the License in the document and put the following copyright and
license notices just after the title page:
@smallexample
@group
Copyright (C) @var{year} @var{your name}.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3
or any later version published by the Free Software Foundation;
with no Invariant Sections, no Front-Cover Texts, and no Back-Cover
Texts. A copy of the license is included in the section entitled ``GNU
Free Documentation License''.
@end group
@end smallexample
If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts,
replace the ``with@dots{}Texts.''@: line with this:
@smallexample
@group
with the Invariant Sections being @var{list their titles}, with
the Front-Cover Texts being @var{list}, and with the Back-Cover Texts
being @var{list}.
@end group
@end smallexample
If you have Invariant Sections without Cover Texts, or some other
combination of the three, merge those two alternatives to suit the
situation.
If your document contains nontrivial examples of program code, we
recommend releasing these examples in parallel under your choice of
free software license, such as the GNU General Public License,
to permit their use in free software.
@c Local Variables:
@c ispell-local-pdict: "ispell-dict"
@c End:


@@ -2,7 +2,7 @@
.de dT
.ds Dt \\$2
..
.dT Time-stamp: "2018-05-11"
.dT Time-stamp: "2025-03-21"
.\" Update the above date whenever a change to either this file or
.\" grep.c's 'usage' function results in a nontrivial change to the man page.
.\" In Emacs, you can update the date by running 'M-x time-stamp'
@@ -11,8 +11,10 @@
.
.TH GREP 1 \*(Dt "GNU grep @VERSION@" "User Commands"
.
.if !\w|\*(lq| \{\
.\" groff an-old.tmac does not seem to be in use, so define lq and rq.
.ie \n(.g .ds ' \(aq
.el .ds ' '
.if !\w@\*(lq@ \{\
.\" The implementation lacks \*(lq and presumably \*(rq.
. ie \n(.g \{\
. ds lq \(lq\"
. ds rq \(rq\"
@@ -23,152 +25,156 @@
. \}
.\}
.
.if !\w|\*(la| \{\
.as mC
.if !\w@\*(mC@ \{\
.\" groff an-ext.tmac does not seem to be in use, so define the parts of
.\" it that are used below. For a copy of groff an-ext.tmac, please see:
.\" https://git.savannah.gnu.org/cgit/groff.git/plain/tmac/an-ext.tmac
.\" --- Start of lines taken from groff an-ext.tmac
.\" it that are used below, taken from groff 1.23.0. For a copy, please see:
.\" https://git.savannah.gnu.org/cgit/groff.git/plain/tmac/an-ext.tmac?id=1.23.0
.nr mG \n(.g-1
.\" --- Start of lines taken from groff an-ext.tmac,
.\" except with "nr mH 14" replaced by "nr mH 0"
.\" and with mS, SY, YS definitions omitted.
.
.\" Check whether we are using grohtml.
.nr mH 0
.if \n(.g \
. if '\*(.T'html' \
. nr mH 1
.\" Define this to your implementation's constant-width typeface.
.ds mC CW
.if n .ds mC R
.
.\" Save the automatic hyphenation mode.
.\"
.\" In AT&T troff, there was no register exposing the hyphenation mode,
.\" and no way to save and restore it. Set `mH` to a reasonable value
.\" for your implementation and preference.
.de mY
. ie !\\n(.g \
. nr mH 0
. el \
. do nr mH \\n[.hy] \" groff extension register
..
.
.nr mE 0 \" in an example (EX/EE)?
.
.\" Prepare link text for mail/web hyperlinks. `MT` and `UR` call this.
.de mV
. ds m1 \\$1\"
..
.
.
.\" Map mono-width fonts to standard fonts for groff's TTY device.
.if n \{\
. do ftr CR R
. do ftr CI I
. do ftr CB B
.\}
.\" Emit hyperlink. The optional argument supplies trailing punctuation
.\" after link text. `ME` and `UE` call this.
.de mQ
. mY
. nh
<\\*(m1>\\$1
. hy \\n(mH
..
.
.\" groff has glyph entities for angle brackets.
.ie \n(.g \{\
. ds la \(la\"
. ds ra \(ra\"
.\}
.el \{\
. ds la <\"
. ds ra >\"
. \" groff's man macros control hyphenation with this register.
. nr HY 1
.\}
.
.\" Start URL.
.if \n(.g-\n(mG \{\
.de UR
. ds m1 \\$1\"
. nh
. if \\n(mH \{\
. \" Start diversion in a new environment.
. do ev URL-div
. do di URL-div
. \}
. mV \\$1
..
.\}
.
.
.\" End URL.
.if \n(.g-\n(mG \{\
.de UE
. ie \\n(mH \{\
. br
. di
. ev
.
. \" Has there been one or more input lines for the link text?
. ie \\n(dn \{\
. do HTML-NS "<a href=""\\*(m1"">"
. \" Yes, strip off final newline of diversion and emit it.
. do chop URL-div
. do URL-div
\c
. do HTML-NS </a>
. \}
. el \
. do HTML-NS "<a href=""\\*(m1"">\\*(m1</a>"
\&\\$*\"
. \}
. el \
\\*(la\\*(m1\\*(ra\\$*\"
.
. hy \\n(HY
. mQ \\$1
..
.\}
.
.
.\" Start email address.
.if \n(.g-\n(mG \{\
.de MT
. ds m1 \\$1\"
. nh
. if \\n(mH \{\
. \" Start diversion in a new environment.
. do ev URL-div
. do di URL-div
. \}
. mV \\$1
..
.\}
.
.
.\" End email address.
.if \n(.g-\n(mG \{\
.de ME
. ie \\n(mH \{\
. br
. di
. ev
.
. \" Has there been one or more input lines for the link text?
. ie \\n(dn \{\
. do HTML-NS "<a href=""mailto:\\*(m1"">"
. \" Yes, strip off final newline of diversion and emit it.
. do chop URL-div
. do URL-div
\c
. do HTML-NS </a>
. \}
. el \
. do HTML-NS "<a href=""mailto:\\*(m1"">\\*(m1</a>"
\&\\$*\"
. \}
. el \
\\*(la\\*(m1\\*(ra\\$*\"
.
. hy \\n(HY
. mQ \\$1
..
.\}
.
.
.\" Start example.
.if \n(.g-\n(mG \{\
.de EX
. br
. if !\\n(mE \{\
. nr mF \\n(.f
. nr mP \\n(PD
. nr PD 1v
. nf
. ft \\*(mC
. nr mE 1
. \}
..
.\}
.
.
.\" End example.
.if \n(.g-\n(mG \{\
.de EE
. br
. if \\n(mE \{\
. ft \\n(mF
. nr PD \\n(mP
. fi
. nr mE 0
. \}
..
.\}
.\" --- End of lines taken from groff an-ext.tmac
.\}
.
.hy 0
.
.SH NAME
grep, egrep, fgrep \- print lines that match patterns
grep \- print lines that match patterns
.
.SH SYNOPSIS
.B grep
.RI [ OPTION .\|.\|.]\&
.RI [ OPTION ].\|.\|.\&
.I PATTERNS
.RI [ FILE .\|.\|.]
.RI [ FILE ].\|.\|.
.br
.B grep
.RI [ OPTION .\|.\|.]\&
.RI [ OPTION ].\|.\|.\&
.B \-e
.I PATTERNS
\&.\|.\|.\&
.RI [ FILE .\|.\|.]
.RI [ FILE ].\|.\|.
.br
.B grep
.RI [ OPTION .\|.\|.]\&
.RI [ OPTION ].\|.\|.\&
.B \-f
.I PATTERN_FILE
\&.\|.\|.\&
.RI [ FILE .\|.\|.]
.RI [ FILE ].\|.\|.
.
.SH DESCRIPTION
.B grep
searches for
.I PATTERNS
in each
searches for patterns in each
.IR FILE .
In the synopsis's first form, which is used if no
.B \-e
or
.B \-f
options are present, the first operand
.I PATTERNS
is one or patterns separated by newline characters, and
is one or more patterns separated by newline characters, and
.B grep
prints each line that matches a pattern.
Typically
.I PATTERNS
should be quoted when
.B grep
is used in a shell command.
.PP
A
.I FILE
@@ -179,17 +185,6 @@ If no
.I FILE
is given, recursive searches examine the working directory,
and nonrecursive searches read standard input.
.PP
In addition, the variant programs
.B egrep
and
.B fgrep
are the same as
.B "grep\ \-E"
and
.BR "grep\ \-F" ,
respectively.
These variants are deprecated, but are provided for backward compatibility.
.
.SH OPTIONS
.SS "Generic Program Information"
@@ -201,7 +196,7 @@ Output a usage message and exit.
Output the version number of
.B grep
and exit.
.SS "Matcher Selection"
.SS "Pattern Syntax"
.TP
.BR \-E ", " \-\^\-extended\-regexp
Interpret
@@ -220,7 +215,9 @@ as basic regular expressions (BREs, see below).
This is the default.
.TP
.BR \-P ", " \-\^\-perl\-regexp
Interpret PATTERNS as Perl-compatible regular expressions (PCREs).
Interpret
.I PATTERNS
as Perl-compatible regular expressions (PCREs).
This option is experimental when combined with the
.B \-z
.RB ( \-\^\-null\-data )
@@ -248,11 +245,24 @@ If this option is used multiple times or is combined with the
.RB ( \-\^\-regexp )
option, search for all patterns given.
The empty file contains zero patterns, and therefore matches nothing.
If
.I FILE
is
.B \-
, read patterns from standard input.
.TP
.BR \-i ", " \-\^\-ignore\-case
Ignore case distinctions, so that characters that differ only in case
Ignore case distinctions in patterns and input data,
so that characters that differ only in case
match each other.
.TP
.B \-\^\-no\-ignore\-case
Do not ignore case distinctions in patterns and input data.
This is the default.
This option is useful for passing to shell scripts that already use
.BR \-i ,
to cancel its effects because the two options override each other.
.TP
.BR \-v ", " \-\^\-invert\-match
Invert the sense of matching, to select non-matching lines.
.TP
@ -275,10 +285,6 @@ pattern and then surrounding it with
.B ^
and
.BR $ .
.TP
.B \-y
Obsolete synonym for
.BR \-i .
.SS "General Output Control"
.TP
.BR \-c ", " \-\^\-count
@ -286,7 +292,7 @@ Suppress normal output; instead print a count of
matching lines for each input file.
With the
.BR \-v ", " \-\^\-invert\-match
option (see below), count non-matching lines.
option (see above), count non-matching lines.
.TP
.BR \-\^\-color [ =\fIWHEN\fP "], " \-\^\-colour [ =\fIWHEN\fP ]
Surround the matched (non-empty) strings, matching lines, context lines,
@ -295,9 +301,6 @@ groups of context lines) with escape sequences to display them in color
on the terminal.
The colors are defined by the environment variable
.BR GREP_COLORS .
The deprecated environment variable
.B GREP_COLOR
is still supported, but its setting does not have priority.
.I WHEN
is
.BR never ", " always ", or " auto .
@ -306,18 +309,27 @@ is
Suppress normal output; instead print the name
of each input file from which no output would
normally have been printed.
The scanning will stop on the first match.
.TP
.BR \-l ", " \-\^\-files\-with\-matches
Suppress normal output; instead print
the name of each input file from which output
would normally have been printed.
The scanning will stop on the first match.
Scanning each input file stops upon first match.
.TP
.BI \-m " NUM" "\fR,\fP \-\^\-max\-count=" NUM
Stop reading a file after
.I NUM
matching lines.
If
.I NUM
is zero,
.B grep
stops right away without reading input.
A
.I NUM
of \-1 is treated as infinity and
.B grep
does not stop; this is the default.
If the input is standard input from a regular file,
and
.I NUM
@ -380,6 +392,7 @@ print the offset of the matching part itself.
.BR \-H ", " \-\^\-with\-filename
Print the file name for each match.
This is the default when there is more than one file to search.
This is a GNU extension.
.TP
.BR \-h ", " \-\^\-no\-filename
Suppress the prefixing of file names on output.
@ -389,10 +402,10 @@ This is the default when there is only one file
.BI \-\^\-label= LABEL
Display input actually coming from standard input as input coming from file
.IR LABEL .
This is especially useful when implementing tools like
.BR zgrep ,
This can be useful for commands that transform a file's contents
before searching,
e.g.,
.BR "gzip \-cd foo.gz | grep \-\^\-label=foo \-H something" .
.BR "gzip \-cd foo.gz | grep \-\^\-label=foo \-H \*'some pattern\*'" .
See also the
.B \-H
option.
@ -413,20 +426,6 @@ from a single file will all start at the same column,
this also causes the line number and byte offset (if present)
to be printed in a minimum size field width.
.TP
.BR \-u ", " \-\^\-unix\-byte\-offsets
Report Unix-style byte offsets.
This switch causes
.B grep
to report byte offsets as if the file were a Unix-style text file,
i.e., with CR characters stripped off.
This will produce results identical to running
.B grep
on a Unix machine.
This option has no effect unless
.B \-b
option is also used;
it has no effect on platforms other than MS-DOS and MS-Windows.
.TP
.BR \-Z ", " \-\^\-null
Output a zero byte (the ASCII
.B NUL
@ -484,6 +483,26 @@ With the
or
.B \-\^\-only\-matching
option, this has no effect and a warning is given.
.TP
.BI \-\^\-group\-separator= SEP
When
.BR \-A ,
.BR \-B ,
or
.B \-C
are in use, print
.I SEP
instead of
.B \-\^\-
between groups of lines.
.TP
.B \-\^\-no\-group\-separator
When
.BR \-A ,
.BR \-B ,
or
.B \-C
are in use, do not print a separator between groups of lines.
.SS "File and Directory Selection"
.TP
.BR \-a ", " \-\^\-text
@ -505,11 +524,14 @@ By default,
.I TYPE
is
.BR binary ,
and when
and
.B grep
discovers that a file is binary it suppresses any further output, and
instead outputs either a one-line message saying that a binary file
matches, or no message if there is no match.
suppresses output after null input binary data is discovered,
and suppresses output lines that contain improperly encoded data.
When some output is suppressed,
.B grep
follows any output
with a message to standard error saying that a binary file matches.
.IP
If
.I TYPE
@ -517,7 +539,7 @@ is
.BR without\-match ,
when
.B grep
discovers that a file is binary it assumes that the rest of the file
discovers null input binary data it assumes that the rest of the file
does not match; this is equivalent to the
.B \-I
option.
@ -574,7 +596,7 @@ On the other hand, when reading files whose text encodings are
unknown, it can be helpful to use
.B \-a
or to set
.B LC_ALL='C'
.B LC_ALL=\*'C\*'
in the environment, in order to find more matches even if the matches
are unsafe for direct display.
.TP
@ -621,14 +643,13 @@ option.
Skip any command-line file with a name suffix that matches the pattern
.IR GLOB ,
using wildcard matching; a name suffix is either the whole
name, or any suffix starting after a
.B /
and before a
.RB non- / .
name, or a trailing part that starts with a non-slash character
immediately after a slash
.RB ( / )
in the name.
When searching recursively, skip any subfile whose base name matches
.IR GLOB ;
the base name is the part after the last
.BR / .
the base name is the part after the last slash.
A pattern can use
.BR * ,
.BR ? ,
@ -654,7 +675,7 @@ whose base name matches
Ignore any redundant trailing slashes in
.IR GLOB .
.TP
.BR \-I
.B \-I
Process a binary file as if it did not contain matching data; this is
equivalent to the
.B \-\^\-binary\-files=without\-match
@ -665,11 +686,24 @@ Search only files whose base name matches
.I GLOB
(using wildcard matching as described under
.BR \-\^\-exclude ).
If contradictory
.B \-\^\-include
and
.B \-\^\-exclude
options are given, the last matching one wins.
If no
.B \-\^\-include
or
.B \-\^\-exclude
options match, a file is included unless the first such option is
.BR \-\^\-include .
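
The precedence rule stated above (the last matching option wins; with no match, the file is included unless the first such option is an include) can be sketched in C as follows. This is only an illustration under stated assumptions: `struct glob_rule` and `file_selected` are hypothetical names, matching uses POSIX `fnmatch` on the base name only, and the real implementation handles more cases (name suffixes for command-line files, trailing slashes).

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical type: one --include=GLOB or --exclude=GLOB option,
   stored in command-line order.  */
struct glob_rule
{
  char const *glob;
  bool is_include;
};

/* Return true if the file with base name BASE should be searched,
   given NRULES rules in RULES.  Assumption: plain fnmatch matching
   with no flags.  */
bool
file_selected (struct glob_rule const *rules, size_t nrules,
               char const *base)
{
  assert (base != NULL);
  bool matched = false;
  bool selected = true;

  /* Scan every rule; a later match overrides an earlier one.  */
  for (size_t i = 0; i < nrules; i++)
    if (fnmatch (rules[i].glob, base, 0) == 0)
      {
        matched = true;
        selected = rules[i].is_include;
      }

  if (matched)
    return selected;

  /* No rule matched: include the file unless the first rule
     is an include.  With no rules at all, include everything.  */
  return nrules == 0 || !rules[0].is_include;
}
```

For example, with the rules `--include='*.c' --exclude='foo.c'`, this sketch selects `bar.c`, rejects `foo.c` (the later exclude wins), and rejects `x.h` (nothing matches and the first rule is an include).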
.TP
.BR \-r ", " \-\^\-recursive
Read all files under each directory, recursively,
following symbolic links only if they are on the command line.
Note that if no file operand is given, grep searches the working directory.
Note that if no file operand is given,
.B grep
searches the working directory.
This is equivalent to the
.B "\-d recurse"
option.
@ -680,19 +714,19 @@ Follow all symbolic links, unlike
.BR \-r .
.SS "Other Options"
.TP
.BR \-\^\-line\-buffered
.B \-\^\-line\-buffered
Use line buffering on output.
This can cause a performance penalty.
.TP
.BR \-U ", " \-\^\-binary
Treat the file(s) as binary.
By default, under MS-DOS and MS-Windows,
.BR grep
.B grep
guesses whether a file is text or binary as described for the
.B \-\^\-binary\-files
option.
If
.BR grep
.B grep
decides the file is a text file, it strips the CR characters from the
original file contents (to make regular expressions with
.B ^
@ -716,7 +750,7 @@ Like the
or
.B \-\^\-null
option, this option can be used with commands like
.B sort -z
.B "sort \-z"
to process arbitrary file names.
.
.SH "REGULAR EXPRESSIONS"
@ -728,15 +762,19 @@ expressions, by using various operators to combine smaller expressions.
understands three different versions of regular expression syntax:
\*(lqbasic\*(rq (BRE), \*(lqextended\*(rq (ERE) and \*(lqperl\*(rq (PCRE).
In GNU
.B grep
there is no difference in available functionality between basic and
extended syntaxes.
In other implementations, basic regular expressions are less powerful.
.BR grep ,
basic and extended regular expressions are merely different notations
for the same pattern-matching functionality.
In other implementations, basic regular expressions are ordinarily
less powerful than extended, though occasionally it is the other way around.
The following description applies to extended regular expressions;
differences for basic regular expressions are summarized afterwards.
Perl-compatible regular expressions give additional functionality, and are
documented in pcresyntax(3) and pcrepattern(3), but work only if
PCRE is available in the system.
Perl-compatible regular expressions have different functionality, and are
documented in
.BR pcre2syntax (3)
and
.BR pcre2pattern (3),
but work only if PCRE support is enabled.
.PP
The fundamental building blocks are the regular expressions
that match a single character.
@ -771,19 +809,21 @@ matches any single digit.
Within a bracket expression, a
.I "range expression"
consists of two characters separated by a hyphen.
It matches any single character that sorts between the two characters,
inclusive, using the locale's collating sequence and character set.
For example, in the default C locale,
In the default C locale, it matches any single character that appears
between the two characters in ASCII order, inclusive.
For example,
.B [a\-d]
is equivalent to
.BR [abcd] .
Many locales sort characters in dictionary order, and in these locales
In other locales the behavior is unspecified:
.B [a\-d]
is typically not equivalent to
.BR [abcd] ;
it might be equivalent to
.BR [aBbCcDd] ,
for example.
might be equivalent to
.B [abcd]
or
.B [aBbCcDd]
or some other bracket expression,
or it might fail to match any character, or the set of
characters that it matches might be erratic, or it might be invalid.
To obtain the traditional interpretation of bracket expressions,
you can use the C locale by setting the
.B LC_ALL
@ -795,6 +835,7 @@ bracket expressions, as follows.
Their names are self explanatory, and they are
.BR [:alnum:] ,
.BR [:alpha:] ,
.BR [:blank:] ,
.BR [:cntrl:] ,
.BR [:digit:] ,
.BR [:graph:] ,
@ -905,7 +946,7 @@ Repetition takes precedence over concatenation, which in turn
takes precedence over alternation.
A whole expression may be enclosed in parentheses
to override these precedence rules and form a subexpression.
.SS "Back References and Subexpressions"
.SS "Back-references and Subexpressions"
The back-reference
.BI \e n\c
\&, where
@ -922,7 +963,7 @@ In basic regular expressions the meta-characters
.BR | ,
.BR ( ,
and
.BR )
.B )
lose their special meaning; instead use the backslashed
versions
.BR \e? ,
@ -933,7 +974,18 @@ versions
and
.BR \e) .
.
.SH "ENVIRONMENT VARIABLES"
.SH "EXIT STATUS"
Normally the exit status is 0 if a line is selected, 1 if no lines
were selected, and 2 if an error occurred. However, if the
.B \-q
or
.B \-\^\-quiet
or
.B \-\^\-silent
is used and a line is selected, the exit status is 0 even if an error
occurred.
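
The exit status rule described above can be summarized by a small C function. This is a sketch of the documented behavior only, not the program's actual code; the function name, parameter names, and enumerators are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the three documented exit statuses.  */
enum { STATUS_SELECTED = 0, STATUS_NONE = 1, STATUS_TROUBLE = 2 };

/* Compute the exit status from what happened during the run.
   QUIET stands for -q, --quiet, or --silent.  */
int
grep_exit_status (bool line_selected, bool error_seen, bool quiet)
{
  int status;

  if (quiet && line_selected)
    status = STATUS_SELECTED;   /* a selected line wins, even after an error */
  else if (error_seen)
    status = STATUS_TROUBLE;
  else
    status = line_selected ? STATUS_SELECTED : STATUS_NONE;

  assert (STATUS_SELECTED <= status && status <= STATUS_TROUBLE);
  return status;
}
```

Under this reading, a run that selects a line and also hits an error exits with 2 by default, but with 0 when one of the quiet options is given.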
.
.SH ENVIRONMENT
The behavior of
.B grep
is affected by the following environment variables.
@ -963,45 +1015,10 @@ The shell command
.B "locale \-a"
lists locales that are currently available.
.TP
.B GREP_OPTIONS
This variable specifies default options
to be placed in front of any explicit options.
As this causes problems when writing portable scripts,
this feature will be removed in a future release of
.BR grep ,
and
.B grep
warns if it is used.
Please use an alias or script instead.
.TP
.B GREP_COLOR
This variable specifies the color used to highlight matched (non-empty) text.
It is deprecated in favor of
.BR GREP_COLORS ,
but still supported.
The
.BR mt ,
.BR ms ,
and
.B mc
capabilities of
.B GREP_COLORS
have priority over it.
It can only specify the color used to highlight
the matching non-empty text in any matching line
(a selected line when the
.B \-v
command-line option is omitted,
or a context line when
.B \-v
is specified).
The default is
.BR 01;31 ,
which means a bold red foreground text on the terminal's default background.
.TP
.B GREP_COLORS
Specifies the colors and other attributes
used to highlight various parts of the output.
Controls how the
.B \-\^\-color
option highlights output.
Its value is a colon-separated list of capabilities
that defaults to
.B ms=01;31:mc=01;31:sl=:cx=:fn=35:ln=32:bn=32:se=36
@ -1235,45 +1252,13 @@ front of the operand list and are treated as options.
Also, POSIX requires that unrecognized options be diagnosed as
\*(lqillegal\*(rq, but since they are not really against the law the default
is to diagnose them as \*(lqinvalid\*(rq.
.B POSIXLY_CORRECT
also disables \fB_\fP\fIN\fP\fB_GNU_nonoption_argv_flags_\fP,
described below.
.TP
\fB_\fP\fIN\fP\fB_GNU_nonoption_argv_flags_\fP
(Here
.I N
is
.BR grep 's
numeric process ID.) If the
.IR i th
character of this environment variable's value is
.BR 1 ,
do not consider the
.IR i th
operand of
.B grep
to be an option, even if it appears to be one.
A shell can put this variable in the environment for each command it runs,
specifying which operands are the results of file name wildcard
expansion and therefore should not be treated as options.
This behavior is available only with the GNU C library, and only
when
.B POSIXLY_CORRECT
is not set.
.
.SH "EXIT STATUS"
Normally the exit status is 0 if a line is selected, 1 if no lines
were selected, and 2 if an error occurred. However, if the
.B \-q
or
.B \-\^\-quiet
or
.B \-\^\-silent
is used and a line is selected, the exit status is 0 even if an error
occurred.
.SH NOTES
This man page is maintained only fitfully;
the full documentation is often more up-to-date.
.
.SH COPYRIGHT
Copyright 1998\(en2000, 2002, 2005\(en2018 Free Software Foundation, Inc.
Copyright 1998\(en2000, 2002, 2005\(en2026 Free Software Foundation, Inc.
.PP
This is free software;
see the source for copying conditions.
@ -1309,16 +1294,48 @@ to run out of memory.
.PP
Back-references are very slow, and may require exponential time.
.
.SH EXAMPLE
The following example outputs the location and contents of any line
containing \*(lqf\*(rq and ending in \*(lq.c\*(rq,
within all files in the current directory whose names
contain \*(lqg\*(rq and end in \*(lq.h\*(rq.
The
.B \-n
option outputs line numbers, the
.B \-\^\-
argument treats expansions of \*(lq*g*.h\*(rq starting with \*(lq\-\*(rq
as file names not options,
and the empty file /dev/null causes file names to be output
even if only one file name happens to be of the form \*(lq*g*.h\*(rq.
.PP
.in +2n
.EX
$ \fBgrep\fP \-n \-\^\- \*'f.*\e.c$\*' *g*.h /dev/null
argmatch.h:1:/* definitions and prototypes for argmatch.c
.EE
.in
.PP
The only line that matches is line 1 of argmatch.h.
Note that the regular expression syntax used in the pattern differs
from the globbing syntax that the shell uses to match file names.
.
.SH "SEE ALSO"
.SS "Regular Manual Pages"
awk(1), cmp(1), diff(1), find(1), gzip(1),
perl(1), sed(1), sort(1), xargs(1), zgrep(1),
read(2),
pcre(3), pcresyntax(3), pcrepattern(3),
terminfo(5),
glob(7), regex(7).
.SS "POSIX Programmer's Manual Page"
grep(1p).
.BR awk (1),
.BR cmp (1),
.BR diff (1),
.BR find (1),
.BR perl (1),
.BR sed (1),
.BR sort (1),
.BR xargs (1),
.BR read (2),
.BR pcre2 (3),
.BR pcre2syntax (3),
.BR pcre2pattern (3),
.BR terminfo (5),
.BR glob (7),
.BR regex (7)
.SS "Full Documentation"
A
.UR https://www.gnu.org/software/grep/manual/
@ -1335,9 +1352,6 @@ programs are properly installed at your site, the command
.PP
should give you access to the complete manual.
.
.SH NOTES
This man page is maintained only fitfully;
the full documentation is often more up-to-date.
.\" Work around problems with some troff -man implementations.
.br
.

File diff suppressed because it is too large

2
gnulib

@ -1 +1 @@
Subproject commit 5d6a3cdd5c312e77a6d0f0848e3cb79a52e08658
Subproject commit 4f6ac2c3c689cd7312b5f9da97791b14bbc2ee53


@ -1 +1,3 @@
AM_CFLAGS = $(GNULIB_TEST_WARN_CFLAGS) $(WERROR_CFLAGS)
include gnulib.mk


@ -1,4 +1,4 @@
# Copyright 1997-1998, 2005-2018 Free Software Foundation, Inc.
# Copyright 1997-1998, 2005-2026 Free Software Foundation, Inc.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by


@ -1,5 +1,5 @@
/* Output colorization.
Copyright 2011-2018 Free Software Foundation, Inc.
Copyright 2011-2026 Free Software Foundation, Inc.
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
@ -12,9 +12,7 @@
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin Street - Fifth Floor, Boston, MA
02110-1301, USA. */
along with this program. If not, see <https://www.gnu.org/licenses/>. */
/* Without this pragma, gcc 4.7.0 20120102 suggests that the
init_colorize function might be candidate for attribute 'const' */


@ -1,5 +1,5 @@
/* Output colorization on MS-Windows.
Copyright 2011-2018 Free Software Foundation, Inc.
Copyright 2011-2026 Free Software Foundation, Inc.
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
@ -12,9 +12,7 @@
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin Street - Fifth Floor, Boston, MA
02110-1301, USA. */
along with this program. If not, see <https://www.gnu.org/licenses/>. */
/* Written by Eli Zaretskii. */
@ -96,7 +94,7 @@ w32_sgr2attr (const char *sgr_seq)
{
if (*p == ';' || *p == '\0')
{
code = strtol (s, NULL, 10);
code = strtol (s, nullptr, 10);
s = p + (*p != '\0');
switch (code)


@ -1,6 +1,6 @@
/* Output colorization.
Copyright 2011-2018 Free Software Foundation, Inc.
Copyright 2011-2026 Free Software Foundation, Inc.
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3, or (at your option)
@ -12,9 +12,7 @@
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin Street - Fifth Floor, Boston, MA
02110-1301, USA. */
along with this program. If not, see <https://www.gnu.org/licenses/>. */
extern int should_colorize (void);
extern void init_colorize (void);


@ -1,6 +1,6 @@
# pcre.m4 - check for libpcre support
# pcre.m4 - check for PCRE library support
# Copyright (C) 2010-2018 Free Software Foundation, Inc.
# Copyright (C) 2010-2026 Free Software Foundation, Inc.
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
# with or without modifications, as long as this notice is preserved.
@ -8,8 +8,8 @@
AC_DEFUN([gl_FUNC_PCRE],
[
AC_ARG_ENABLE([perl-regexp],
AC_HELP_STRING([--disable-perl-regexp],
[disable perl-regexp (pcre) support]),
AS_HELP_STRING([--disable-perl-regexp],
[disable perl-regexp (PCRE) support]),
[case $enableval in
yes|no) test_pcre=$enableval;;
*) AC_MSG_ERROR([invalid value $enableval for --disable-perl-regexp]);;
@ -21,36 +21,54 @@ AC_DEFUN([gl_FUNC_PCRE],
use_pcre=no
if test $test_pcre != no; then
PKG_CHECK_MODULES([PCRE], [libpcre], [], [: ${PCRE_LIBS=-lpcre}])
AC_CACHE_CHECK([for pcre_compile], [pcre_cv_have_pcre_compile],
AS_CASE([${PCRE_CFLAGS+set}@${PCRE_LIBS+set}@$PKG_CONFIG],
[@@false], [],
[@@*], [PKG_CHECK_MODULES([PCRE], [libpcre2-8], [], [:])])
AC_CACHE_CHECK([for pcre2_compile], [pcre_cv_have_pcre2_compile],
[pcre_saved_CFLAGS=$CFLAGS
pcre_saved_LIBS=$LIBS
CFLAGS="$CFLAGS $PCRE_CFLAGS"
LIBS="$PCRE_LIBS $LIBS"
AC_LINK_IFELSE(
[AC_LANG_PROGRAM([[#include <pcre.h>
]],
[[pcre *p = pcre_compile (0, 0, 0, 0, 0);
return !p;]])],
[pcre_cv_have_pcre_compile=yes],
[pcre_cv_have_pcre_compile=no])
pcre_cv_have_pcre2_compile=no
while
CFLAGS="$pcre_saved_CFLAGS $PCRE_CFLAGS"
LIBS="$pcre_saved_LIBS $PCRE_LIBS"
AC_LINK_IFELSE(
[AC_LANG_PROGRAM([[#define PCRE2_CODE_UNIT_WIDTH 8
#include <pcre2.h>
]],
[[pcre2_code *p = pcre2_compile (0, 0, 0, 0, 0, 0);
return !p;]])],
[pcre_cv_have_pcre2_compile=yes])
test $pcre_cv_have_pcre2_compile = no
do
AS_CASE([$PCRE_CFLAGS@$PCRE_LIBS],
[@-lpcre2-8],
[# Even the fallback setting fails; give up.
PCRE_LIBS=
break])
# Fallback setting.
PCRE_CFLAGS=
PCRE_LIBS=-lpcre2-8
done
CFLAGS=$pcre_saved_CFLAGS
LIBS=$pcre_saved_LIBS])
if test "$pcre_cv_have_pcre_compile" = yes; then
if test "$pcre_cv_have_pcre2_compile" = yes; then
use_pcre=yes
elif test $test_pcre = maybe; then
AC_MSG_WARN([AC_PACKAGE_NAME will be built without pcre support.])
AC_MSG_WARN([AC_PACKAGE_NAME will be built without PCRE support.])
else
AC_MSG_ERROR([pcre support not available])
AC_MSG_ERROR([PCRE support not available])
fi
fi
if test $use_pcre = yes; then
AC_DEFINE([HAVE_LIBPCRE], [1],
[Define to 1 if you have the Perl Compatible Regular Expressions
library (-lpcre).])
library.])
else
PCRE_CFLAGS=
PCRE_LIBS=


@ -1,6 +1,6 @@
# List of files which containing translatable strings.
#
# Copyright 1997-1998, 2005-2018 Free Software Foundation, Inc.
# Copyright 1997-1998, 2005-2026 Free Software Foundation, Inc.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
@ -16,6 +16,7 @@
# along with this program. If not, see <https://www.gnu.org/licenses/>.
lib/argmatch.c
lib/argmatch.h
lib/c-stack.c
lib/closeout.c
lib/dfa.c
@ -28,6 +29,6 @@ lib/quotearg.c
lib/regcomp.c
lib/version-etc.c
lib/xalloc-die.c
lib/xstrtol-error.c
src/dfasearch.c
src/grep.c
src/pcresearch.c


@ -1,5 +1,5 @@
## Process this file with automake to create Makefile.in
# Copyright 1997-1998, 2005-2018 Free Software Foundation, Inc.
# Copyright 1997-1998, 2005-2026 Free Software Foundation, Inc.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
@ -28,11 +28,12 @@ grep_SOURCES = \
die.h \
grep.c \
kwsearch.c \
kwset.c \
pcresearch.c \
searchutils.c
if USE_PCRE
grep_SOURCES += pcresearch.c
endif
noinst_HEADERS = grep.h kwset.h search.h system.h
noinst_HEADERS = grep.h search.h system.h
# Sometimes, the expansion of $(LIBINTL) includes -lc which may
# include modules defining variables like 'optind', so libgreputils.a
@ -40,7 +41,9 @@ noinst_HEADERS = grep.h kwset.h search.h system.h
# But libgreputils.a must also follow $(LIBINTL), since libintl uses
# replacement functions defined in libgreputils.a.
LDADD = \
../lib/libgreputils.a $(LIBINTL) ../lib/libgreputils.a $(LIBICONV) \
../lib/libgreputils.a $(LIBINTL) ../lib/libgreputils.a \
$(HARD_LOCALE_LIB) $(LIBC32CONV) \
$(LIBSIGSEGV) $(LIBUNISTRING) $(MBRTOWC_LIB) $(SETLOCALE_NULL_LIB) \
$(LIBTHREAD)
grep_LDADD = $(LDADD) $(PCRE_LIBS) $(LIBCSTACK)
@ -52,11 +55,11 @@ EXTRA_DIST = egrep.sh
egrep fgrep: egrep.sh Makefile
$(AM_V_GEN)grep=`echo grep | sed -e '$(transform)'` && \
case $@ in egrep) option=-E;; fgrep) option=-F;; esac && \
shell_does_substrings='set x/y && d=$${1%/*} && test "$$d" = x' && \
shell_does_substrings='set x/y && d=$${1##*/} && test "$$d" = y' && \
if $(SHELL) -c "$$shell_does_substrings" 2>/dev/null; then \
edit_substring='s,X,X,'; \
else \
edit_substring='s,\$${0%/\*},`expr "X$$0" : '\''X\\(.*\\)/'\''`,g'; \
edit_substring='s,\$${0##\*/},`expr "X$$0" : '\''X\\(.*\\)/'\''`,g'; \
fi && \
sed -e 's|[@]SHELL@|$(SHELL)|g' \
-e "$$edit_substring" \


@ -1,5 +1,5 @@
/* dfasearch.c - searching subroutines using dfa and regex for grep.
Copyright 1992, 1998, 2000, 2007, 2009-2018 Free Software Foundation, Inc.
Copyright 1992, 1998, 2000, 2007, 2009-2026 Free Software Foundation, Inc.
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
@ -12,21 +12,19 @@
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin Street - Fifth Floor, Boston, MA
02110-1301, USA. */
along with this program. If not, see <https://www.gnu.org/licenses/>. */
/* Written August 1992 by Mike Haertel. */
#include <config.h>
#include "intprops.h"
#include "search.h"
#include <search.h>
#include "die.h"
#include <error.h>
struct dfa_comp
{
/* KWset compiled pattern. For Ecompile and Gcompile, we compile
/* KWset compiled pattern. For GEAcompile, we compile
a list of strings, at least one of which is known to occur in
any string matching the regexp. */
kwset_t kwset;
@ -35,14 +33,14 @@ struct dfa_comp
struct dfa *dfa;
/* Regex compiled regexps. */
struct re_pattern_buffer* patterns;
size_t pcount;
struct re_pattern_buffer *patterns;
idx_t pcount;
struct re_registers regs;
/* Number of compiled fixed strings known to exactly match the regexp.
If kwsexec returns < kwset_exact_matches, then we don't need to
call the regexp matcher at all. */
ptrdiff_t kwset_exact_matches;
idx_t kwset_exact_matches;
bool begline;
};
@ -53,14 +51,10 @@ dfaerror (char const *mesg)
die (EXIT_TROUBLE, 0, "%s", mesg);
}
/* For now, the sole dfawarn-eliciting condition (use of a regexp
like '[:lower:]') is unequivocally an error, so treat it as such,
when possible. */
void
dfawarn (char const *mesg)
{
if (!getenv ("POSIXLY_CORRECT"))
dfaerror (mesg);
error (0, 0, _("warning: %s"), mesg);
}
/* If the DFA turns out to have some set of fixed strings one of
@ -80,9 +74,9 @@ kwsmusts (struct dfa_comp *dc)
The kwset matcher will return the index of the matching
string that it chooses. */
++dc->kwset_exact_matches;
ptrdiff_t old_len = strlen (dm->must);
ptrdiff_t new_len = old_len + dm->begline + dm->endline;
char *must = xmalloc (new_len);
idx_t old_len = strlen (dm->must);
idx_t new_len = old_len + dm->begline + dm->endline;
char *must = ximalloc (new_len);
char *mp = must;
*mp = eolbyte;
mp += dm->begline;
@ -103,8 +97,103 @@ kwsmusts (struct dfa_comp *dc)
dfamustfree (dm);
}
/* Return true if KEYS, of length LEN, might contain a back-reference.
Return false if KEYS cannot contain a back-reference.
BS_SAFE is true of encodings where a backslash cannot appear as the
last byte of a multibyte character. */
static bool _GL_ATTRIBUTE_PURE
possible_backrefs_in_pattern (char const *keys, idx_t len, bool bs_safe)
{
/* Normally a backslash, but in an unsafe encoding this is a non-char
value so that the comparison below always fails, because if there
are two adjacent '\' bytes, the first might be the last byte of a
multibyte character. */
int second_backslash = bs_safe ? '\\' : CHAR_MAX + 1;
/* This code can return true even if KEYS lacks a back-reference, for
patterns like [\2], or for encodings where '\' appears as the last
byte of a multibyte character. However, false alarms should be
rare and do not affect correctness. */
/* Do not look for a backslash in the pattern's last byte, since it
can't be part of a back-reference and this streamlines the code. */
len--;
if (0 <= len)
{
char const *lim = keys + len;
for (char const *p = keys; (p = memchr (p, '\\', lim - p)); p++)
{
if ('1' <= p[1] && p[1] <= '9')
return true;
if (p[1] == second_backslash)
{
p++;
if (p == lim)
break;
}
}
}
return false;
}
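
To see how the scan above behaves on concrete patterns, here is a standalone sketch of the same logic. It is an adaptation, not the project's code: it is renamed `may_have_backref` (a hypothetical name), uses `ptrdiff_t` in place of gnulib's `idx_t`, and drops the `_GL_ATTRIBUTE_PURE` annotation so that it compiles with only the standard C library.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if KEYS, of length LEN, might contain a back-reference
   such as \1.  BS_SAFE says that a backslash byte can never be the
   last byte of a multibyte character in the current encoding.  */
bool
may_have_backref (char const *keys, ptrdiff_t len, bool bs_safe)
{
  assert (keys != NULL);

  /* In an unsafe encoding use a value no char can equal, so that
     "\\" is never treated as an escaped backslash.  */
  int second_backslash = bs_safe ? '\\' : CHAR_MAX + 1;

  /* Ignore the last byte: a backslash there cannot start \N.  */
  len--;
  if (0 <= len)
    {
      char const *lim = keys + len;
      for (char const *p = keys; (p = memchr (p, '\\', lim - p)); p++)
        {
          if ('1' <= p[1] && p[1] <= '9')
            return true;
          if (p[1] == second_backslash)
            {
              /* Skip the escaped backslash.  */
              p++;
              if (p == lim)
                break;
            }
        }
    }
  return false;
}
```

With `bs_safe` true, the four-byte pattern `a\\1` (an escaped backslash followed by `1`) is reported as free of back-references, while with `bs_safe` false the same bytes are conservatively reported as a possible back-reference. A bracket expression such as `[\2]` is a false alarm in both modes, which the original comment notes is harmless.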
static bool
regex_compile (struct dfa_comp *dc, char const *p, idx_t len,
idx_t pcount, idx_t lineno, reg_syntax_t syntax_bits,
bool syntax_only)
{
struct re_pattern_buffer pat;
pat.buffer = nullptr;
pat.allocated = 0;
/* Do not use a fastmap with -i, to work around glibc Bug#20381. */
static_assert (UCHAR_MAX < IDX_MAX);
idx_t uchar_max = UCHAR_MAX;
pat.fastmap = syntax_only | match_icase ? nullptr : ximalloc (uchar_max + 1);
pat.translate = nullptr;
if (syntax_only)
re_set_syntax (syntax_bits | RE_NO_SUB);
else
re_set_syntax (syntax_bits);
char const *err = re_compile_pattern (p, len, &pat);
if (!err)
{
if (syntax_only)
regfree (&pat);
else
dc->patterns[pcount] = pat;
return true;
}
free (pat.fastmap);
/* Emit a filename:lineno: prefix for patterns taken from files. */
idx_t pat_lineno;
char const *pat_filename
= lineno < 0 ? "" : pattern_file_name (lineno, &pat_lineno);
if (*pat_filename == '\0')
error (0, 0, "%s", err);
else
{
ptrdiff_t n = pat_lineno;
error (0, 0, "%s:%td: %s", pat_filename, n, err);
}
return false;
}
/* Compile PATTERN, containing SIZE bytes that are followed by '\n'.
SYNTAX_BITS specifies whether PATTERN uses style -G, -E, or -A.
Return a description of the compiled pattern. */
void *
GEAcompile (char *pattern, size_t size, reg_syntax_t syntax_bits)
GEAcompile (char *pattern, idx_t size, reg_syntax_t syntax_bits,
bool exact)
{
char *motif;
struct dfa_comp *dc = xcalloc (1, sizeof (*dc));
@ -113,9 +202,12 @@ GEAcompile (char *pattern, size_t size, reg_syntax_t syntax_bits)
if (match_icase)
syntax_bits |= RE_ICASE;
re_set_syntax (syntax_bits);
int dfaopts = eolbyte ? 0 : DFA_EOL_NUL;
int dfaopts = (DFA_CONFUSING_BRACKETS_ERROR | DFA_STRAY_BACKSLASH_WARN
| DFA_PLUS_WARN
| (syntax_bits & RE_CONTEXT_INDEP_OPS ? DFA_STAR_WARN : 0)
| (eolbyte ? 0 : DFA_EOL_NUL));
dfasyntax (dc->dfa, &localeinfo, syntax_bits, dfaopts);
bool bs_safe = !localeinfo.multibyte | localeinfo.using_utf8;
/* For GNU regex, pass the patterns separately to detect errors like
"[\nallo\n]\n", where the patterns are "[", "allo" and "]", and
@ -124,53 +216,82 @@ GEAcompile (char *pattern, size_t size, reg_syntax_t syntax_bits)
char const *p = pattern;
char const *patlim = pattern + size;
bool compilation_failed = false;
size_t palloc = 0;
dc->patterns = xmalloc (sizeof *dc->patterns);
dc->patterns++;
dc->pcount = 0;
idx_t palloc = 1;
char const *prev = pattern;
/* Buffer containing back-reference-free patterns. */
char *buf = nullptr;
idx_t buflen = 0;
idx_t bufalloc = 0;
idx_t lineno = 0;
do
{
size_t len;
char const *sep = memchr (p, '\n', patlim - p);
if (sep)
char const *sep = rawmemchr (p, '\n');
idx_t len = sep - p;
bool backref = possible_backrefs_in_pattern (p, len, bs_safe);
if (backref && prev < p)
{
len = sep - p;
sep++;
idx_t prevlen = p - prev;
ptrdiff_t bufshortage = buflen - bufalloc + prevlen;
if (0 < bufshortage)
buf = xpalloc (buf, &bufalloc, bufshortage, -1, 1);
memcpy (buf + buflen, prev, prevlen);
buflen += prevlen;
}
else
len = patlim - p;
if (palloc <= dc->pcount)
dc->patterns = x2nrealloc (dc->patterns, &palloc, sizeof *dc->patterns);
struct re_pattern_buffer *pat = &dc->patterns[dc->pcount];
pat->buffer = NULL;
pat->allocated = 0;
/* Do not use a fastmap with -i, to work around glibc Bug#20381. */
pat->fastmap = match_icase ? NULL : xmalloc (UCHAR_MAX + 1);
pat->translate = NULL;
char const *err = re_compile_pattern (p, len, pat);
if (err)
/* Ensure room for at least two more patterns. The extra one is
for the regex_compile that may be executed after this loop
exits, and its (unused) slot is patterns[-1] until then. */
ptrdiff_t shortage = dc->pcount - palloc + 2;
if (0 < shortage)
{
/* With patterns specified only on the command line, emit the bare
diagnostic. Otherwise, include a filename:lineno: prefix. */
size_t lineno;
char const *pat_filename = pattern_file_name (dc->pcount + 1,
&lineno);
if (*pat_filename == '\0')
error (0, 0, "%s", err);
else
error (0, 0, "%s:%zu: %s", pat_filename, lineno, err);
compilation_failed = true;
dc->patterns = xpalloc (dc->patterns - 1, &palloc, shortage, -1,
sizeof *dc->patterns);
dc->patterns++;
}
if (!regex_compile (dc, p, len, dc->pcount, lineno, syntax_bits,
!backref))
compilation_failed = true;
p = sep + 1;
lineno++;
if (backref)
{
dc->pcount++;
prev = p;
}
dc->pcount++;
p = sep;
}
while (p);
while (p <= patlim);
if (compilation_failed)
exit (EXIT_TROUBLE);
if (patlim < prev)
buflen--;
else if (pattern < prev)
{
idx_t prevlen = patlim - prev;
buf = xirealloc (buf, buflen + prevlen);
memcpy (buf + buflen, prev, prevlen);
buflen += prevlen;
}
else
{
buf = pattern;
buflen = size;
}
/* In the match_words and match_lines cases, we use a different pattern
for the DFA matcher that will quickly throw out cases that won't work.
Then if DFA succeeds we do some hairy stuff using the regex matcher
@ -186,11 +307,12 @@ GEAcompile (char *pattern, size_t size, reg_syntax_t syntax_bits)
static char const word_beg_bk[] = "\\(^\\|[^[:alnum:]_]\\)\\(";
static char const word_end_bk[] = "\\)\\([^[:alnum:]_]\\|$\\)";
int bk = !(syntax_bits & RE_NO_BK_PARENS);
char *n = xmalloc (sizeof word_beg_bk - 1 + size + sizeof word_end_bk);
idx_t bracket_bytes = sizeof word_beg_bk - 1 + sizeof word_end_bk;
char *n = ximalloc (size + bracket_bytes);
strcpy (n, match_lines ? (bk ? line_beg_bk : line_beg_no_bk)
: (bk ? word_beg_bk : word_beg_no_bk));
size_t total = strlen (n);
idx_t total = strlen (n);
memcpy (n + total, pattern, size);
total += size;
strcpy (n + total, match_lines ? (bk ? line_end_bk : line_end_no_bk)
@ -200,26 +322,42 @@ GEAcompile (char *pattern, size_t size, reg_syntax_t syntax_bits)
size = total;
}
else
motif = NULL;
motif = nullptr;
dfacomp (pattern, size, dc->dfa, 1);
dfaparse (pattern, size, dc->dfa);
kwsmusts (dc);
dfacomp (nullptr, 0, dc->dfa, 1);
if (buf)
{
if (exact || !dfasupported (dc->dfa))
{
dc->patterns--;
dc->pcount++;
if (!regex_compile (dc, buf, buflen, 0, -1, syntax_bits, false))
abort ();
}
if (buf != pattern)
free (buf);
}
free (motif);
return dc;
}
size_t
EGexecute (void *vdc, char const *buf, size_t size, size_t *match_size,
ptrdiff_t
EGexecute (void *vdc, char const *buf, idx_t size, idx_t *match_size,
char const *start_ptr)
{
char const *buflim, *beg, *end, *ptr, *match, *best_match, *mb_start;
char eol = eolbyte;
regoff_t start;
size_t len, best_len;
idx_t len, best_len;
struct kwsmatch kwsm;
size_t i;
idx_t i;
struct dfa_comp *dc = vdc;
struct dfa *superset = dfasuperset (dc->dfa);
bool dfafast = dfaisfast (dc->dfa);
@ -234,7 +372,7 @@ EGexecute (void *vdc, char const *buf, size_t size, size_t *match_size,
if (!start_ptr)
{
char const *next_beg, *dfa_beg = beg;
size_t count = 0;
idx_t count = 0;
bool exact_kwset_match = false;
bool backref = false;
@ -248,7 +386,7 @@ EGexecute (void *vdc, char const *buf, size_t size, size_t *match_size,
buflim - beg + dc->begline,
&kwsm, true);
if (offset < 0)
goto failure;
return offset;
match = beg + offset;
prev_beg = beg;
@ -264,14 +402,19 @@ EGexecute (void *vdc, char const *buf, size_t size, size_t *match_size,
greater of the latter two values; this temporarily prefers
the DFA to KWset. */
exact_kwset_match = kwsm.index < dc->kwset_exact_matches;
end = ((exact_kwset_match || !dfafast
|| MAX (16, match - beg) < (match - prev_beg) >> 2)
? match
: MAX (16, match - beg) < (buflim - prev_beg) >> 2
? prev_beg + 4 * MAX (16, match - beg)
: buflim);
end = memchr (end, eol, buflim - end);
end = end ? end + 1 : buflim;
if (exact_kwset_match || !dfafast
|| MAX (16, match - beg) < (match - prev_beg) >> 2)
{
end = rawmemchr (match, eol);
end++;
}
else if (MAX (16, match - beg) < (buflim - prev_beg) >> 2)
{
end = rawmemchr (prev_beg + 4 * MAX (16, match - beg), eol);
end++;
}
else
end = buflim;
if (exact_kwset_match)
{
@ -279,7 +422,7 @@ EGexecute (void *vdc, char const *buf, size_t size, size_t *match_size,
goto success;
if (mb_start < beg)
mb_start = beg;
if (mb_goback (&mb_start, match, buflim) == 0)
if (mb_goback (&mb_start, nullptr, match, buflim) == 0)
goto success;
/* The matched line starts in the middle of a multibyte
character. Perform the DFA search starting from the
@ -295,8 +438,8 @@ EGexecute (void *vdc, char const *buf, size_t size, size_t *match_size,
potential matches; this is more likely to be fast
than falling back to KWset would be. */
next_beg = dfaexec (superset, dfa_beg, (char *) end, 0,
&count, NULL);
if (next_beg == NULL || next_beg == end)
&count, nullptr);
if (!next_beg || next_beg == end)
continue;
/* Narrow down to the line we've found. */
@ -306,8 +449,8 @@ EGexecute (void *vdc, char const *buf, size_t size, size_t *match_size,
beg++;
dfa_beg = beg;
}
end = memchr (next_beg, eol, buflim - next_beg);
end = end ? end + 1 : buflim;
end = rawmemchr (next_beg, eol);
end++;
count = 0;
}
@ -318,7 +461,7 @@ EGexecute (void *vdc, char const *buf, size_t size, size_t *match_size,
/* If there's no match, or if we've matched the sentinel,
we're done. */
if (next_beg == NULL || next_beg == end)
if (!next_beg || next_beg == end)
continue;
/* Narrow down to the line we've found. */
@ -327,10 +470,10 @@ EGexecute (void *vdc, char const *buf, size_t size, size_t *match_size,
beg = memrchr (buf, eol, next_beg - buf);
beg++;
}
end = memchr (next_beg, eol, buflim - next_beg);
end = end ? end + 1 : buflim;
end = rawmemchr (next_beg, eol);
end++;
/* Successful, no backreferences encountered! */
/* Successful, no back-references encountered! */
if (!backref)
goto success;
ptr = beg;
@ -446,13 +589,11 @@ EGexecute (void *vdc, char const *buf, size_t size, size_t *match_size,
}
} /* for (beg = end ..) */
failure:
return -1;
success:
len = end - beg;
success_in_len:;
size_t off = beg - buf;
*match_size = len;
return off;
return beg - buf;
}
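
Note on the hunks above: several of them replace `memchr (p, eol, buflim - p)` plus a null check with `rawmemchr (p, eol)` followed by `end++`. An unbounded search is only safe when an end-of-line byte is guaranteed at or after the starting point, presumably because the buffer is followed by a sentinel (the comments mention "the sentinel"). The sketch below contrasts the two styles. `find_eol_unbounded`, `line_end_bounded` and `line_end_sentinel` are hypothetical names used only for illustration; `find_eol_unbounded` is a portable stand-in for glibc's `rawmemchr`, which is a GNU extension.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for rawmemchr: scan forward with no length
   limit.  The caller must guarantee that C occurs at or after S.  */
static char const *
find_eol_unbounded (char const *s, char c)
{
  while (*s != c)
    s++;
  return s;
}

/* Bounded style: return a pointer just past the line containing P,
   or LIM if no EOL byte is found before LIM.  */
static char const *
line_end_bounded (char const *p, char const *lim, char eol)
{
  char const *e = memchr (p, eol, lim - p);
  return e ? e + 1 : lim;
}

/* Sentinel style: no limit and no null check.  This assumes (for the
   purposes of this sketch) that an EOL byte exists at or after P.  */
static char const *
line_end_sentinel (char const *p, char eol)
{
  return find_eol_unbounded (p, eol) + 1;
}
```

When the guarantee holds, both helpers return the same pointer; the sentinel style simply drops the length computation and the branch.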


@ -1,5 +1,5 @@
/* Report an error and exit.
Copyright 2016-2018 Free Software Foundation, Inc.
Copyright 2016-2026 Free Software Foundation, Inc.
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
@ -12,15 +12,12 @@
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin Street - Fifth Floor, Boston, MA
02110-1301, USA. */
along with this program. If not, see <https://www.gnu.org/licenses/>. */
#ifndef DIE_H
#define DIE_H
#include <error.h>
#include <stdbool.h>
#include <verify.h>
/* Like 'error (STATUS, ...)', except STATUS must be a nonzero constant.


@ -1,2 +1,4 @@
#!@SHELL@
cmd=${0##*/}
echo "$cmd: warning: $cmd is obsolescent; using @grep@ @option@" >&2
exec @grep@ @option@ "$@"

1160
src/grep.c

File diff suppressed because it is too large


@ -1,5 +1,5 @@
/* grep.h - interface to grep driver for searching subroutines.
Copyright (C) 1992, 1998, 2001, 2007, 2009-2018 Free Software Foundation,
Copyright (C) 1992, 1998, 2001, 2007, 2009-2026 Free Software Foundation,
Inc.
This program is free software; you can redistribute it and/or modify
@ -13,14 +13,12 @@
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin Street - Fifth Floor, Boston, MA
02110-1301, USA. */
along with this program. If not, see <https://www.gnu.org/licenses/>. */
#ifndef GREP_GREP_H
#define GREP_GREP_H 1
#include <stdbool.h>
#include <idx.h>
/* The following flags are exported from grep for the matchers
to look at. */
@ -29,6 +27,6 @@ extern bool match_words; /* -w */
extern bool match_lines; /* -x */
extern char eolbyte; /* -z */
extern char const *pattern_file_name (size_t, size_t *);
extern char const *pattern_file_name (idx_t, idx_t *);
#endif
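
Note on the `size_t` to `idx_t` changes: gnulib's `idx.h` defines `idx_t` as a signed type (`ptrdiff_t`). One practical effect is that a difference such as `dc->pcount - palloc + 2` in the GEAcompile hunk can go negative and be tested directly with `0 < shortage`, whereas the same expression in unsigned arithmetic would wrap around. A minimal sketch of that difference follows; the function names are hypothetical and not part of grep.

```c
#include <stdbool.h>
#include <stddef.h>

/* Signed arithmetic, mirroring the shape of
   'shortage = pcount - palloc + 2'.  Return how many more slots are
   needed, or 0 if there is already room for two more.  */
static ptrdiff_t
slots_short_signed (ptrdiff_t used, ptrdiff_t allocated)
{
  ptrdiff_t shortage = used - allocated + 2;
  return 0 < shortage ? shortage : 0;
}

/* The same test in unsigned arithmetic, shown only to illustrate the
   pitfall: the subtraction wraps when USED + 2 < ALLOCATED, so the
   result looks like a huge positive shortage.  */
static bool
looks_short_unsigned (size_t used, size_t allocated)
{
  size_t shortage = used - allocated + 2;
  return 0 < shortage;
}
```

With 3 slots used out of 10, the signed version reports no shortage, while the unsigned version wrongly reports one.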


@ -1,5 +1,5 @@
/* kwsearch.c - searching subroutines using kwset for grep.
Copyright 1992, 1998, 2000, 2007, 2009-2018 Free Software Foundation, Inc.
Copyright 1992, 1998, 2000, 2007, 2009-2026 Free Software Foundation, Inc.
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
@ -12,14 +12,12 @@
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin Street - Fifth Floor, Boston, MA
02110-1301, USA. */
along with this program. If not, see <https://www.gnu.org/licenses/>. */
/* Written August 1992 by Mike Haertel. */
#include <config.h>
#include "search.h"
#include <search.h>
/* A compiled -F pattern list. */
@ -32,58 +30,46 @@ struct kwsearch
'kwswords (kwset)' when some extra one-character words have been
appended, one for each troublesome character that will require a
DFA search. */
ptrdiff_t words;
idx_t words;
/* The user's pattern and its size in bytes. */
char *pattern;
size_t size;
idx_t size;
/* The user's pattern compiled as a regular expression,
or null if it has not been compiled. */
void *re;
};
/* Compile the -F style PATTERN, containing SIZE bytes. Return a
description of the compiled pattern. */
/* Compile the -F style PATTERN, containing SIZE bytes that are
followed by '\n'. Return a description of the compiled pattern. */
void *
Fcompile (char *pattern, size_t size, reg_syntax_t ignored)
Fcompile (char *pattern, idx_t size, reg_syntax_t ignored, bool exact)
{
kwset_t kwset;
ptrdiff_t total = size;
char *buf = NULL;
size_t bufalloc = 0;
char *buf = nullptr;
idx_t bufalloc = 0;
kwset = kwsinit (true);
char const *p = pattern;
do
{
ptrdiff_t len;
char const *sep = memchr (p, '\n', total);
if (sep)
{
len = sep - p;
sep++;
total -= (len + 1);
}
else
{
len = total;
total = 0;
}
char const *sep = rawmemchr (p, '\n');
idx_t len = sep - p;
if (match_lines)
{
if (eolbyte == '\n' && pattern < p && sep)
if (eolbyte == '\n' && pattern < p)
p--;
else
{
if (bufalloc < len + 2)
{
free (buf);
bufalloc = len + 2;
buf = x2realloc (NULL, &bufalloc);
bufalloc = len;
buf = xpalloc (nullptr, &bufalloc, 2, -1, 1);
buf[0] = eolbyte;
}
memcpy (buf + 1, p, len);
@ -94,45 +80,13 @@ Fcompile (char *pattern, size_t size, reg_syntax_t ignored)
}
kwsincr (kwset, p, len);
p = sep;
p = sep + 1;
}
while (p);
while (p <= pattern + size);
free (buf);
ptrdiff_t words = kwswords (kwset);
if (match_icase)
{
/* For each pattern character C that has a case folded
counterpart F that is multibyte and so cannot easily be
implemented via translating a single byte, append a pattern
containing just F. That way, if the data contains F, the
matcher can fall back on DFA. For example, if C is 'i' and
the locale is en_US.utf8, append a pattern containing just
the character U+0131 (LATIN SMALL LETTER DOTLESS I), so that
Fexecute will use a DFA if the data contain U+0131. */
mbstate_t mbs = { 0 };
char checked[NCHAR] = {0,};
for (p = pattern; p < pattern + size; p++)
{
unsigned char c = *p;
if (checked[c])
continue;
checked[c] = true;
wint_t wc = localeinfo.sbctowc[c];
wchar_t folded[CASE_FOLDED_BUFSIZE];
for (int i = case_folded_counterparts (wc, folded); 0 <= --i; )
{
char s[MB_LEN_MAX];
int nbytes = wcrtomb (s, folded[i], &mbs);
if (1 < nbytes)
kwsincr (kwset, s, nbytes);
}
}
}
idx_t words = kwswords (kwset);
kwsprep (kwset);
struct kwsearch *kwsearch = xmalloc (sizeof *kwsearch);
@ -140,61 +94,39 @@ Fcompile (char *pattern, size_t size, reg_syntax_t ignored)
kwsearch->words = words;
kwsearch->pattern = pattern;
kwsearch->size = size;
kwsearch->re = NULL;
kwsearch->re = nullptr;
return kwsearch;
}
/* Use the compiled pattern VCP to search the buffer BUF of size SIZE.
If found, return the offset of the first match and store its
size into *MATCH_SIZE. If not found, return SIZE_MAX.
size into *MATCH_SIZE. If not found, return -1.
If START_PTR is nonnull, start searching there. */
size_t
Fexecute (void *vcp, char const *buf, size_t size, size_t *match_size,
ptrdiff_t
Fexecute (void *vcp, char const *buf, idx_t size, idx_t *match_size,
char const *start_ptr)
{
char const *beg, *end, *mb_start;
ptrdiff_t len;
idx_t len;
char eol = eolbyte;
struct kwsmatch kwsmatch;
size_t ret_val;
bool mb_check;
bool longest;
struct kwsearch *kwsearch = vcp;
kwset_t kwset = kwsearch->kwset;
if (match_lines)
mb_check = longest = false;
else
{
mb_check = localeinfo.multibyte & !localeinfo.using_utf8;
longest = mb_check | !!start_ptr | match_words;
}
bool mb_check = localeinfo.multibyte & !localeinfo.using_utf8 & !match_lines;
bool longest = (mb_check | !!start_ptr | match_words) & !match_lines;
for (mb_start = beg = start_ptr ? start_ptr : buf; beg <= buf + size; beg++)
{
struct kwsmatch kwsmatch;
ptrdiff_t offset = kwsexec (kwset, beg - match_lines,
buf + size - beg + match_lines, &kwsmatch,
longest);
if (offset < 0)
break;
len = kwsmatch.size[0] - 2 * match_lines;
len = kwsmatch.size - 2 * match_lines;
if (kwsearch->words <= kwsmatch.index)
{
/* The data contain a multibyte character that matches
some pattern character that is a case folded counterpart.
Since the kwset code cannot handle this case, fall back
on the DFA code, which can. */
if (! kwsearch->re)
{
fgrep_to_grep_pattern (&kwsearch->pattern, &kwsearch->size);
kwsearch->re = GEAcompile (kwsearch->pattern, kwsearch->size,
RE_SYNTAX_GREP);
}
return EGexecute (kwsearch->re, buf, size, match_size, start_ptr);
}
if (mb_check && mb_goback (&mb_start, beg + offset, buf + size) != 0)
idx_t mbclen = 0;
if (mb_check
&& mb_goback (&mb_start, &mbclen, beg + offset, buf + size) != 0)
{
/* We have matched a single byte that is not at the beginning of a
multibyte character. mb_goback has advanced MB_START past that
@ -217,19 +149,27 @@ Fexecute (void *vcp, char const *buf, size_t size, size_t *match_size,
goto success_in_beg_and_len;
if (match_lines)
{
len += start_ptr == NULL;
len += !start_ptr;
goto success_in_beg_and_len;
}
if (! match_words)
goto success;
/* Succeed if the preceding and following characters are word
constituents. If the following character is not a word
constituent, keep trying with shorter matches. */
char const *bol = memrchr (mb_start, eol, beg - mb_start);
if (bol)
mb_start = bol + 1;
if (! wordchar_prev (mb_start, beg, buf + size))
/* We need a preceding mb_start pointer. Use the beginning of line
if there is a preceding newline. */
if (mbclen == 0)
{
char const *nl = memrchr (mb_start, eol, beg - mb_start);
if (nl)
mb_start = nl + 1;
}
/* Succeed if neither the preceding nor the following character is a
word constituent. If the preceding is not, yet the following
character IS a word constituent, keep trying with shorter matches. */
if (mbclen > 0
? ! wordchar_next (beg - mbclen, buf + size)
: ! wordchar_prev (mb_start, beg, buf + size))
for (;;)
{
if (! wordchar_next (beg + len, buf + size))
@ -239,12 +179,36 @@ Fexecute (void *vcp, char const *buf, size_t size, size_t *match_size,
else
goto success;
}
if (!start_ptr && !localeinfo.multibyte)
{
if (! kwsearch->re)
{
fgrep_to_grep_pattern (&kwsearch->pattern, &kwsearch->size);
kwsearch->re = GEAcompile (kwsearch->pattern,
kwsearch->size,
RE_SYNTAX_GREP, !!start_ptr);
}
if (beg + len < buf + size)
{
end = rawmemchr (beg + len, eol);
end++;
}
else
end = buf + size;
if (0 <= EGexecute (kwsearch->re, beg, end - beg,
match_size, nullptr))
goto success_match_words;
beg = end - 1;
break;
}
if (!len)
break;
offset = kwsexec (kwset, beg, --len, &kwsmatch, true);
if (offset != 0)
struct kwsmatch shorter_match;
if (kwsexec (kwset, beg, --len, &shorter_match, true) != 0)
break;
len = kwsmatch.size[0];
len = shorter_match.size;
}
/* No word match was found at BEG. Skip past word constituents,
@ -252,20 +216,23 @@ Fexecute (void *vcp, char const *buf, size_t size, size_t *match_size,
them could make things much slower. */
beg += wordchars_size (beg, buf + size);
mb_start = beg;
} /* for (beg in buf) */
}
return -1;
success:
end = memchr (beg + len, eol, (buf + size) - (beg + len));
end = end ? end + 1 : buf + size;
if (beg + len < buf + size)
{
end = rawmemchr (beg + len, eol);
end++;
}
else
end = buf + size;
success_match_words:
beg = memrchr (buf, eol, beg - buf);
beg = beg ? beg + 1 : buf;
len = end - beg;
success_in_beg_and_len:;
size_t off = beg - buf;
*match_size = len;
ret_val = off;
return ret_val;
return beg - buf;
}
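
Note on the return convention: the Fexecute and EGexecute hunks change the return type from `size_t` to `ptrdiff_t`, and the Fexecute comment changes "return SIZE_MAX" to "return -1". The offset of the matching line is returned and its length is stored through the `match_size` pointer. The toy function below follows that convention so the caller-side check (`0 <= result`) is easy to see. `toy_execute` and its `needle` parameter are hypothetical and much simpler than the real matchers; the sketch also assumes the needle contains no newline.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical toy matcher.  Search the SIZE bytes at BUF for NEEDLE.
   If found, return the offset of the start of the enclosing line and
   store that line's length (including its newline, if any) into
   *MATCH_SIZE.  If not found, return -1.  BUF need not be
   NUL-terminated.  */
static ptrdiff_t
toy_execute (char const *buf, ptrdiff_t size, char const *needle,
             ptrdiff_t *match_size)
{
  ptrdiff_t nlen = strlen (needle);
  for (ptrdiff_t i = 0; i + nlen <= size; i++)
    if (memcmp (buf + i, needle, nlen) == 0)
      {
        /* Widen the match to the enclosing line.  */
        ptrdiff_t beg = i;
        while (0 < beg && buf[beg - 1] != '\n')
          beg--;
        char const *nl = memchr (buf + i + nlen, '\n', size - (i + nlen));
        ptrdiff_t end = nl ? nl + 1 - buf : size;
        *match_size = end - beg;
        return beg;
      }
  return -1;
}
```

A caller can then write `if (0 <= toy_execute (...))`, the same shape as the `0 <= EGexecute (...)` test in the hunk above, without comparing against a `SIZE_MAX` sentinel.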


@ -1,933 +0,0 @@
/* kwset.c - search for any of a set of keywords.
Copyright (C) 1989, 1998, 2000, 2005, 2007, 2009-2018 Free Software
Foundation, Inc.
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3, or (at your option)
any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin Street - Fifth Floor, Boston, MA
02110-1301, USA. */
/* Written August 1989 by Mike Haertel. */
/* For the Aho-Corasick algorithm, see:
Aho AV, Corasick MJ. Efficient string matching: an aid to
bibliographic search. CACM 18, 6 (1975), 333-40
<https://dx.doi.org/10.1145/360825.360855>, which describes the
failure function used below.
For the Boyer-Moore algorithm, see: Boyer RS, Moore JS.
A fast string searching algorithm. CACM 20, 10 (1977), 762-72
<https://dx.doi.org/10.1145/359842.359859>.
For a survey of more-recent string matching algorithms that might
help improve performance, see: Faro S, Lecroq T. The exact online
string matching problem: a review of the most recent results.
ACM Computing Surveys 45, 2 (2013), 13
<https://dx.doi.org/10.1145/2431211.2431212>. */
#include <config.h>
#include "kwset.h"
#include <stdint.h>
#include <sys/types.h>
#include "system.h"
#include "intprops.h"
#include "memchr2.h"
#include "obstack.h"
#include "xalloc.h"
#include "verify.h"
#define obstack_chunk_alloc xmalloc
#define obstack_chunk_free free
static unsigned char
U (char ch)
{
return to_uchar (ch);
}
/* Balanced tree of edges and labels leaving a given trie node. */
struct tree
{
struct tree *llink; /* Left link; MUST be first field. */
struct tree *rlink; /* Right link (to larger labels). */
struct trie *trie; /* Trie node pointed to by this edge. */
unsigned char label; /* Label on this edge. */
char balance; /* Difference in depths of subtrees. */
};
/* Node of a trie representing a set of keywords. */
struct trie
{
/* If an accepting node, this is either 2*W + 1 where W is the word
index, or is SIZE_MAX if Aho-Corasick is in use and FAIL
specifies where to look for more info. If not an accepting node,
this is zero. */
size_t accepting;
struct tree *links; /* Tree of edges leaving this node. */
struct trie *parent; /* Parent of this node. */
struct trie *next; /* List of all trie nodes in level order. */
struct trie *fail; /* Aho-Corasick failure function. */
ptrdiff_t depth; /* Depth of this node from the root. */
ptrdiff_t shift; /* Shift function for search failures. */
ptrdiff_t maxshift; /* Max shift of self and descendants. */
};
/* Structure returned opaquely to the caller, containing everything. */
struct kwset
{
struct obstack obstack; /* Obstack for node allocation. */
ptrdiff_t words; /* Number of words in the trie. */
struct trie *trie; /* The trie itself. */
ptrdiff_t mind; /* Minimum depth of an accepting node. */
ptrdiff_t maxd; /* Maximum depth of any node. */
unsigned char delta[NCHAR]; /* Delta table for rapid search. */
struct trie *next[NCHAR]; /* Table of children of the root. */
char *target; /* Target string if there's only one. */
ptrdiff_t *shift; /* Used in Boyer-Moore search for one
string. */
char const *trans; /* Character translation table. */
/* This helps to match a terminal byte, which is the first byte
for Aho-Corasick, and the last byte for Boyer-Moore. If all the
patterns have the same terminal byte (after translation via TRANS
if TRANS is nonnull), then this is that byte as an unsigned char.
Otherwise this is -1 if there is disagreement among the strings
about terminal bytes, and -2 if there are no terminal bytes and
no disagreement because all the patterns are empty. */
int gc1;
/* This helps to match a terminal byte. If 0 <= GC1HELP, B is
terminal when B == GC1 || B == GC1HELP (note that GC1 == GC1HELP
is common here). This is typically faster than evaluating
to_uchar (TRANS[B]) == GC1. */
int gc1help;
/* If the string has two or more bytes, this is the penultimate byte,
after translation via TRANS if TRANS is nonnull. This variable
is used only by Boyer-Moore. */
char gc2;
/* kwsexec implementation. */
ptrdiff_t (*kwsexec) (kwset_t, char const *, ptrdiff_t,
struct kwsmatch *, bool);
};
/* Use TRANS to transliterate C. A null TRANS does no transliteration. */
static inline char
tr (char const *trans, char c)
{
return trans ? trans[U(c)] : c;
}
static ptrdiff_t acexec (kwset_t, char const *, ptrdiff_t,
struct kwsmatch *, bool);
static ptrdiff_t bmexec (kwset_t, char const *, ptrdiff_t,
struct kwsmatch *, bool);
/* Return a newly allocated keyword set. A nonnull TRANS specifies a
table of character translations to be applied to all pattern and
search text. */
kwset_t
kwsalloc (char const *trans)
{
struct kwset *kwset = xmalloc (sizeof *kwset);
obstack_init (&kwset->obstack);
kwset->words = 0;
kwset->trie = obstack_alloc (&kwset->obstack, sizeof *kwset->trie);
kwset->trie->accepting = 0;
kwset->trie->links = NULL;
kwset->trie->parent = NULL;
kwset->trie->next = NULL;
kwset->trie->fail = NULL;
kwset->trie->depth = 0;
kwset->trie->shift = 0;
kwset->mind = PTRDIFF_MAX;
kwset->maxd = -1;
kwset->target = NULL;
kwset->trans = trans;
kwset->kwsexec = acexec;
return kwset;
}
/* This upper bound is valid for CHAR_BIT >= 4 and
exact for CHAR_BIT in { 4..11, 13, 15, 17, 19 }. */
enum { DEPTH_SIZE = CHAR_BIT + CHAR_BIT / 2 };
/* Add the given string to the contents of the keyword set. */
void
kwsincr (kwset_t kwset, char const *text, ptrdiff_t len)
{
assume (0 <= len);
struct trie *trie = kwset->trie;
char const *trans = kwset->trans;
bool reverse = kwset->kwsexec == bmexec;
if (reverse)
text += len;
/* Descend the trie (built of keywords) character-by-character,
installing new nodes when necessary. */
while (len--)
{
unsigned char uc = reverse ? *--text : *text++;
unsigned char label = trans ? trans[uc] : uc;
/* Descend the tree of outgoing links for this trie node,
looking for the current character and keeping track
of the path followed. */
struct tree *cur = trie->links;
struct tree *links[DEPTH_SIZE];
enum { L, R } dirs[DEPTH_SIZE];
links[0] = (struct tree *) &trie->links;
dirs[0] = L;
ptrdiff_t depth = 1;
while (cur && label != cur->label)
{
links[depth] = cur;
if (label < cur->label)
dirs[depth++] = L, cur = cur->llink;
else
dirs[depth++] = R, cur = cur->rlink;
}
/* The current character doesn't have an outgoing link at
this trie node, so build a new trie node and install
a link in the current trie node's tree. */
if (!cur)
{
cur = obstack_alloc (&kwset->obstack, sizeof *cur);
cur->llink = NULL;
cur->rlink = NULL;
cur->trie = obstack_alloc (&kwset->obstack, sizeof *cur->trie);
cur->trie->accepting = 0;
cur->trie->links = NULL;
cur->trie->parent = trie;
cur->trie->next = NULL;
cur->trie->fail = NULL;
cur->trie->depth = trie->depth + 1;
cur->trie->shift = 0;
cur->label = label;
cur->balance = 0;
/* Install the new tree node in its parent. */
if (dirs[--depth] == L)
links[depth]->llink = cur;
else
links[depth]->rlink = cur;
/* Back up the tree fixing the balance flags. */
while (depth && !links[depth]->balance)
{
if (dirs[depth] == L)
--links[depth]->balance;
else
++links[depth]->balance;
--depth;
}
/* Rebalance the tree by pointer rotations if necessary. */
if (depth && ((dirs[depth] == L && --links[depth]->balance)
|| (dirs[depth] == R && ++links[depth]->balance)))
{
struct tree *t, *r, *l, *rl, *lr;
switch (links[depth]->balance)
{
case (char) -2:
switch (dirs[depth + 1])
{
case L:
r = links[depth], t = r->llink, rl = t->rlink;
t->rlink = r, r->llink = rl;
t->balance = r->balance = 0;
break;
case R:
r = links[depth], l = r->llink, t = l->rlink;
rl = t->rlink, lr = t->llink;
t->llink = l, l->rlink = lr, t->rlink = r, r->llink = rl;
l->balance = t->balance != 1 ? 0 : -1;
r->balance = t->balance != (char) -1 ? 0 : 1;
t->balance = 0;
break;
default:
abort ();
}
break;
case 2:
switch (dirs[depth + 1])
{
case R:
l = links[depth], t = l->rlink, lr = t->llink;
t->llink = l, l->rlink = lr;
t->balance = l->balance = 0;
break;
case L:
l = links[depth], r = l->rlink, t = r->llink;
lr = t->llink, rl = t->rlink;
t->llink = l, l->rlink = lr, t->rlink = r, r->llink = rl;
l->balance = t->balance != 1 ? 0 : -1;
r->balance = t->balance != (char) -1 ? 0 : 1;
t->balance = 0;
break;
default:
abort ();
}
break;
default:
abort ();
}
if (dirs[depth - 1] == L)
links[depth - 1]->llink = t;
else
links[depth - 1]->rlink = t;
}
}
trie = cur->trie;
}
/* Mark the node finally reached as accepting, encoding the
index number of this word in the keyword set so far. */
if (!trie->accepting)
{
size_t words = kwset->words;
trie->accepting = 2 * words + 1;
}
++kwset->words;
/* Keep track of the longest and shortest string of the keyword set. */
if (trie->depth < kwset->mind)
kwset->mind = trie->depth;
if (trie->depth > kwset->maxd)
kwset->maxd = trie->depth;
}
ptrdiff_t
kwswords (kwset_t kwset)
{
return kwset->words;
}
/* Enqueue the trie nodes referenced from the given tree in the
given queue. */
static void
enqueue (struct tree *tree, struct trie **last)
{
if (!tree)
return;
enqueue (tree->llink, last);
enqueue (tree->rlink, last);
(*last) = (*last)->next = tree->trie;
}
/* Compute the Aho-Corasick failure function for the trie nodes referenced
from the given tree, given the failure function for their parent as
well as a last resort failure node. */
static void
treefails (struct tree const *tree, struct trie const *fail,
struct trie *recourse, bool reverse)
{
struct tree *cur;
if (!tree)
return;
treefails (tree->llink, fail, recourse, reverse);
treefails (tree->rlink, fail, recourse, reverse);
/* Find, in the chain of fails going back to the root, the first
node that has a descendant on the current label. */
while (fail)
{
cur = fail->links;
while (cur && tree->label != cur->label)
if (tree->label < cur->label)
cur = cur->llink;
else
cur = cur->rlink;
if (cur)
{
tree->trie->fail = cur->trie;
if (!reverse && cur->trie->accepting && !tree->trie->accepting)
tree->trie->accepting = SIZE_MAX;
return;
}
fail = fail->fail;
}
tree->trie->fail = recourse;
}
/* Set delta entries for the links of the given tree such that
the preexisting delta value is larger than the current depth. */
static void
treedelta (struct tree const *tree, ptrdiff_t depth, unsigned char delta[])
{
if (!tree)
return;
treedelta (tree->llink, depth, delta);
treedelta (tree->rlink, depth, delta);
if (depth < delta[tree->label])
delta[tree->label] = depth;
}
/* Return true if A has every label in B. */
static bool _GL_ATTRIBUTE_PURE
hasevery (struct tree const *a, struct tree const *b)
{
if (!b)
return true;
if (!hasevery (a, b->llink))
return false;
if (!hasevery (a, b->rlink))
return false;
while (a && b->label != a->label)
if (b->label < a->label)
a = a->llink;
else
a = a->rlink;
return !!a;
}
/* Compute a vector, indexed by character code, of the trie nodes
referenced from the given tree. */
static void
treenext (struct tree const *tree, struct trie *next[])
{
if (!tree)
return;
treenext (tree->llink, next);
treenext (tree->rlink, next);
next[tree->label] = tree->trie;
}
/* Prepare a built keyword set for use. */
void
kwsprep (kwset_t kwset)
{
char const *trans = kwset->trans;
ptrdiff_t i;
unsigned char deltabuf[NCHAR];
unsigned char *delta = trans ? deltabuf : kwset->delta;
struct trie *curr, *last;
/* Use Boyer-Moore if just one pattern, Aho-Corasick otherwise. */
bool reverse = kwset->words == 1;
if (reverse)
{
kwset_t new_kwset;
/* Enqueue the immediate descendants in the level order queue. */
for (curr = last = kwset->trie; curr; curr = curr->next)
enqueue (curr->links, &last);
/* Looking for just one string. Extract it from the trie. */
kwset->target = obstack_alloc (&kwset->obstack, kwset->mind);
for (i = 0, curr = kwset->trie; i < kwset->mind; ++i)
{
kwset->target[i] = curr->links->label;
curr = curr->next;
}
new_kwset = kwsalloc (kwset->trans);
new_kwset->kwsexec = bmexec;
kwsincr (new_kwset, kwset->target, kwset->mind);
obstack_free (&kwset->obstack, NULL);
*kwset = *new_kwset;
free (new_kwset);
}
/* Initial values for the delta table; will be changed later. The
delta entry for a given character is the smallest depth of any
node at which an outgoing edge is labeled by that character. */
memset (delta, MIN (kwset->mind, UCHAR_MAX), sizeof deltabuf);
/* Traverse the nodes of the trie in level order, simultaneously
computing the delta table, failure function, and shift function. */
for (curr = last = kwset->trie; curr; curr = curr->next)
{
/* Enqueue the immediate descendants in the level order queue. */
enqueue (curr->links, &last);
/* Update the delta table for the descendants of this node. */
treedelta (curr->links, curr->depth, delta);
/* Compute the failure function for the descendants of this node. */
treefails (curr->links, curr->fail, kwset->trie, reverse);
if (reverse)
{
curr->shift = kwset->mind;
curr->maxshift = kwset->mind;
/* Update the shifts at each node in the current node's chain
of fails back to the root. */
struct trie *fail;
for (fail = curr->fail; fail; fail = fail->fail)
{
/* If the current node has some outgoing edge that the fail
doesn't, then the shift at the fail should be no larger
than the difference of their depths. */
if (!hasevery (fail->links, curr->links))
if (curr->depth - fail->depth < fail->shift)
fail->shift = curr->depth - fail->depth;
/* If the current node is accepting then the shift at the
fail and its descendants should be no larger than the
difference of their depths. */
if (curr->accepting && fail->maxshift > curr->depth - fail->depth)
fail->maxshift = curr->depth - fail->depth;
}
}
}
if (reverse)
{
/* Traverse the trie in level order again, fixing up all nodes whose
shift exceeds their inherited maxshift. */
for (curr = kwset->trie->next; curr; curr = curr->next)
{
if (curr->maxshift > curr->parent->maxshift)
curr->maxshift = curr->parent->maxshift;
if (curr->shift > curr->maxshift)
curr->shift = curr->maxshift;
}
}
/* Create a vector, indexed by character code, of the outgoing links
from the root node. Accumulate GC1 and GC1HELP. */
struct trie *nextbuf[NCHAR];
struct trie **next = trans ? nextbuf : kwset->next;
memset (next, 0, sizeof nextbuf);
treenext (kwset->trie->links, next);
int gc1 = -2;
int gc1help = -1;
for (i = 0; i < NCHAR; i++)
{
int ti = i;
if (trans)
{
ti = U(trans[i]);
kwset->next[i] = next[ti];
}
if (kwset->next[i])
{
if (gc1 < -1)
{
gc1 = ti;
gc1help = i;
}
else if (gc1 == ti)
gc1help = gc1help == ti ? i : -1;
else if (i == ti && gc1 == gc1help)
gc1help = i;
else
gc1 = -1;
}
}
kwset->gc1 = gc1;
kwset->gc1help = gc1help;
if (reverse)
{
/* Looking for just one string. Extract it from the trie. */
kwset->target = obstack_alloc (&kwset->obstack, kwset->mind);
for (i = kwset->mind - 1, curr = kwset->trie; i >= 0; --i)
{
kwset->target[i] = curr->links->label;
curr = curr->next;
}
if (kwset->mind > 1)
{
/* Looking for the delta2 shift that might be made after a
backwards match has failed. Extract it from the trie. */
kwset->shift
= obstack_alloc (&kwset->obstack,
sizeof *kwset->shift * (kwset->mind - 1));
for (i = 0, curr = kwset->trie->next; i < kwset->mind - 1; ++i)
{
kwset->shift[i] = curr->shift;
curr = curr->next;
}
/* The penultimate byte. */
kwset->gc2 = tr (trans, kwset->target[kwset->mind - 2]);
}
}
/* Fix things up for any translation table. */
if (trans)
for (i = 0; i < NCHAR; ++i)
kwset->delta[i] = delta[U(trans[i])];
}
/* Delta2 portion of a Boyer-Moore search. *TPP is the string text
pointer; it is updated in place. EP is the end of the string text,
and SP the end of the pattern. LEN is the pattern length; it must
be at least 2. TRANS, if nonnull, is the input translation table.
GC1 and GC2 are the last and second-from-last bytes of the pattern,
transliterated by TRANS; the caller precomputes them for
efficiency. If D1 is nonnull, it is a delta1 table for shifting *TPP
when failing. KWSET->shift says how much to shift. */
static inline bool
bm_delta2_search (char const **tpp, char const *ep, char const *sp,
ptrdiff_t len,
char const *trans, char gc1, char gc2,
unsigned char const *d1, kwset_t kwset)
{
char const *tp = *tpp;
ptrdiff_t d = len, skip = 0;
while (true)
{
ptrdiff_t i = 2;
if (tr (trans, tp[-2]) == gc2)
{
while (++i <= d)
if (tr (trans, tp[-i]) != tr (trans, sp[-i]))
break;
if (i > d)
{
for (i = d + skip + 1; i <= len; ++i)
if (tr (trans, tp[-i]) != tr (trans, sp[-i]))
break;
if (i > len)
{
*tpp = tp - len;
return true;
}
}
}
tp += d = kwset->shift[i - 2];
if (tp > ep)
break;
if (tr (trans, tp[-1]) != gc1)
{
if (d1)
tp += d1[U(tp[-1])];
break;
}
skip = i - 1;
}
*tpp = tp;
return false;
}
/* Return the address of the first byte in the buffer S (of size N)
that matches the terminal byte specified by KWSET, or NULL if there
is no match. KWSET->gc1 should be nonnegative. */
static char const *
memchr_kwset (char const *s, ptrdiff_t n, kwset_t kwset)
{
char const *slim = s + n;
if (kwset->gc1help < 0)
{
for (; s < slim; s++)
if (kwset->next[U(*s)])
return s;
}
else
{
int small_heuristic = 2;
size_t small_bytes = small_heuristic * sizeof (unsigned long int);
while (s < slim)
{
if (kwset->next[U(*s)])
return s;
s++;
if ((uintptr_t) s % small_bytes == 0)
return memchr2 (s, kwset->gc1, kwset->gc1help, slim - s);
}
}
return NULL;
}
/* Fast Boyer-Moore search (inlinable version). */
static inline ptrdiff_t
bmexec_trans (kwset_t kwset, char const *text, ptrdiff_t size)
{
assume (0 <= size);
unsigned char const *d1;
char const *ep, *sp, *tp;
int d;
ptrdiff_t len = kwset->mind;
char const *trans = kwset->trans;
if (len == 0)
return 0;
if (len > size)
return -1;
if (len == 1)
{
tp = memchr_kwset (text, size, kwset);
return tp ? tp - text : -1;
}
d1 = kwset->delta;
sp = kwset->target + len;
tp = text + len;
char gc1 = kwset->gc1;
char gc2 = kwset->gc2;
/* Significance of 12: 1 (initial offset) + 10 (skip loop) + 1 (md2). */
ptrdiff_t len12;
if (!INT_MULTIPLY_WRAPV (len, 12, &len12) && len12 < size)
/* 11 is not a bug, the initial offset happens only once. */
for (ep = text + size - 11 * len; tp <= ep; )
{
char const *tp0 = tp;
d = d1[U(tp[-1])], tp += d;
d = d1[U(tp[-1])], tp += d;
if (d != 0)
{
d = d1[U(tp[-1])], tp += d;
d = d1[U(tp[-1])], tp += d;
d = d1[U(tp[-1])], tp += d;
if (d != 0)
{
d = d1[U(tp[-1])], tp += d;
d = d1[U(tp[-1])], tp += d;
d = d1[U(tp[-1])], tp += d;
if (d != 0)
{
d = d1[U(tp[-1])], tp += d;
d = d1[U(tp[-1])], tp += d;
/* As a heuristic, prefer memchr to seeking by
delta1 when the latter doesn't advance much. */
int advance_heuristic = 16 * sizeof (long);
if (advance_heuristic <= tp - tp0)
continue;
tp--;
tp = memchr_kwset (tp, text + size - tp, kwset);
if (! tp)
return -1;
tp++;
if (ep <= tp)
break;
}
}
}
if (bm_delta2_search (&tp, ep, sp, len, trans, gc1, gc2, d1, kwset))
return tp - text;
}
/* Now only a few characters are left to search. Carefully avoid
ever producing an out-of-bounds pointer. */
ep = text + size;
d = d1[U(tp[-1])];
while (d <= ep - tp)
{
d = d1[U((tp += d)[-1])];
if (d != 0)
continue;
if (bm_delta2_search (&tp, ep, sp, len, trans, gc1, gc2, NULL, kwset))
return tp - text;
}
return -1;
}
/* Fast Boyer-Moore search. */
static ptrdiff_t
bmexec (kwset_t kwset, char const *text, ptrdiff_t size,
struct kwsmatch *kwsmatch, bool longest)
{
/* Help the compiler inline in two ways, depending on whether
kwset->trans is null. */
ptrdiff_t ret = (IGNORE_DUPLICATE_BRANCH_WARNING
(kwset->trans
? bmexec_trans (kwset, text, size)
: bmexec_trans (kwset, text, size)));
if (0 <= ret)
{
kwsmatch->index = 0;
kwsmatch->offset[0] = ret;
kwsmatch->size[0] = kwset->mind;
}
return ret;
}
/* Hairy multiple string search with the Aho-Corasick algorithm.
(inlinable version) */
static inline ptrdiff_t
acexec_trans (kwset_t kwset, char const *text, ptrdiff_t len,
struct kwsmatch *kwsmatch, bool longest)
{
struct trie const *trie, *accept;
char const *tp, *left, *lim;
struct tree const *tree;
char const *trans;
/* Initialize register copies and look for easy ways out. */
if (len < kwset->mind)
return -1;
trans = kwset->trans;
trie = kwset->trie;
lim = text + len;
tp = text;
if (!trie->accepting)
{
unsigned char c;
int gc1 = kwset->gc1;
while (true)
{
if (gc1 < 0)
{
while (! (trie = kwset->next[c = tr (trans, *tp++)]))
if (tp >= lim)
return -1;
}
else
{
tp = memchr_kwset (tp, lim - tp, kwset);
if (!tp)
return -1;
c = tr (trans, *tp++);
trie = kwset->next[c];
}
while (true)
{
if (trie->accepting)
goto match;
if (tp >= lim)
return -1;
c = tr (trans, *tp++);
for (tree = trie->links; c != tree->label; )
{
tree = c < tree->label ? tree->llink : tree->rlink;
if (! tree)
{
trie = trie->fail;
if (!trie)
{
trie = kwset->next[c];
if (trie)
goto have_trie;
if (tp >= lim)
return -1;
goto next_c;
}
if (trie->accepting)
{
--tp;
goto match;
}
tree = trie->links;
}
}
trie = tree->trie;
have_trie:;
}
next_c:;
}
}
match:
accept = trie;
while (accept->accepting == SIZE_MAX)
accept = accept->fail;
left = tp - accept->depth;
/* Try left-most longest match. */
if (longest)
{
while (tp < lim)
{
struct trie const *accept1;
char const *left1;
unsigned char c = tr (trans, *tp++);
do
{
tree = trie->links;
while (tree && c != tree->label)
tree = c < tree->label ? tree->llink : tree->rlink;
}
while (!tree && (trie = trie->fail) && accept->depth <= trie->depth);
if (!tree)
break;
trie = tree->trie;
if (trie->accepting)
{
accept1 = trie;
while (accept1->accepting == SIZE_MAX)
accept1 = accept1->fail;
left1 = tp - accept1->depth;
if (left1 <= left)
{
left = left1;
accept = accept1;
}
}
}
}
kwsmatch->index = accept->accepting / 2;
kwsmatch->offset[0] = left - text;
kwsmatch->size[0] = accept->depth;
return left - text;
}
/* Hairy multiple string search with Aho-Corasick algorithm. */
static ptrdiff_t
acexec (kwset_t kwset, char const *text, ptrdiff_t size,
struct kwsmatch *kwsmatch, bool longest)
{
assume (0 <= size);
/* Help the compiler inline in two ways, depending on whether
kwset->trans is null. */
return (IGNORE_DUPLICATE_BRANCH_WARNING
(kwset->trans
? acexec_trans (kwset, text, size, kwsmatch, longest)
: acexec_trans (kwset, text, size, kwsmatch, longest)));
}
/* Find the first instance of a KWSET member in TEXT, which has SIZE bytes.
Return the offset (into TEXT) of the first byte of the matching substring,
or -1 if no match is found. Upon a match, store details in
*KWSMATCH: index of matched keyword, start offset (same as the return
value), and length. If LONGEST, find the longest match; otherwise
any match will do. */
ptrdiff_t
kwsexec (kwset_t kwset, char const *text, ptrdiff_t size,
struct kwsmatch *kwsmatch, bool longest)
{
return kwset->kwsexec (kwset, text, size, kwsmatch, longest);
}
/* Free the components of the given keyword set. */
void
kwsfree (kwset_t kwset)
{
obstack_free (&kwset->obstack, NULL);
free (kwset);
}
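
The skip loop in the removed bmexec_trans above advances the text pointer by a delta1 table indexed by the byte aligned with the pattern's last position. As a reading aid, here is a minimal sketch of that bad-character idea written as a plain Horspool search. It is not kwset's implementation: it has no delta2 table, no translation table and no memchr fallback, and the name horspool_search is hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of kwset.  Return the offset of the
   first occurrence of PAT (LEN bytes) in TEXT (SIZE bytes), or -1 if
   there is none.  An empty pattern matches at offset 0.  */
ptrdiff_t
horspool_search (char const *text, ptrdiff_t size,
                 char const *pat, ptrdiff_t len)
{
  assert (0 <= size && 0 <= len);
  if (len == 0)
    return 0;
  if (len > size)
    return -1;

  /* delta[b] is how far the window may slide when byte B is aligned
     with the last pattern position.  Bytes absent from the pattern
     (ignoring its last byte) allow a full-length shift.  */
  ptrdiff_t delta[UCHAR_MAX + 1];
  for (int i = 0; i <= UCHAR_MAX; i++)
    delta[i] = len;
  for (ptrdiff_t i = 0; i < len - 1; i++)
    delta[(unsigned char) pat[i]] = len - 1 - i;

  ptrdiff_t pos = 0;
  while (pos <= size - len)
    {
      if (memcmp (text + pos, pat, len) == 0)
        return pos;
      pos += delta[(unsigned char) text[pos + len - 1]];
    }
  return -1;
}
```

Unlike this sketch, the kwset delta table holds 0 for the pattern's final byte, which is what lets the unrolled loop above detect a candidate with the `d != 0` tests before handing over to bm_delta2_search.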


@@ -1,44 +0,0 @@
/* kwset.h - header declaring the keyword set library.
Copyright (C) 1989, 1998, 2005, 2007, 2009-2018 Free Software Foundation,
Inc.
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3, or (at your option)
any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin Street - Fifth Floor, Boston, MA
02110-1301, USA. */
/* Written August 1989 by Mike Haertel. */
#include <stddef.h>
#include <stdbool.h>
struct kwsmatch
{
ptrdiff_t index; /* Index number of matching keyword. */
ptrdiff_t offset[1]; /* Offset of match. */
ptrdiff_t size[1]; /* Length of match. */
};
#include "arg-nonnull.h"
struct kwset;
typedef struct kwset *kwset_t;
extern kwset_t kwsalloc (char const *);
extern void kwsincr (kwset_t, char const *, ptrdiff_t);
extern ptrdiff_t kwswords (kwset_t) _GL_ATTRIBUTE_PURE;
extern void kwsprep (kwset_t);
extern ptrdiff_t kwsexec (kwset_t, char const *, ptrdiff_t,
struct kwsmatch *, bool)
_GL_ARG_NONNULL ((4));
extern void kwsfree (kwset_t);


@@ -1,5 +1,5 @@
/* pcresearch.c - searching subroutines using PCRE for grep.
Copyright 2000, 2007, 2009-2018 Free Software Foundation, Inc.
Copyright 2000, 2007, 2009-2026 Free Software Foundation, Inc.
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
@@ -12,235 +12,286 @@
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin Street - Fifth Floor, Boston, MA
02110-1301, USA. */
/* Written August 1992 by Mike Haertel. */
along with this program. If not, see <https://www.gnu.org/licenses/>. */
#include <config.h>
#include "search.h"
#include <search.h>
#include "die.h"
#if HAVE_LIBPCRE
# include <pcre.h>
#include <stdckdint.h>
/* This must be at least 2; everything after that is for performance
in pcre_exec. */
enum { NSUB = 300 };
#define PCRE2_CODE_UNIT_WIDTH 8
#include <pcre2.h>
# ifndef PCRE_EXTRA_MATCH_LIMIT_RECURSION
# define PCRE_EXTRA_MATCH_LIMIT_RECURSION 0
# endif
# ifndef PCRE_STUDY_JIT_COMPILE
# define PCRE_STUDY_JIT_COMPILE 0
# endif
# ifndef PCRE_STUDY_EXTRA_NEEDED
# define PCRE_STUDY_EXTRA_NEEDED 0
# endif
/* For older PCRE2. */
#ifndef PCRE2_SIZE_MAX
# define PCRE2_SIZE_MAX SIZE_MAX
#endif
#ifndef PCRE2_CONFIG_DEPTHLIMIT
# define PCRE2_CONFIG_DEPTHLIMIT PCRE2_CONFIG_RECURSIONLIMIT
# define PCRE2_ERROR_DEPTHLIMIT PCRE2_ERROR_RECURSIONLIMIT
# define pcre2_set_depth_limit pcre2_set_recursion_limit
#endif
#ifndef PCRE2_EXTRA_ASCII_BSD
# define PCRE2_EXTRA_ASCII_BSD 0
#endif
/* Use PCRE2_MATCH_INVALID_UTF if supported and not buggy;
see <https://github.com/PCRE2Project/pcre2/issues/224>.
Assume the bug will be fixed after PCRE2 10.42. */
#if defined PCRE2_MATCH_INVALID_UTF && 10 < PCRE2_MAJOR + (42 < PCRE2_MINOR)
enum { MATCH_INVALID_UTF = PCRE2_MATCH_INVALID_UTF };
#else
enum { MATCH_INVALID_UTF = 0 };
#endif
struct pcre_comp
{
/* General context for PCRE operations. */
pcre2_general_context *gcontext;
/* Compiled internal form of a Perl regular expression. */
pcre *cre;
pcre2_code *cre;
/* Additional information about the pattern. */
pcre_extra *extra;
/* Match context and data block. */
pcre2_match_context *mcontext;
pcre2_match_data *data;
# if PCRE_STUDY_JIT_COMPILE
/* The JIT stack and its maximum size. */
pcre_jit_stack *jit_stack;
int jit_stack_size;
# endif
pcre2_jit_stack *jit_stack;
idx_t jit_stack_size;
/* Table, indexed by ! (flag & PCRE_NOTBOL), of whether the empty
/* Table, indexed by ! (flag & PCRE2_NOTBOL), of whether the empty
string matches when that flag is used. */
int empty_match[2];
};
/* Memory allocation functions for PCRE. */
static void *
private_malloc (PCRE2_SIZE size, _GL_UNUSED void *unused)
{
if (IDX_MAX < size)
xalloc_die ();
return ximalloc (size);
}
static void
private_free (void *ptr, _GL_UNUSED void *unused)
{
free (ptr);
}
void
Pprint_version (void)
{
char *buf = ximalloc (pcre2_config (PCRE2_CONFIG_VERSION, nullptr));
pcre2_config (PCRE2_CONFIG_VERSION, buf);
printf (_("\ngrep -P uses PCRE2 %s\n"), buf);
free (buf);
}
/* Match the already-compiled PCRE pattern against the data in SUBJECT,
of size SEARCH_BYTES and starting with offset SEARCH_OFFSET, with
options OPTIONS, and storing resulting matches into SUB. Return
the (nonnegative) match location or a (negative) error number. */
options OPTIONS.
Return the (nonnegative) match count or a (negative) error number. */
static int
jit_exec (struct pcre_comp *pc, char const *subject, int search_bytes,
int search_offset, int options, int *sub)
jit_exec (struct pcre_comp *pc, char const *subject, idx_t search_bytes,
idx_t search_offset, int options)
{
while (true)
{
int e = pcre_exec (pc->cre, pc->extra, subject, search_bytes,
search_offset, options, sub, NSUB);
/* STACK_GROWTH_RATE is taken from PCRE's src/pcre2_jit_compile.c.
Going over the jitstack_max limit could trigger an int
overflow bug. */
int STACK_GROWTH_RATE = 8192;
idx_t jitstack_max = MIN (IDX_MAX, SIZE_MAX - (STACK_GROWTH_RATE - 1));
# if PCRE_STUDY_JIT_COMPILE
if (e == PCRE_ERROR_JIT_STACKLIMIT
&& 0 < pc->jit_stack_size && pc->jit_stack_size <= INT_MAX / 2)
int e = pcre2_match (pc->cre, (PCRE2_SPTR) subject, search_bytes,
search_offset, options, pc->data, pc->mcontext);
if (e == PCRE2_ERROR_JIT_STACKLIMIT
&& pc->jit_stack_size <= jitstack_max / 2)
{
int old_size = pc->jit_stack_size;
int new_size = pc->jit_stack_size = old_size * 2;
if (pc->jit_stack)
pcre_jit_stack_free (pc->jit_stack);
pc->jit_stack = pcre_jit_stack_alloc (old_size, new_size);
idx_t old_size = pc->jit_stack_size;
idx_t new_size = pc->jit_stack_size = old_size * 2;
pcre2_jit_stack_free (pc->jit_stack);
pc->jit_stack = pcre2_jit_stack_create (old_size, new_size,
pc->gcontext);
if (!pc->jit_stack)
die (EXIT_TROUBLE, 0,
_("failed to allocate memory for the PCRE JIT stack"));
pcre_assign_jit_stack (pc->extra, NULL, pc->jit_stack);
continue;
xalloc_die ();
if (!pc->mcontext)
pc->mcontext = pcre2_match_context_create (pc->gcontext);
pcre2_jit_stack_assign (pc->mcontext, nullptr, pc->jit_stack);
}
# endif
# if PCRE_EXTRA_MATCH_LIMIT_RECURSION
if (e == PCRE_ERROR_RECURSIONLIMIT
&& (PCRE_STUDY_EXTRA_NEEDED || pc->extra)
&& pc->extra->match_limit_recursion <= ULONG_MAX / 2)
else if (e == PCRE2_ERROR_DEPTHLIMIT)
{
pc->extra->match_limit_recursion *= 2;
if (pc->extra->match_limit_recursion == 0)
{
pc->extra->match_limit_recursion = (1 << 24) - 1;
pc->extra->flags |= PCRE_EXTRA_MATCH_LIMIT_RECURSION;
}
continue;
uint32_t lim;
pcre2_config (PCRE2_CONFIG_DEPTHLIMIT, &lim);
if (ckd_mul (&lim, lim, 2))
return e;
if (!pc->mcontext)
pc->mcontext = pcre2_match_context_create (pc->gcontext);
pcre2_set_depth_limit (pc->mcontext, lim);
}
# endif
return e;
else
return e;
}
}
#endif
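
The new PCRE2_ERROR_DEPTHLIMIT branch of jit_exec above doubles the depth limit with ckd_mul and gives up if the product overflows. A minimal sketch of the same overflow-checked doubling in portable C99, for readers without C23's <stdckdint.h>; double_limit is a hypothetical name, and unlike ckd_mul it leaves the operand untouched on overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of grep.  Double *LIM in place.
   Return true, leaving *LIM unchanged, if the doubled value would
   not fit in uint32_t; otherwise store the result and return false.  */
bool
double_limit (uint32_t *lim)
{
  assert (lim != NULL);
  if (UINT32_MAX / 2 < *lim)
    return true;
  *lim *= 2;
  return false;
}
```

A caller would retry the match after a false return and report the original error after a true one, which is the shape of the loop in jit_exec.
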
/* Return true if E is an error code for bad UTF-8. */
static bool
bad_utf8_from_pcre2 (int e)
{
return PCRE2_ERROR_UTF8_ERR21 <= e && e <= PCRE2_ERROR_UTF8_ERR1;
}
/* Compile the -P style PATTERN, containing SIZE bytes that are
followed by '\n'. Return a description of the compiled pattern. */
void *
Pcompile (char *pattern, size_t size, reg_syntax_t ignored)
Pcompile (char *pattern, idx_t size, reg_syntax_t ignored, bool exact)
{
#if !HAVE_LIBPCRE
die (EXIT_TROUBLE, 0,
_("support for the -P option is not compiled into "
"this --disable-perl-regexp binary"));
#else
int e;
char const *ep;
static char const wprefix[] = "(?<!\\w)(?:";
static char const wsuffix[] = ")(?!\\w)";
static char const xprefix[] = "^(?:";
static char const xsuffix[] = ")$";
int fix_len_max = MAX (sizeof wprefix - 1 + sizeof wsuffix - 1,
sizeof xprefix - 1 + sizeof xsuffix - 1);
char *re = xnmalloc (4, size + (fix_len_max + 4 - 1) / 4);
int flags = PCRE_DOLLAR_ENDONLY | (match_icase ? PCRE_CASELESS : 0);
char const *patlim = pattern + size;
char *n = re;
char const *p;
char const *pnul;
struct pcre_comp *pc = xcalloc (1, sizeof (*pc));
PCRE2_SIZE e;
int ec;
int flags = PCRE2_DOLLAR_ENDONLY | (match_icase ? PCRE2_CASELESS : 0);
char *patlim = pattern + size;
struct pcre_comp *pc = ximalloc (sizeof *pc);
pcre2_general_context *gcontext = pc->gcontext
= pcre2_general_context_create (private_malloc, private_free, nullptr);
pcre2_compile_context *ccontext = pcre2_compile_context_create (gcontext);
if (localeinfo.multibyte)
{
uint32_t unicode;
if (pcre2_config (PCRE2_CONFIG_UNICODE, &unicode) < 0 || !unicode)
die (EXIT_TROUBLE, 0,
_("-P supports only unibyte locales on this platform"));
if (! localeinfo.using_utf8)
die (EXIT_TROUBLE, 0, _("-P supports only unibyte and UTF-8 locales"));
flags |= PCRE_UTF8;
flags |= PCRE2_UTF;
/* If supported, consider invalid UTF-8 as a barrier not an error. */
flags |= MATCH_INVALID_UTF;
/* If PCRE2_EXTRA_ASCII_BSD is available, use PCRE2_UCP
so that \d does not have the undesirable effect of matching
non-ASCII digits. Otherwise (i.e., with PCRE2 10.42 and earlier),
escapes like \w have only their ASCII interpretations,
but that's better than the confusion that would ensue if \d
matched non-ASCII digits. */
flags |= PCRE2_EXTRA_ASCII_BSD ? PCRE2_UCP : 0;
#if 0
/* Do not match individual code units but only UTF-8. */
flags |= PCRE2_NEVER_BACKSLASH_C;
#endif
}
/* FIXME: Remove this restriction. */
if (memchr (pattern, '\n', size))
if (rawmemchr (pattern, '\n') != patlim)
die (EXIT_TROUBLE, 0, _("the -P option only supports a single pattern"));
*n = '\0';
if (match_words)
strcpy (n, wprefix);
if (match_lines)
strcpy (n, xprefix);
n += strlen (n);
#ifdef PCRE2_EXTRA_MATCH_LINE
uint32_t extra_options = (PCRE2_EXTRA_ASCII_BSD
| (match_lines ? PCRE2_EXTRA_MATCH_LINE : 0));
pcre2_set_compile_extra_options (ccontext, extra_options);
#endif
/* The PCRE interface doesn't allow NUL bytes in the pattern, so
replace each NUL byte in the pattern with the four characters
"\000", removing a preceding backslash if there are an odd
number of backslashes before the NUL. */
for (p = pattern; (pnul = memchr (p, '\0', patlim - p)); p = pnul + 1)
void *re_storage = nullptr;
if (match_lines)
{
memcpy (n, p, pnul - p);
n += pnul - p;
for (p = pnul; pattern < p && p[-1] == '\\'; p--)
continue;
n -= (pnul - p) & 1;
strcpy (n, "\\000");
n += 4;
#ifndef PCRE2_EXTRA_MATCH_LINE
static char const *const xprefix = "^(?:";
static char const *const xsuffix = ")$";
idx_t re_size = size + strlen (xprefix) + strlen (xsuffix);
char *re = re_storage = ximalloc (re_size);
char *rez = mempcpy (re, xprefix, strlen (xprefix));
rez = mempcpy (rez, pattern, size);
memcpy (rez, xsuffix, strlen (xsuffix));
pattern = re;
size = re_size;
#endif
}
else if (match_words)
{
/* PCRE2_EXTRA_MATCH_WORD is incompatible with grep -w;
do things the grep way. */
static char const *const wprefix = "(?<!\\w)(?:";
static char const *const wsuffix = ")(?!\\w)";
idx_t re_size = size + strlen (wprefix) + strlen (wsuffix);
char *re = re_storage = ximalloc (re_size);
char *rez = mempcpy (re, wprefix, strlen (wprefix));
rez = mempcpy (rez, pattern, size);
memcpy (rez, wsuffix, strlen (wsuffix));
pattern = re;
size = re_size;
}
memcpy (n, p, patlim - p);
n += patlim - p;
*n = '\0';
if (match_words)
strcpy (n, wsuffix);
if (match_lines)
strcpy (n, xsuffix);
if (!localeinfo.multibyte)
pcre2_set_character_tables (ccontext, pcre2_maketables (gcontext));
pc->cre = pcre_compile (re, flags, &ep, &e, pcre_maketables ());
pc->cre = pcre2_compile ((PCRE2_SPTR) pattern, size, flags,
&ec, &e, ccontext);
if (!pc->cre)
die (EXIT_TROUBLE, 0, "%s", ep);
{
enum { ERRBUFSIZ = 256 }; /* Taken from pcre2grep.c ERRBUFSIZ. */
PCRE2_UCHAR8 ep[ERRBUFSIZ];
pcre2_get_error_message (ec, ep, sizeof ep);
die (EXIT_TROUBLE, 0, "%s", ep);
}
int pcre_study_flags = PCRE_STUDY_EXTRA_NEEDED | PCRE_STUDY_JIT_COMPILE;
pc->extra = pcre_study (pc->cre, pcre_study_flags, &ep);
if (ep)
die (EXIT_TROUBLE, 0, "%s", ep);
free (re_storage);
pcre2_compile_context_free (ccontext);
# if PCRE_STUDY_JIT_COMPILE
if (pcre_fullinfo (pc->cre, pc->extra, PCRE_INFO_JIT, &e))
die (EXIT_TROUBLE, 0, _("internal error (should never happen)"));
pc->mcontext = nullptr;
pc->data = pcre2_match_data_create_from_pattern (pc->cre, gcontext);
/* Ignore any failure return from pcre2_jit_compile, as that merely
means JIT won't be used during matching. */
pcre2_jit_compile (pc->cre, PCRE2_JIT_COMPLETE);
/* The PCRE documentation says that a 32 KiB stack is the default. */
if (e)
pc->jit_stack_size = 32 << 10;
# endif
pc->jit_stack = nullptr;
pc->jit_stack_size = 32 << 10;
free (re);
int sub[NSUB];
pc->empty_match[false] = pcre_exec (pc->cre, pc->extra, "", 0, 0,
PCRE_NOTBOL, sub, NSUB);
pc->empty_match[true] = pcre_exec (pc->cre, pc->extra, "", 0, 0, 0, sub,
NSUB);
pc->empty_match[false] = jit_exec (pc, "", 0, 0, PCRE2_NOTBOL);
pc->empty_match[true] = jit_exec (pc, "", 0, 0, 0);
return pc;
#endif /* HAVE_LIBPCRE */
}
size_t
Pexecute (void *vcp, char const *buf, size_t size, size_t *match_size,
ptrdiff_t
Pexecute (void *vcp, char const *buf, idx_t size, idx_t *match_size,
char const *start_ptr)
{
#if !HAVE_LIBPCRE
/* We can't get here, because Pcompile would have been called earlier. */
die (EXIT_TROUBLE, 0, _("internal error"));
#else
int sub[NSUB];
char const *p = start_ptr ? start_ptr : buf;
bool bol = p[-1] == eolbyte;
char const *line_start = buf;
int e = PCRE_ERROR_NOMATCH;
int e = PCRE2_ERROR_NOMATCH;
char const *line_end;
struct pcre_comp *pc = vcp;
PCRE2_SIZE *sub = pcre2_get_ovector_pointer (pc->data);
/* The search address to pass to pcre_exec. This is the start of
/* The search address to pass to PCRE. This is the start of
the buffer, or just past the most-recently discovered encoding
error or line end. */
char const *subject = buf;
do
{
/* Search line by line. Although this code formerly used
PCRE_MULTILINE for performance, the performance wasn't always
/* Search line by line. Although this formerly used something like
PCRE2_MULTILINE for performance, the performance wasn't always
better and the correctness issues were too puzzling. See
Bug#22655. */
line_end = memchr (p, eolbyte, buf + size - p);
if (INT_MAX < line_end - p)
line_end = rawmemchr (p, eolbyte);
if (PCRE2_SIZE_MAX < line_end - p)
die (EXIT_TROUBLE, 0, _("exceeded PCRE's line length limit"));
for (;;)
{
/* Skip past bytes that are easily determined to be encoding
errors, treating them as data that cannot match. This is
faster than having pcre_exec check them. */
faster than having PCRE check them. */
while (localeinfo.sbclen[to_uchar (*p)] == -1)
{
p++;
@@ -248,10 +299,10 @@ Pexecute (void *vcp, char const *buf, size_t size, size_t *match_size,
bol = false;
}
int search_offset = p - subject;
idx_t search_offset = p - subject;
/* Check for an empty match; this is faster than letting
pcre_exec do it. */
PCRE do it. */
if (p == line_end)
{
sub[0] = sub[1] = search_offset;
@@ -261,13 +312,14 @@ Pexecute (void *vcp, char const *buf, size_t size, size_t *match_size,
int options = 0;
if (!bol)
options |= PCRE_NOTBOL;
options |= PCRE2_NOTBOL;
e = jit_exec (pc, subject, line_end - subject, search_offset,
options, sub);
if (e != PCRE_ERROR_BADUTF8)
e = jit_exec (pc, subject, line_end - subject,
search_offset, options);
if (MATCH_INVALID_UTF || !bad_utf8_from_pcre2 (e))
break;
int valid_bytes = sub[0];
idx_t valid_bytes = pcre2_get_startchar (pc->data);
if (search_offset <= valid_bytes)
{
@@ -277,14 +329,15 @@ Pexecute (void *vcp, char const *buf, size_t size, size_t *match_size,
/* Handle the empty-match case specially, for speed.
This optimization is valid if VALID_BYTES is zero,
which means SEARCH_OFFSET is also zero. */
sub[0] = valid_bytes;
sub[1] = 0;
e = pc->empty_match[bol];
}
else
e = jit_exec (pc, subject, valid_bytes, search_offset,
options | PCRE_NO_UTF8_CHECK | PCRE_NOTEOL, sub);
options | PCRE2_NO_UTF_CHECK | PCRE2_NOTEOL);
if (e != PCRE_ERROR_NOMATCH)
if (e != PCRE2_ERROR_NOMATCH)
break;
/* Treat the encoding error as data that cannot match. */
@@ -295,7 +348,7 @@ Pexecute (void *vcp, char const *buf, size_t size, size_t *match_size,
subject += valid_bytes + 1;
}
if (e != PCRE_ERROR_NOMATCH)
if (e != PCRE2_ERROR_NOMATCH)
break;
bol = true;
p = subject = line_start = line_end + 1;
@@ -306,26 +359,42 @@ Pexecute (void *vcp, char const *buf, size_t size, size_t *match_size,
{
switch (e)
{
case PCRE_ERROR_NOMATCH:
case PCRE2_ERROR_NOMATCH:
break;
case PCRE_ERROR_NOMEMORY:
die (EXIT_TROUBLE, 0, _("memory exhausted"));
case PCRE2_ERROR_NOMEMORY:
die (EXIT_TROUBLE, 0, _("%s: memory exhausted"), input_filename ());
# if PCRE_STUDY_JIT_COMPILE
case PCRE_ERROR_JIT_STACKLIMIT:
die (EXIT_TROUBLE, 0, _("exhausted PCRE JIT stack"));
# endif
case PCRE2_ERROR_JIT_STACKLIMIT:
die (EXIT_TROUBLE, 0, _("%s: exhausted PCRE JIT stack"),
input_filename ());
case PCRE_ERROR_MATCHLIMIT:
die (EXIT_TROUBLE, 0, _("exceeded PCRE's backtracking limit"));
case PCRE2_ERROR_MATCHLIMIT:
die (EXIT_TROUBLE, 0, _("%s: exceeded PCRE's backtracking limit"),
input_filename ());
case PCRE2_ERROR_DEPTHLIMIT:
die (EXIT_TROUBLE, 0,
_("%s: exceeded PCRE's nested backtracking limit"),
input_filename ());
case PCRE2_ERROR_RECURSELOOP:
die (EXIT_TROUBLE, 0, _("%s: PCRE detected recurse loop"),
input_filename ());
#ifdef PCRE2_ERROR_HEAPLIMIT
case PCRE2_ERROR_HEAPLIMIT:
die (EXIT_TROUBLE, 0, _("%s: exceeded PCRE's heap limit"),
input_filename ());
#endif
default:
/* For now, we lump all remaining PCRE failures into this basket.
If anyone cares to provide sample grep usage that can trigger
particular PCRE errors, we can add to the list (above) of more
detailed diagnostics. */
die (EXIT_TROUBLE, 0, _("internal PCRE error: %d"), e);
die (EXIT_TROUBLE, 0, _("%s: internal PCRE error: %d"),
input_filename (), e);
}
return -1;
@@ -349,5 +418,4 @@ Pexecute (void *vcp, char const *buf, size_t size, size_t *match_size,
*match_size = end - beg;
return beg - buf;
}
#endif
}
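
In the new Pcompile above, the -w case (and the -x case when PCRE2_EXTRA_MATCH_LINE is unavailable) builds the final pattern by copying a prefix, the user's pattern bytes and a suffix into one buffer that is passed to pcre2_compile with an explicit length. A minimal sketch of that concatenation using only standard C, assuming malloc and memcpy in place of gnulib's ximalloc and mempcpy; wrap_pattern is a hypothetical name and the sketch does not guard against size overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of grep.  Return a newly allocated
   buffer holding PREFIX, then the SIZE bytes at PATTERN, then SUFFIX,
   and store the total length in *RESULT_SIZE.  The result is not
   NUL-terminated, so PATTERN may itself contain NUL bytes.  Return
   NULL if allocation fails.  */
char *
wrap_pattern (char const *pattern, size_t size,
              char const *prefix, char const *suffix,
              size_t *result_size)
{
  size_t plen = strlen (prefix);
  size_t slen = strlen (suffix);
  size_t total = plen + size + slen;
  char *re = malloc (total ? total : 1);
  if (!re)
    return NULL;
  memcpy (re, prefix, plen);
  memcpy (re + plen, pattern, size);
  memcpy (re + plen + size, suffix, slen);
  *result_size = total;
  return re;
}
```

Carrying the length separately is what allows the new code to drop the old workaround that rewrote each NUL byte in the pattern as "\000".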


@@ -1,5 +1,5 @@
/* search.c - searching subroutines using dfa, kwset and regex for grep.
Copyright 1992, 1998, 2000, 2007, 2009-2018 Free Software Foundation, Inc.
Copyright 1992, 1998, 2000, 2007, 2009-2026 Free Software Foundation, Inc.
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
@@ -12,9 +12,7 @@
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin Street - Fifth Floor, Boston, MA
02110-1301, USA. */
along with this program. If not, see <https://www.gnu.org/licenses/>. */
#ifndef GREP_SEARCH_H
#define GREP_SEARCH_H 1
@@ -24,7 +22,6 @@
#include <sys/types.h>
#include <stdint.h>
#include <wchar.h>
#include <wctype.h>
#include <regex.h>
#include "system.h"
@@ -48,39 +45,60 @@ typedef signed char mb_len_map_t;
/* searchutils.c */
extern void wordinit (void);
extern kwset_t kwsinit (bool);
extern size_t wordchars_size (char const *, char const *) _GL_ATTRIBUTE_PURE;
extern size_t wordchar_next (char const *, char const *) _GL_ATTRIBUTE_PURE;
extern size_t wordchar_prev (char const *, char const *, char const *)
extern idx_t wordchars_size (char const *, char const *) _GL_ATTRIBUTE_PURE;
extern idx_t wordchar_next (char const *, char const *) _GL_ATTRIBUTE_PURE;
extern idx_t wordchar_prev (char const *, char const *, char const *)
_GL_ATTRIBUTE_PURE;
extern ptrdiff_t mb_goback (char const **, char const *, char const *);
extern ptrdiff_t mb_goback (char const **, idx_t *, char const *, char const *);
/* dfasearch.c */
extern void *GEAcompile (char *, size_t, reg_syntax_t);
extern size_t EGexecute (void *, char const *, size_t, size_t *, char const *);
extern void *GEAcompile (char *, idx_t, reg_syntax_t, bool);
extern ptrdiff_t EGexecute (void *, char const *, idx_t, idx_t *, char const *);
/* kwsearch.c */
extern void *Fcompile (char *, size_t, reg_syntax_t);
extern size_t Fexecute (void *, char const *, size_t, size_t *, char const *);
extern void *Fcompile (char *, idx_t, reg_syntax_t, bool);
extern ptrdiff_t Fexecute (void *, char const *, idx_t, idx_t *, char const *);
/* pcresearch.c */
extern void *Pcompile (char *, size_t, reg_syntax_t);
extern size_t Pexecute (void *, char const *, size_t, size_t *, char const *);
extern void *Pcompile (char *, idx_t, reg_syntax_t, bool);
extern ptrdiff_t Pexecute (void *, char const *, idx_t, idx_t *, char const *);
extern void Pprint_version (void);
/* grep.c */
extern struct localeinfo localeinfo;
extern void fgrep_to_grep_pattern (char **, size_t *);
extern void fgrep_to_grep_pattern (char **, idx_t *);
/* Return the number of bytes in the character at the start of S, which
is of size N. N must be positive. MBS is the conversion state.
This acts like mbrlen, except it returns -1 and -2 instead of
(size_t) -1 and (size_t) -2. */
SEARCH_INLINE ptrdiff_t
imbrlen (char const *s, idx_t n, mbstate_t *mbs)
{
size_t len = mbrlen (s, n, mbs);
/* Convert result to ptrdiff_t portably, even on oddball platforms.
When optimizing, this typically uses no machine instructions. */
if (len <= MB_LEN_MAX)
return len;
ptrdiff_t neglen = -len;
return -neglen;
}
/* Return the number of bytes in the character at the start of S, which
is of size N. N must be positive. MBS is the conversion state.
This acts like mbrlen, except it returns 1 when mbrlen would return 0,
it returns -1 and -2 instead of (size_t) -1 and (size_t) -2,
and it is typically faster because of the cache. */
SEARCH_INLINE size_t
mb_clen (char const *s, size_t n, mbstate_t *mbs)
SEARCH_INLINE ptrdiff_t
mb_clen (char const *s, idx_t n, mbstate_t *mbs)
{
size_t len = localeinfo.sbclen[to_uchar (*s)];
return len == (size_t) -2 ? mbrlen (s, n, mbs) : len;
signed char len = localeinfo.sbclen[to_uchar (*s)];
return len == -2 ? imbrlen (s, n, mbs) : len;
}
extern char const *input_filename (void);
_GL_INLINE_HEADER_END
#endif /* GREP_SEARCH_H */
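
The new imbrlen in search.h above turns mbrlen's size_t result into a ptrdiff_t so that the error values (size_t) -1 and (size_t) -2 become plain -1 and -2. A minimal sketch of just that conversion step, isolated so it can be tried without a locale; mbrlen_result_to_ptrdiff is a hypothetical name. The double negation presumably avoids converting a huge unsigned value directly to a signed type, whose result is implementation-defined.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper mirroring the conversion in imbrlen.  LEN is a
   value as returned by mbrlen: either a byte count no greater than
   MB_LEN_MAX, or (size_t) -1 or (size_t) -2 to signal an error.  */
ptrdiff_t
mbrlen_result_to_ptrdiff (size_t len)
{
  if (len <= MB_LEN_MAX)
    return len;
  /* Unsigned negation maps (size_t) -1 to 1 and (size_t) -2 to 2,
     both of which fit in ptrdiff_t; negating again restores the sign.  */
  ptrdiff_t neglen = -len;
  return -neglen;
}
```

With signed results, callers such as mb_goback can test `clen < 0` instead of comparing against `(size_t) -2`.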


@@ -1,5 +1,5 @@
/* searchutils.c - helper subroutines for grep's matchers.
Copyright 1992, 1998, 2000, 2007, 2009-2018 Free Software Foundation, Inc.
Copyright 1992, 1998, 2000, 2007, 2009-2026 Free Software Foundation, Inc.
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
@@ -12,15 +12,15 @@
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin Street - Fifth Floor, Boston, MA
02110-1301, USA. */
along with this program. If not, see <https://www.gnu.org/licenses/>. */
#include <config.h>
#define SEARCH_INLINE _GL_EXTERN_INLINE
#define SYSTEM_INLINE _GL_EXTERN_INLINE
#include "search.h"
#include <search.h>
#include <uchar.h>
/* For each byte B, sbwordchar[B] is true if B is a single-byte
character that is a word constituent, and is false otherwise. */
@@ -30,7 +30,7 @@ static bool sbwordchar[NCHAR];
static bool
wordchar (wint_t wc)
{
return wc == L'_' || iswalnum (wc);
return wc == L'_' || c32isalnum (wc);
}
void
@@ -43,47 +43,53 @@ wordinit (void)
kwset_t
kwsinit (bool mb_trans)
{
char *trans = NULL;
char *trans = nullptr;
if (match_icase && (MB_CUR_MAX == 1 || mb_trans))
{
trans = xmalloc (NCHAR);
if (MB_CUR_MAX == 1)
for (int i = 0; i < NCHAR; i++)
trans[i] = toupper (i);
else
for (int i = 0; i < NCHAR; i++)
{
wint_t wc = localeinfo.sbctowc[i];
wint_t uwc = towupper (wc);
if (uwc != wc)
{
mbstate_t mbs = { 0 };
size_t len = wcrtomb (&trans[i], uwc, &mbs);
if (len != 1)
abort ();
}
else
trans[i] = i;
}
trans = ximalloc (NCHAR);
/* If I is a single-byte character that becomes a different
single-byte character when uppercased, set trans[I]
to that character. Otherwise, set trans[I] to I. */
for (int i = 0; i < NCHAR; i++)
trans[i] = toupper (i);
}
return kwsalloc (trans);
}
/* In the buffer *MB_START, return the number of bytes needed to go
back from CUR to the previous boundary, where a "boundary" is the
start of a multibyte character or is an error-encoding byte. The
buffer ends at END (i.e., one past the address of the buffer's last
byte). If CUR is already at a boundary, return 0. If *MB_START is
greater than CUR, return the negative value CUR - *MB_START.
/* Return the number of bytes needed to go back to the start of a
multibyte character in a buffer. The buffer starts at *MB_START.
(See below for MBCLEN's role.) The multibyte character contains
the byte addressed by CUR. The buffer ends just before END, which
must not be less than CUR.
When returning zero, set *MB_START to CUR. When returning a
positive value, set *MB_START to the next boundary after CUR, or to
END if there is no such boundary. When returning a negative value,
leave *MB_START alone. */
If CUR is no larger than *MB_START, return CUR - *MB_START without
modifying *MB_START or dealing with MBCLEN. Otherwise, update
*MB_START to point to the first multibyte character starting on or
after CUR, and if MBCLEN is nonnull then deal with MBCLEN as follows:
- If this function returns 0 and the locale is multibyte and is
not UTF-8, set *MBCLEN to the number of bytes in the multibyte
character containing the byte addressed by (CUR - 1).
- Otherwise, possibly set *MBCLEN to an unspecified value.
*MB_START should point to the start of a multibyte character, or to
an encoding-error byte.
*END should be a sentinel byte - one of '\0', '\r', '\n', '.', '/',
which POSIX says cannot be part of any other character. Also,
there should be a byte string immediately before *MB_START that
contains a sentinel byte. This means it is OK to scan backwards
before *MB_START as long as the scan stops at a sentinel byte, and
similarly it is OK to scan forwards from CUR (without checking END)
so long as the scan stops at a sentinel byte.
Treat encoding errors as if they were single-byte characters. */
ptrdiff_t
mb_goback (char const **mb_start, char const *cur, char const *end)
mb_goback (char const **mb_start, idx_t *mbclen, char const *cur,
char const *end)
{
const char *p = *mb_start;
const char *p0 = p;
@ -93,30 +99,44 @@ mb_goback (char const **mb_start, char const *cur, char const *end)
if (localeinfo.using_utf8)
{
/* UTF-8 permits scanning backward to the previous character.
Start by assuming CUR is at a character boundary. */
p = cur;
if (cur < end && (*cur & 0xc0) == 0x80)
if ((*cur & 0xc0) == 0x80)
for (int i = 1; i <= 3; i++)
if ((cur[-i] & 0xc0) != 0x80)
{
mbstate_t mbs = { 0 };
size_t clen = mb_clen (cur - i, end - (cur - i), &mbs);
if (i < clen && clen < (size_t) -2)
/* True if the length implied by the putative byte 1 at
CUR[-I] extends at least through *CUR. */
bool long_enough = (~cur[-i] & 0xff) >> (7 - i) == 0;
if (long_enough)
{
p0 = cur - i;
p = p0 + clen;
mbstate_t mbs; mbszero (&mbs);
ptrdiff_t clen = imbrlen (cur - i, end - (cur - i), &mbs);
if (0 <= clen)
{
/* This multibyte character contains *CUR. */
p0 = cur - i;
p = p0 + clen;
}
}
break;
}
}
else
{
mbstate_t mbs = { 0 };
/* In non-UTF-8 encodings, to find character boundaries one must
in general scan forward from the start of the buffer. */
mbstate_t mbs; mbszero (&mbs);
ptrdiff_t clen;
do
{
size_t clen = mb_clen (p, end - p, &mbs);
clen = mb_clen (p, end - p, &mbs);
if ((size_t) -2 <= clen)
if (clen < 0)
{
/* An invalid sequence, or a truncated multibyte character.
Treat it as a single byte character. */
@ -127,6 +147,9 @@ mb_goback (char const **mb_start, char const *cur, char const *end)
p += clen;
}
while (p < cur);
if (mbclen)
*mbclen = clen;
}
*mb_start = p;
@ -136,36 +159,36 @@ mb_goback (char const **mb_start, char const *cur, char const *end)
/* Examine the start of BUF (which goes to END) for word constituents.
If COUNTALL, examine as many as possible; otherwise, examine at most one.
Return the total number of bytes in the examined characters. */
static size_t
static idx_t
wordchars_count (char const *buf, char const *end, bool countall)
{
size_t n = 0;
mbstate_t mbs = { 0 };
while (n < end - buf)
mbstate_t mbs; mbszero (&mbs);
char const *p = buf;
while (p < end)
{
unsigned char b = buf[n];
unsigned char b = *p;
if (sbwordchar[b])
n++;
p++;
else if (localeinfo.sbclen[b] != -2)
break;
else
{
wchar_t wc = 0;
size_t wcbytes = mbrtowc (&wc, buf + n, end - buf - n, &mbs);
char32_t wc = 0;
size_t wcbytes = mbrtoc32 (&wc, p, end - p, &mbs);
if (!wordchar (wc))
break;
n += wcbytes + !wcbytes;
p += wcbytes + !wcbytes;
}
if (!countall)
break;
}
return n;
return p - buf;
}
/* Examine the start of BUF for the longest prefix containing just
word constituents. Return the total number of bytes in the prefix.
The buffer ends at END. */
size_t
idx_t
wordchars_size (char const *buf, char const *end)
{
return wordchars_count (buf, end, true);
@ -173,7 +196,7 @@ wordchars_size (char const *buf, char const *end)
/* If BUF starts with a word constituent, return the number of bytes
used to represent it; otherwise, return zero. The buffer ends at END. */
size_t
idx_t
wordchar_next (char const *buf, char const *end)
{
return wordchars_count (buf, end, false);
@ -182,16 +205,15 @@ wordchar_next (char const *buf, char const *end)
/* In the buffer BUF, return nonzero if the character whose encoding
contains the byte before CUR is a word constituent. The buffer
ends at END. */
size_t
idx_t
wordchar_prev (char const *buf, char const *cur, char const *end)
{
if (buf == cur)
return 0;
unsigned char b = *--cur;
if (! localeinfo.multibyte
|| (localeinfo.using_utf8 && localeinfo.sbclen[b] != -2))
if (! localeinfo.multibyte || localeinfo.using_utf8 & ~(b >> 7))
return sbwordchar[b];
char const *p = buf;
cur -= mb_goback (&p, cur, end);
cur -= mb_goback (&p, nullptr, cur, end);
return wordchar_next (cur, end);
}


@ -1,5 +1,5 @@
/* Portability cruft. Include after config.h and sys/types.h.
Copyright 1996, 1998-2000, 2007, 2009-2018 Free Software Foundation, Inc.
Copyright 1996, 1998-2000, 2007, 2009-2026 Free Software Foundation, Inc.
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
@ -12,9 +12,7 @@
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin Street - Fifth Floor, Boston, MA
02110-1301, USA. */
along with this program. If not, see <https://www.gnu.org/licenses/>. */
#ifndef GREP_SYSTEM_H
#define GREP_SYSTEM_H 1
@ -101,9 +99,9 @@ void __asan_unpoison_memory_region (void const volatile *addr, size_t size);
#else
static _GL_UNUSED void
_GL_UNUSED static void
__asan_poison_memory_region (void const volatile *addr, size_t size) { }
static _GL_UNUSED void
_GL_UNUSED static void
__asan_unpoison_memory_region (void const volatile *addr, size_t size) { }
#endif

15
tests/100k-entries Executable file

@ -0,0 +1,15 @@
#!/bin/sh
# This would make grep-3.11 fail with ENOTSUP and exit 2.
. "${srcdir=.}/init.sh"; path_prepend_ ../src
expensive_
fail=0
mkdir t || framework_failure_
(cd t && seq 100000|xargs touch) || framework_failure_
returns_ 1 grep -r x t > out 2> err
compare /dev/null out || fail=1
compare /dev/null err || fail=1
Exit $fail


@ -1,7 +1,7 @@
package Coreutils;
# This is a testing framework.
# Copyright (C) 1998-2015, 2017-2018 Free Software Foundation, Inc.
# Copyright (C) 1998-2015, 2017-2026 Free Software Foundation, Inc.
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by


@ -1,7 +1,7 @@
package CuSkip;
# Skip a test: emit diag to log and to stderr, and exit 77
# Copyright (C) 2011-2015, 2017-2018 Free Software Foundation, Inc.
# Copyright (C) 2011-2015, 2017-2026 Free Software Foundation, Inc.
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by


@ -1,7 +1,7 @@
package CuTmpdir;
# create, then chdir into a temporary sub-directory
# Copyright (C) 2007-2015, 2017-2018 Free Software Foundation, Inc.
# Copyright (C) 2007-2015, 2017-2026 Free Software Foundation, Inc.
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by


@ -1,5 +1,5 @@
## Process this file with automake to create Makefile.in
# Copyright 1997-1998, 2005-2018 Free Software Foundation, Inc.
# Copyright 1997-1998, 2005-2026 Free Software Foundation, Inc.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
@ -41,18 +41,27 @@ AM_CFLAGS = $(WARN_CFLAGS) $(WERROR_CFLAGS)
# Tell the linker to omit references to unused shared libraries.
AM_LDFLAGS = $(IGNORE_UNUSED_LIBRARIES_CFLAGS)
LDADD = ../lib/libgreputils.a $(LIBINTL) ../lib/libgreputils.a
LDADD = ../lib/libgreputils.a $(LIBINTL) ../lib/libgreputils.a \
$(HARD_LOCALE_LIB) $(LIBC32CONV) $(LIBCSTACK) \
$(LIBSIGSEGV) $(LIBUNISTRING) $(MBRTOWC_LIB) $(SETLOCALE_NULL_LIB) \
$(LIBTHREAD)
# The triple-backref test is expected to fail with both the system
# matcher (i.e., with glibc) and with the included matcher.
# Both matchers need to be fixed.
# FIXME-2015: Remove this once the glibc and gnulib bugs are fixed.
# FIXME-2025: Remove this once the glibc and gnulib bugs are fixed.
XFAIL_TESTS = triple-backref
# The glibc-infloop test is expected to fail with both the system
# matcher (i.e., with glibc) and with the included matcher.
# Both matchers need to be fixed.
# FIXME-2025: Remove this once the glibc and gnulib bugs are fixed.
XFAIL_TESTS += glibc-infloop
# Equivalence classes are only supported when using the system
# matcher (which means only with glibc).
# The included matcher needs to be fixed.
# FIXME-2015: Remove this once the gnulib bug is fixed.
# FIXME-2025: Remove this once the gnulib bug is fixed.
if USE_INCLUDED_REGEX
XFAIL_TESTS += equiv-classes
else
@ -62,14 +71,17 @@ else
endif
TESTS = \
100k-entries \
backref \
backref-alt \
backref-multibyte-slow \
backref-word \
backslash-dot \
backslash-s-and-repetition-operators \
backslash-s-vs-invalid-multitype \
backslash-s-vs-invalid-multibyte \
big-hole \
big-match \
binary-file-matches \
bogus-wctob \
bre \
c-locale \
@ -81,11 +93,13 @@ TESTS = \
case-fold-titlecase \
char-class-multibyte \
char-class-multibyte2 \
color-colors \
context-0 \
count-newline \
dfa-coverage \
dfa-heap-overrun \
dfa-infloop \
dfa-invalid-utf8 \
dfaexec-multibyte \
empty \
empty-line \
@ -101,11 +115,15 @@ TESTS = \
fgrep-longest \
file \
filename-lineno.pl \
fillbuf-long-line \
fmbtest \
foad1 \
glibc-infloop \
grep-dev-null \
grep-dev-null-out \
grep-dir \
hangul-syllable \
hash-collision-perf \
help-version \
high-bit-range \
in-eq-out-infloop \
@ -117,18 +135,22 @@ TESTS = \
kwset-abuse \
long-line-vs-2GiB-read \
long-pattern-perf \
many-regex-performance \
match-lines \
max-count-overread \
max-count-vs-context \
mb-dot-newline \
mb-non-UTF8-overrun \
mb-non-UTF8-perf-Fw \
mb-non-UTF8-performance \
mb-non-UTF8-word-boundary \
multibyte-white-space \
multiple-begin-or-end-line \
null-byte \
options \
pcre \
pcre-abort \
pcre-ascii-digits \
pcre-context \
pcre-count \
pcre-infloop \
@ -137,6 +159,8 @@ TESTS = \
pcre-jitstack \
pcre-o \
pcre-utf8 \
pcre-utf8-bug224 \
pcre-utf8-w \
pcre-w \
pcre-wx-backref \
pcre-z \
@ -154,6 +178,7 @@ TESTS = \
stack-overflow \
status \
surrogate-pair \
surrogate-search \
symlink \
triple-backref \
turkish-I \
@ -165,11 +190,13 @@ TESTS = \
unibyte-bracket-expr \
unibyte-negated-circumflex \
utf8-bracket \
version-pcre \
warn-char-classes \
word-delim-multibyte \
word-multi-file \
word-multibyte \
write-error-msg \
y2038-vs-32-bit \
yesno \
z-anchor-newline
@ -231,15 +258,16 @@ TESTS_ENVIRONMENT = \
LOCALE_FR='$(LOCALE_FR)' \
LOCALE_FR_UTF8='$(LOCALE_FR_UTF8)' \
AWK=$(AWK) \
GREP_OPTIONS='' \
LC_ALL=C \
abs_top_builddir='$(abs_top_builddir)' \
abs_top_srcdir='$(abs_top_srcdir)' \
abs_srcdir='$(abs_srcdir)' \
built_programs="$$built_programs" \
host_triplet='$(host_triplet)' \
srcdir='$(srcdir)' \
top_srcdir='$(top_srcdir)' \
CC='$(CC)' \
CONFIG_HEADER='$(abs_top_builddir)/$(CONFIG_INCLUDE)' \
GREP_TEST_NAME=`echo $$tst|sed 's,^\./,,;s,/,-,g'` \
MAKE=$(MAKE) \
MALLOC_PERTURB_=$(MALLOC_PERTURB_) \
@ -248,7 +276,13 @@ TESTS_ENVIRONMENT = \
PERL='$(PERL)' \
SHELL='$(SHELL)' \
PATH='$(abs_top_builddir)/src$(PATH_SEPARATOR)'"$$PATH" \
; 9>&2
; \
\
: 'set this envvar to indicate whether -P works'; \
m=0; if err=`echo .|grep -Pq . 2>&1`; then \
test -z "$$err" && m=1; fi; \
export PCRE_WORKS=$$m; \
9>&2
LOG_COMPILER = $(SHELL)
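
Stripped of make's `$$` escaping and line continuations, the `PCRE_WORKS` probe added to `TESTS_ENVIRONMENT` above amounts to roughly the following plain-shell sketch. It assumes some `grep` is on `PATH`; the final `echo` is only for illustration and is not part of the Makefile.

```shell
# Probe whether "grep -P" is usable: it must exit 0 on a trivial
# match and must not write anything to stderr.
m=0
if err=$(echo . | grep -Pq . 2>&1); then
  test -z "$err" && m=1
fi
PCRE_WORKS=$m
export PCRE_WORKS

# Illustration only: show the value the tests would see.
echo "PCRE_WORKS=$PCRE_WORKS"
```

Tests such as `filename-lineno.pl` can then branch on `$ENV{PCRE_WORKS}` to choose between the PCRE diagnostic and the "not supported" message.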


@ -1,7 +1,7 @@
#! /bin/sh
# Test for backreferences and other things.
# Test for back-references and other things.
#
# Copyright (C) 2001, 2006, 2009-2018 Free Software Foundation, Inc.
# Copyright (C) 2001, 2006, 2009-2026 Free Software Foundation, Inc.
#
# Copying and distribution of this file, with or without modification,
# are permitted in any medium without royalty provided the copyright
@ -43,4 +43,12 @@ if test $? -ne 2 ; then
failures=1
fi
# https://bugs.gnu.org/36148#13
echo 'Total failed: 2 (1 ignored)' |
grep -e '^Total failed: 0$' -e '^Total failed: \([0-9]*\) (\1 ignored)$'
if test $? -ne 1 ; then
echo "Backref: Multiple -e test, test #5 failed"
failures=1
fi
Exit $failures


@ -1,7 +1,7 @@
#! /bin/sh
# Test for a bug in glibc's regex code as of 2015-09-19.
#
# Copyright 2015-2018 Free Software Foundation, Inc.
# Copyright 2015-2026 Free Software Foundation, Inc.
#
# Copying and distribution of this file, with or without modification,
# are permitted in any medium without royalty provided the copyright

20
tests/backslash-dot Executable file

@ -0,0 +1,20 @@
#! /bin/sh
# This once failed to match: echo . | grep '\.'
#
# Copyright (C) 2020-2026 Free Software Foundation, Inc.
#
# Copying and distribution of this file, with or without modification,
# are permitted in any medium without royalty provided the copyright
# notice and this notice are preserved.
. "${srcdir=.}/init.sh"; path_prepend_ ../src
fail=0
echo . > in || framework_failure_
grep '\.' in > out 2> err || fail=1
compare in out || fail=1
compare /dev/null err || fail=1
Exit $fail


@ -1,7 +1,7 @@
#! /bin/sh
# Ensure that \s and \S work with repetition operators.
#
# Copyright (C) 2013-2018 Free Software Foundation, Inc.
# Copyright (C) 2013-2026 Free Software Foundation, Inc.
#
# Copying and distribution of this file, with or without modification,
# are permitted in any medium without royalty provided the copyright


@ -1,7 +1,7 @@
#! /bin/sh
# Ensure that neither \s nor \S matches an invalid multibyte character.
#
# Copyright (C) 2013-2018 Free Software Foundation, Inc.
# Copyright (C) 2013-2026 Free Software Foundation, Inc.
#
# Copying and distribution of this file, with or without modification,
# are permitted in any medium without royalty provided the copyright
@ -11,11 +11,11 @@
require_en_utf8_locale_
printf '\202\n' > in || framework_failure_
LC_ALL=en_US.UTF-8
export LC_ALL
printf '\202\n' > in || framework_failure_
fail=0
grep '^\S$' in > out-S && fail=1
compare /dev/null out-S || fail=1


@ -4,6 +4,7 @@
. "${srcdir=.}/init.sh"; path_prepend_ ../src
expensive_
require_perl_
# Skip this test if there is no usable SEEK_HOLE support,
# as is the case with linux-3.5.0 on ext4 and tmpfs file systems.

23
tests/binary-file-matches Executable file

@ -0,0 +1,23 @@
#! /bin/sh
# Test for the "binary file ... matches" diagnostic.
#
# Copyright (C) 2020-2026 Free Software Foundation, Inc.
#
# Copying and distribution of this file, with or without modification,
# are permitted in any medium without royalty provided the copyright
# notice and this notice are preserved.
. "${srcdir=.}/init.sh"; path_prepend_ ../src
fail=0
echo "grep: (standard input): binary file matches" > exp \
|| framework_failure_
for option in '' -s; do
printf 'a\0' | grep $option a > out 2> err || fail=1
compare /dev/null out || fail=1
compare exp err || fail=1
done
Exit $fail


@ -1,7 +1,7 @@
#! /bin/sh
# Regression test for GNU grep.
#
# Copyright (C) 2001, 2006, 2009-2018 Free Software Foundation, Inc.
# Copyright (C) 2001, 2006, 2009-2026 Free Software Foundation, Inc.
#
# Copying and distribution of this file, with or without modification,
# are permitted in any medium without royalty provided the copyright


@ -1,4 +1,4 @@
# Copyright (C) 2001, 2006, 2009-2018 Free Software Foundation, Inc.
# Copyright (C) 2001, 2006, 2009-2026 Free Software Foundation, Inc.
#
# Copying and distribution of this file, with or without modification,
# are permitted in any medium without royalty provided the copyright


@ -1,7 +1,7 @@
#! /bin/sh
# Regression test for GNU grep.
#
# Copyright 2016-2018 Free Software Foundation, Inc.
# Copyright 2016-2026 Free Software Foundation, Inc.
#
# Copying and distribution of this file, with or without modification,
# are permitted in any medium without royalty provided the copyright


@ -1,7 +1,7 @@
#!/bin/sh
# Check that case folding works even with titlecase and similarly odd chars.
# Copyright 2014-2018 Free Software Foundation, Inc.
# Copyright 2014-2026 Free Software Foundation, Inc.
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
@ -168,7 +168,7 @@ do
done
# Try a unibyte test with ISO 8859-7, if available.
if test "$(get-mb-cur-max el_GR.iso88597)" -eq 1; then
if test "$(get-mb-cur-max el_GR.iso88597)" = 1; then
LC_ALL=el_GR.iso88597
export LC_ALL

48
tests/color-colors Executable file

@ -0,0 +1,48 @@
#!/bin/sh
# Check that GREP_COLOR elicits a warning.
# Copyright 2022-2026 Free Software Foundation, Inc.
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <https://www.gnu.org/licenses/>.
. "${srcdir=.}/init.sh"; path_prepend_ ../src
fail=0
unset GREP_COLORS
unset GREP_COLOR
LC_ALL=C
export LC_ALL
printf 'x\n\n' >in || framework_failure_
printf '%s\n' \
"grep: warning: GREP_COLOR='36' is deprecated; use GREP_COLORS='mt=36'" \
>exp.err || framework_failure_
GREP_COLORS='mt=36:ln=35' grep --color=always . in >exp 2>err || fail=1
compare /dev/null err || fail=1
GREP_COLOR='36' GREP_COLORS='ln=35' grep --color=always . in >out 2>err \
|| fail=1
compare exp out || fail=1
compare exp.err err || fail=1
GREP_COLORS='mt=36' grep --color=always . in >exp 2>err || fail=1
compare /dev/null err || fail=1
GREP_COLOR='36' grep --color=always . in >out 2>err || fail=1
compare exp out || fail=1
compare exp.err err || fail=1
GREP_COLORS='ln=35' grep --color=always . in >out 2>err || fail=1
compare /dev/null err || fail=1
Exit $fail


@ -2,7 +2,7 @@
# Test that newline is counted correctly even when the transition
# table is rebuilt.
# Copyright 2014-2018 Free Software Foundation, Inc.
# Copyright 2014-2026 Free Software Foundation, Inc.
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by


@ -1,7 +1,7 @@
#!/bin/sh
# Exercise the final reachable code in dfa.c's match_mb_charset.
# Copyright (C) 2012-2018 Free Software Foundation, Inc.
# Copyright (C) 2012-2026 Free Software Foundation, Inc.
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by


@ -1,7 +1,7 @@
#!/bin/sh
# Trigger a heap overrun in grep-2.6..grep-2.8.
# Copyright (C) 2011-2018 Free Software Foundation, Inc.
# Copyright (C) 2011-2026 Free Software Foundation, Inc.
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by

29
tests/dfa-invalid-utf8 Executable file

@ -0,0 +1,29 @@
#! /bin/sh
# Test whether "grep '.'" matches invalid UTF-8 byte sequences.
#
# Copyright 2019-2026 Free Software Foundation, Inc.
#
# Copying and distribution of this file, with or without modification,
# are permitted in any medium without royalty provided the copyright
# notice and this notice are preserved.
. "${srcdir=.}/init.sh"; path_prepend_ ../src
require_en_utf8_locale_
require_compiled_in_MB_support
fail=0
printf 'a\360\202\202\254b\n' >in1 || framework_failure_
LC_ALL=en_US.UTF-8 grep 'a.b' in1 > out1 2> err
test $? -eq 1 || fail=1
compare /dev/null out1 || fail=1
compare /dev/null err1 || fail=1
printf 'a\360\202\202\254ba\360\202\202\254b\n' >in2 ||
framework_failure_
LC_ALL=en_US.UTF-8 grep -E '(a.b)\1' in2 > out2 2> err
test $? -eq 1 || fail=1
compare /dev/null out2 || fail=1
compare /dev/null err2 || fail=1
Exit $fail


@ -2,7 +2,7 @@
# test that the empty file means no pattern
# and an empty pattern means match all.
#
# Copyright (C) 2001, 2006, 2009-2018 Free Software Foundation, Inc.
# Copyright (C) 2001, 2006, 2009-2026 Free Software Foundation, Inc.
#
# Copying and distribution of this file, with or without modification,
# are permitted in any medium without royalty provided the copyright
@ -39,17 +39,10 @@ for locale in C en_US.UTF-8; do
failures=1
fi
# should return 0 found a match
echo "" | LC_ALL=$locale timeout 10s grep $options -e ''
if test $? -ne 0 ; then
echo "Status: Wrong status code, test \#4 failed ($options $locale)"
failures=1
fi
# should return 0 found a match
echo abcd | LC_ALL=$locale timeout 10s grep $options -e ''
if test $? -ne 0 ; then
echo "Status: Wrong status code, test \#5 failed ($options $locale)"
echo "Status: Wrong status code, test \#4 failed ($options $locale)"
failures=1
fi
done


@ -1,7 +1,7 @@
#! /bin/sh
# Exercise bugs in grep-2.13 with -i, -n and an RE of ^$ in a multi-byte locale.
#
# Copyright (C) 2012-2018 Free Software Foundation, Inc.
# Copyright (C) 2012-2026 Free Software Foundation, Inc.
#
# Copying and distribution of this file, with or without modification,
# are permitted in any medium without royalty provided the copyright


@ -1,7 +1,7 @@
#! /bin/sh
# Test grep's behavior on encoding errors.
#
# Copyright 2015-2018 Free Software Foundation, Inc.
# Copyright 2015-2026 Free Software Foundation, Inc.
#
# Copying and distribution of this file, with or without modification,
# are permitted in any medium without royalty provided the copyright
@ -11,22 +11,25 @@
require_en_utf8_locale_
LC_ALL=en_US.UTF-8
export LC_ALL
printf 'Alfred Jones\n' > a || framework_failure_
printf 'John Smith\n' >j || framework_failure_
printf 'Pedro P\351rez\n' >p || framework_failure_
cat a p j >in || framework_failure_
LC_ALL=en_US.UTF-8
export LC_ALL
fail=0
grep '^A' in >out || fail=1
compare a out || fail=1
grep '^P' in >out || fail=1
printf 'Binary file in matches\n' >exp || framework_failure_
compare exp out || fail=1
compare /dev/null out || fail=1
grep -I '^P' in >out 2>err || fail=1
compare /dev/null out || fail=1
compare /dev/null err || fail=1
grep '^J' in >out || fail=1
compare j out || fail=1
@ -35,9 +38,14 @@ returns_ 1 grep '^X' in >out || fail=1
compare /dev/null out || fail=1
grep . in >out || fail=1
(cat a j && printf 'Binary file in matches\n') >exp || framework_failure_
cat a j >exp || framework_failure_
compare exp out || fail=1
grep -I . in >out 2>err || fail=1
cat a j >exp || framework_failure_
compare exp out || fail=1
compare /dev/null err || fail=1
grep -a . in >out || fail=1
compare in out


@ -1,7 +1,7 @@
# -*- sh -*-
# Check environment variables for sane values while testing.
# Copyright (C) 2000-2018 Free Software Foundation, Inc.
# Copyright (C) 2000-2026 Free Software Foundation, Inc.
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by


@ -1,7 +1,7 @@
#! /bin/sh
# Regression test for GNU grep.
#
# Copyright (C) 2001, 2006, 2009-2018 Free Software Foundation, Inc.
# Copyright (C) 2001, 2006, 2009-2026 Free Software Foundation, Inc.
#
# Copying and distribution of this file, with or without modification,
# are permitted in any medium without royalty provided the copyright


@ -1,4 +1,4 @@
# Copyright (C) 2001, 2006, 2009-2018 Free Software Foundation, Inc.
# Copyright (C) 2001, 2006, 2009-2026 Free Software Foundation, Inc.
#
# Copying and distribution of this file, with or without modification,
# are permitted in any medium without royalty provided the copyright


@ -218,3 +218,5 @@
0@)@)
1@)@x
0@\()\((a\())(b))@()(a()b)
# This would erroneously match from grep-3.2 to grep-3.5
1@a+a+a@aa


@ -1,7 +1,7 @@
#! /bin/sh
# Test for false matches in grep 2.19..2.26 in multibyte, non-UTF8 locales
#
# Copyright (C) 2016-2018 Free Software Foundation, Inc.
# Copyright (C) 2016-2026 Free Software Foundation, Inc.
#
# Copying and distribution of this file, with or without modification,
# are permitted in any medium without royalty provided the copyright


@ -18,7 +18,7 @@ ok () { printf "${G}OK${D}"; }
fail () { printf "${R}FAIL${D} (See ${U})"; failures=1; }
U=https://bugzilla.redhat.com/show_bug.cgi?id=116909
printf "fgrep false negatives: "
printf "grep -F false negatives: "
cat > 116909.list <<EOF
a
b
@ -59,7 +59,7 @@ if ( timeout --version ) > /dev/null 2>&1; then
echo foobar | returns_ 124 timeout 10 grep -Fw "" && fail || ok
U=https://bugzilla.redhat.com/show_bug.cgi?id=140781
printf 'fgrep hangs on binary files: '
printf 'grep -F hangs on binary files: '
returns_ 124 timeout 10 grep -F grep "$abs_top_builddir/src/grep" \
> /dev/null && fail || ok


@ -2,7 +2,7 @@
# With multiple matches, grep -Fo could print a shorter one.
# This bug affected grep versions 2.26 through 2.27.
#
# Copyright (C) 2017-2018 Free Software Foundation, Inc.
# Copyright (C) 2017-2026 Free Software Foundation, Inc.
#
# Copying and distribution of this file, with or without modification,
# are permitted in any medium without royalty provided the copyright


@ -4,7 +4,7 @@
# grep -F -f pattern_file file
# grep -G -f pattern_file file
#
# Copyright (C) 2001, 2006, 2009-2018 Free Software Foundation, Inc.
# Copyright (C) 2001, 2006, 2009-2026 Free Software Foundation, Inc.
#
# Copying and distribution of this file, with or without modification,
# are permitted in any medium without royalty provided the copyright


@ -4,7 +4,7 @@
# file or line number from which the offending regular expression came.
# With 2.26, now, each such diagnostic has a "FILENAME:LINENO: " prefix.
# Copyright (C) 2016-2018 Free Software Foundation, Inc.
# Copyright (C) 2016-2026 Free Software Foundation, Inc.
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
@ -37,6 +37,8 @@ $prog = $full_prog_name if $full_prog_name;
# Transform each to this: "Unmatched [..."
my $err_subst = {ERR_SUBST => 's/(: Unmatched \[).*/$1.../'};
my $no_pcre = "$prog: Perl matching not supported in a --disable-perl-regexp build\n";
my @Tests =
(
# Show that grep now includes filename:lineno in the diagnostic:
@ -48,7 +50,7 @@ my @Tests =
# Show that with two or more errors, grep now prints all diagnostics:
['invalid-re-2-files', '-f g -f h', {EXIT=>2},
{AUX=>{g=>"1\n2[[\n3\n4[[\n"}},
{AUX=>{h=>"\n\n[[\n"}},
{AUX=>{h=>"5\n6\n7[[\n"}},
$err_subst,
{ERR => "$prog: g:2: Unmatched [...\n"
. "$prog: g:4: Unmatched [...\n"
@ -59,7 +61,7 @@ my @Tests =
# Like the above, but on the other lines.
['invalid-re-2-files2', '-f g -f h', {EXIT=>2},
{AUX=>{g=>"1[[\n2\n3[[\n4\n"}},
{AUX=>{h=>"[[\n[[\n\n"}},
{AUX=>{h=>"5[[\n6[[\n7\n"}},
$err_subst,
{ERR => "$prog: g:1: Unmatched [...\n"
. "$prog: g:3: Unmatched [...\n"
@ -68,12 +70,57 @@ my @Tests =
},
],
# Make sure the line numbers are right when some regexps are duplicates.
['invalid-re-line-numbers', '-f g -f h', {EXIT=>2},
{AUX=>{g=>"1[[\n\n3[[\n\n5[[\n"}},
{AUX=>{h=>"1[[\n\n\n4[[\n\n6[[\n"}},
$err_subst,
{ERR => "$prog: g:1: Unmatched [...\n"
. "$prog: g:3: Unmatched [...\n"
. "$prog: g:5: Unmatched [...\n"
. "$prog: h:4: Unmatched [...\n"
. "$prog: h:6: Unmatched [...\n"
},
],
# Show that with two '-e'-specified erroneous regexps,
# there is no file name or line number.
['invalid-re-2e', '-e "[[" -e "[["', {EXIT=>2},
['invalid-re-2e', '-e "1[[" -e "2[["', {EXIT=>2},
$err_subst,
{ERR => "$prog: Unmatched [...\n" x 2},
],
# Test unmatched ) as well. It is OK with -E and an error with -G and -P.
['invalid-re-E-paren', '-E ")"', {IN=>''}, {EXIT=>1}],
['invalid-re-E-star-paren', '-E ".*)"', {IN=>''}, {EXIT=>1}],
['invalid-re-G-paren', '-G "\\)"', {EXIT=>2},
{ERR => "$prog: Unmatched ) or \\)\n"},
],
['invalid-re-G-star-paren', '-G "a.*\\)"', {EXIT=>2},
{ERR => "$prog: Unmatched ) or \\)\n"},
],
['invalid-re-P-paren', '-P ")"', {EXIT=>2},
{ERR => $ENV{PCRE_WORKS} == 1
? "$prog: unmatched closing parenthesis\n"
: $no_pcre
},
],
['invalid-re-P-star-paren', '-P "a.*)"', {EXIT=>2},
{ERR => $ENV{PCRE_WORKS} == 1
? "$prog: unmatched closing parenthesis\n"
: $no_pcre
},
],
# Prior to grep-3.6, the name of the offending file was not printed.
['backtracking-with-file', '-P "((a+)*)+$"', {EXIT=>2},
{IN=>{f=>"a"x20 ."b"}},
{ERR => $ENV{PCRE_WORKS} == 1
? "$prog: f: exceeded PCRE's backtracking limit\n"
: $no_pcre
},
],
);
my $save_temps = $ENV{DEBUG};

11
tests/fillbuf-long-line Executable file

@ -0,0 +1,11 @@
#!/bin/sh
# This would fail for v3.7-15-ge3694e9 .. grep-v3.7-48-g5c3c427
. "${srcdir=.}/init.sh"; path_prepend_ ../src
printf %0104681d 0 > in || framework_failure_
fail=0
returns_ 1 grep xx in || fail=1
Exit $fail
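
As a rough illustration of the input this test builds: the format `%0104681d` zero-pads its argument to a width of 104681, so `printf` emits a single line of that many `0` digits with no trailing newline. The sketch below only measures that output; the variable names are illustrative and it does not reproduce the grep failure itself.

```shell
# Measure what the test's printf invocation writes.
bytes=$(printf %0104681d 0 | wc -c)
newlines=$(printf %0104681d 0 | wc -l)

# $((...)) strips any padding that wc may add around the numbers.
echo "bytes=$((bytes)) newlines=$((newlines))"
```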


@ -1,5 +1,5 @@
#! /bin/sh
# Copyright (C) 2001, 2006, 2009-2018 Free Software Foundation, Inc.
# Copyright (C) 2001, 2006, 2009-2026 Free Software Foundation, Inc.
#
# Copying and distribution of this file, with or without modification,
# are permitted in any medium without royalty provided the copyright
@ -10,7 +10,7 @@
cz=cs_CZ.UTF-8
# If cs_CZ.UTF-8 locale doesn't work, skip this test.
LC_ALL=$cz locale -k LC_CTYPE 2>/dev/null | grep -q charmap.*UTF-8 \
test "`LC_ALL=$cz locale charmap 2>/dev/null`" = UTF-8 \
|| skip_ this system lacks the $cz locale
# If matching is done in single-byte mode, skip this test too
@ -53,21 +53,21 @@ EOF
for mode in F G E; do
test1=$(echo $(LC_ALL=$cz grep -${mode} -f cspatfile csinput |
tr -cs '0-9' '[ *]'))
tr '\n' ' ' | tr -cd '0-9 '))
if test "$test1" != "11 12 13 14 15 16 17 18"; then
echo "Test #1 ${mode} failed: $test1"
failures=1
fi
test2=$(echo $(LC_ALL=$cz grep -${mode}i -f cspatfile csinput |
tr -cs '0-9' '[ *]'))
tr '\n' ' ' | tr -cd '0-9 '))
if test "$test2" != "01 02 07 08 10 11 12 13 14 15 16 17 18 19 20"; then
echo "Test #2 ${mode} failed: $test2"
failures=1
fi
test3=$(echo $(LC_ALL=$cz grep -${mode}i -e 'ČÍšE' -e 'Čas' csinput |
tr -cs '0-9' '[ *]'))
tr '\n' ' ' | tr -cd '0-9 '))
if test "$test3" != "01 02 07 08 10 11 12 13 14 15 16 17 18 19 20"; then
echo "Test #3 ${mode} failed: $test3"
failures=1
@ -115,7 +115,7 @@ done
for mode in G E; do
test8=$(echo $(LC_ALL=$cz grep -${mode}i -e 'Č.šE' -e 'Č[a-f]s' csinput |
tr -cs '0-9' '[ *]'))
tr '\n' ' ' | tr -cd '0-9 '))
if test "$test8" != "01 02 07 08 10 11 12 13 14 15 16 17 18 19 20"; then
echo "Test #8 ${mode} failed: $test8"
failures=1
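
The replacement pipeline used in these hunks, `tr '\n' ' ' | tr -cd '0-9 '`, can be tried on its own. The sketch below runs it on made-up sample lines (the `sample` text is an assumption, standing in for grep's output on `csinput`) to show how the leading line numbers end up as one space-separated list.

```shell
# Hypothetical stand-in for the lines grep would print.
sample='11 foo
12 bar
13 baz'

# Join lines with spaces, then delete everything except digits and
# spaces.  The unquoted $(...) inside echo collapses the runs of
# spaces that remain.
nums=$(echo $(printf '%s\n' "$sample" | tr '\n' ' ' | tr -cd '0-9 '))
echo "$nums"
```

With this sample the result is `11 12 13`, the same shape as the strings the tests compare against.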


@ -1,7 +1,7 @@
#! /bin/sh
# Test various combinations of command-line options.
#
# Copyright (C) 2001, 2006, 2009-2018 Free Software Foundation, Inc.
# Copyright (C) 2001, 2006, 2009-2026 Free Software Foundation, Inc.
#
# Copying and distribution of this file, with or without modification,
# are permitted in any medium without royalty provided the copyright
@ -150,7 +150,7 @@ Exit $failures
# The rest of this file is meant to be executed under this locale.
LC_ALL=cs_CZ.UTF-8; export LC_ALL
# If the UTF-8 locale doesn't work, skip these tests silently.
locale -k LC_CTYPE 2>/dev/null | grep -q "charmap.*UTF-8" || Exit $failures
test "`locale charmap 2>/dev/null`" = UTF-8 || Exit $failures
# Test character class erroneously matching a '[' character.
grep_test "[/" "" "[[:alpha:]]" -E
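
The locale check introduced in this hunk no longer pipes `locale -k LC_CTYPE` through grep; it compares the output of `locale charmap` directly. A minimal sketch of the same idea follows, wrapped in a helper whose name (`utf8_locale_active`) is hypothetical and not part of the test suite.

```shell
# Hypothetical helper: succeed only if the current locale's
# character map is reported as UTF-8.
utf8_locale_active ()
{
  test "$(locale charmap 2>/dev/null)" = UTF-8
}

if utf8_locale_active; then
  echo "current locale uses UTF-8"
else
  echo "current locale does not use UTF-8 (or locale is unavailable)"
fi
```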


@ -1,5 +1,5 @@
/* Auxiliary program to detect support for a locale.
Copyright 2010-2018 Free Software Foundation, Inc.
Copyright 2010-2026 Free Software Foundation, Inc.
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
@ -12,17 +12,13 @@
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin Street - Fifth Floor, Boston, MA
02110-1301, USA. */
along with this program. If not, see <https://www.gnu.org/licenses/>. */
#include <config.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include "getprogname.h"
int
main (int argc, char **argv)
{

tests/glibc-infloop Executable file (30 lines added)

@ -0,0 +1,30 @@
#!/bin/sh
# This would infloop when using glibc's regex at least until glibc-2.36.
. "${srcdir=.}/init.sh"; path_prepend_ ../src
require_timeout_
require_en_utf8_locale_
fail=0
cat <<\EOF > glibc-check.c
#include <features.h>
#ifdef __GLIBC__
int ok;
#else
# error "not glibc"
#endif
EOF
$CC -c glibc-check.c && glibc=1 || glibc=0
grep '^#define USE_INCLUDED_REGEX 1' "$CONFIG_HEADER" \
&& included_regex=1 || included_regex=0
case $glibc:$included_regex in
0:0) skip_ 'runs only with glibc or when built with the included regex'
esac
echo a > in || framework_failure_
timeout 2 env LC_ALL=en_US.UTF-8 grep -E -w '((()|a)|())*' in || fail=1
Exit $fail
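
The test above depends on `timeout` turning a hang into a nonzero exit status. Here is a small sketch of that mechanism, assuming a GNU-compatible `timeout` that exits with 124 when the limit expires; `sleep` stands in for a grep that would loop forever, and the variable names are illustrative only.

```shell
# Run a command that would outlast the limit.  Collect the status
# through '||' so the sketch also behaves under 'set -e'.
status=0
timeout 1 sleep 5 || status=$?

if test "$status" -eq 124; then
  verdict='timed out'
else
  verdict="exited with $status"
fi
printf '%s\n' "$verdict"
```

Because 124 is nonzero, a plain `timeout 2 ... || fail=1` is enough to flag the hang; checking for 124 explicitly is only needed when a timeout is the expected outcome.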

@ -6,7 +6,7 @@
require_timeout_
${AWK-awk} 'BEGIN {while (1) print "x"}' </dev/null |
returns_ 124 timeout 1 grep x >/dev/null || fail=1
returns_ 124 timeout 10 grep x >/dev/null || fail=1
echo abc | grep b >>/dev/null || fail=1

tests/hangul-syllable Executable file (184 lines added)

@ -0,0 +1,184 @@
#!/bin/sh
# grep 3.4 through 3.7 mishandled matching '.' against the valid UTF-8
# sequences (ED)(90-9F)(80-BF) corresponding to U+D400 through U+D7FF,
# which are some Hangul Syllables and Hangul Jamo Extended-B. They
# also mishandled (F4)(88-8F)(80-BF)(80-BF) which correspond to
# U+108000 through U+10FFFF (Supplemental Private Use Area plane B).
. "${srcdir=.}/init.sh"; path_prepend_ ../src
require_en_utf8_locale_
LC_ALL=en_US.UTF-8
export LC_ALL
# Check that '.' completely matches $1, i.e., that $1 is a single UTF-8 char.
check_char ()
{
printf "$1\\n" >in || framework_failure_
grep $2 '^.$' in >out || fail=1
cmp in out || fail=1
}
# Check that '.*' does not completely match $1, i.e., that
# $1 contains an encoding error.
check_nonchar ()
{
printf "$1\\n" >in || framework_failure_
grep -a -v '^.*$' in >out || fail=1
cmp in out || fail=1
}
fail=0
# "." should match U+D45C HANGUL SYLLABLE PYO.
check_char '\355\221\234'
# Check boundary-condition characters, and non-characters,
# while we are at it.
check_char '\0' -a
check_char '\177'
check_nonchar '\200'
check_nonchar '\277'
check_nonchar '\300\200'
check_nonchar '\301\277'
for i in 302 337; do
for j in 200 277; do
check_char "\\$i\\$j"
done
for j in 177 300; do
check_nonchar "\\$i\\$j"
done
done
for i in 340; do
for j in 240 277; do
for k in 200 277; do
check_char "\\$i\\$j\\$k"
done
for k in 177 300; do
check_nonchar "\\$i\\$j\\$k"
done
done
for j in 239 300; do
for k in 177 200 277 300; do
check_nonchar "\\$i\\$j\\$k"
done
done
done
for i in 341 354 356 357; do
for j in 200 277; do
for k in 200 277; do
check_char "\\$i\\$j\\$k"
done
for k in 177 300; do
check_nonchar "\\$i\\$j\\$k"
done
done
for j in 177 300; do
for k in 177 200 277 300; do
check_nonchar "\\$i\\$j\\$k"
done
done
done
for i in 355; do
for j in 200 237; do
for k in 200 277; do
check_char "\\$i\\$j\\$k"
done
for k in 177 300; do
check_nonchar "\\$i\\$j\\$k"
done
done
for j in 177 240; do
for k in 177 200 277 300; do
check_nonchar "\\$i\\$j\\$k"
done
done
done
# On platforms like 32-bit AIX where WCHAR_MAX == 0xFFFF, skip checks
# where the corresponding Unicode characters are not supported.
if test $fail -eq 0; then
printf '\360\220\200\200\n' >in || framework_failure_
grep '^.$' in >out 2>&1 || fail=1
cmp in out || skip_ 'platform does not support U+10000'
fi
for i in 360; do
for j in 220 277; do
for k in 200 277; do
for l in 200 277; do
check_char "\\$i\\$j\\$k\\$l"
done
for l in 177 300; do
check_nonchar "\\$i\\$j\\$k\\$l"
done
done
for k in 177 300; do
for l in 177 200 277 300; do
check_nonchar "\\$i\\$j\\$k\\$l"
done
done
done
for j in 217 300; do
for k in 177 200 277 300; do
for l in 177 200 277 300; do
check_nonchar "\\$i\\$j\\$k\\$l"
done
done
done
done
for i in 361 363; do
for j in 200 277; do
for k in 200 277; do
for l in 200 277; do
check_char "\\$i\\$j\\$k\\$l"
done
for l in 177 300; do
check_nonchar "\\$i\\$j\\$k\\$l"
done
done
for k in 177 300; do
for l in 177 200 277 300; do
check_nonchar "\\$i\\$j\\$k\\$l"
done
done
done
for j in 177 300; do
for k in 177 200 277 300; do
for l in 177 200 277 300; do
check_nonchar "\\$i\\$j\\$k\\$l"
done
done
done
done
for i in 364; do
for j in 200 217; do
for k in 200 277; do
for l in 200 277; do
check_char "\\$i\\$j\\$k\\$l"
done
for l in 177 300; do
check_nonchar "\\$i\\$j\\$k\\$l"
done
done
for k in 177 300; do
for l in 177 200 277 300; do
check_nonchar "\\$i\\$j\\$k\\$l"
done
done
done
for j in 177 220; do
for k in 177 200 277 300; do
for l in 177 200 277 300; do
check_nonchar "\\$i\\$j\\$k\\$l"
done
done
done
done
Exit $fail
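
The octal escapes passed to check_char are raw UTF-8 bytes. As a rough illustration, the following sketch derives the escape string for U+D45C from the standard three-byte UTF-8 layout (1110xxxx 10xxxxxx 10xxxxxx). The variable names are hypothetical and not part of the test.

```shell
# Code point for HANGUL SYLLABLE PYO.
cp=$((0xD45C))

# Split the 16 payload bits into 4 + 6 + 6 and add the UTF-8 marker bits.
b1=$(( 0xE0 | (cp >> 12) ))
b2=$(( 0x80 | ((cp >> 6) & 0x3F) ))
b3=$(( 0x80 | (cp & 0x3F) ))

# Render the three bytes as backslash-octal escapes for printf.
escapes=$(printf '\\%o\\%o\\%o' "$b1" "$b2" "$b3")
printf '%s\n' "$escapes"
```

The result matches the argument of the first check_char call, whose lead byte \355 (0xED) is the one named in the header comment.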

tests/hash-collision-perf Executable file (57 lines added)

@ -0,0 +1,57 @@
#!/bin/sh
# Test for this performance regression:
# grep-3.5 and 3.6 would take O(N^2) time for some sets of input regexps.
# Copyright 2020-2026 Free Software Foundation, Inc.
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <https://www.gnu.org/licenses/>.
. "${srcdir=.}/init.sh"; path_prepend_ ../src
fail=0
require_perl_
: > empty || framework_failure_
# Construct a test case that consumes enough CPU time that we don't
# have to worry about measurement noise. This first case is searching
# for digits, which never exhibited a problem with hash collisions.
n_pat=40000
while :; do
seq $n_pat > in || framework_failure_
small_ms=$(LC_ALL=C user_time_ 1 grep --file=in empty) || fail=1
test $small_ms -ge 200 && break
n_pat=$(expr $n_pat '*' 2)
case $n_pat:$small_ms in
640000:0) skip_ 'user_time_ appears always to report 0 elapsed ms';;
esac
done
# Now, search for those same digits mapped to A-J.
# With the PJW-based hash function, this became O(N^2).
seq $n_pat | tr 0-9 A-J > in || framework_failure_
large_ms=$(LC_ALL=C user_time_ 1 grep --file=in empty) || fail=1
# Deliberately recording in an unused variable so it
# shows up in set -x output, in case this test fails.
ratio=$(expr "$large_ms" / "$small_ms")
# The duration of the latter run must be no more than 10 times
# that of the former. Using recent versions prior to this fix,
# this test would fail due to ratios > 800. Using the fixed version,
# it's common to see a ratio less than 1.
returns_ 1 expr $small_ms '<' $large_ms / 10 || fail=1
Exit $fail
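
The final check leans on the exit status of `expr`: it exits 0 when the comparison is true and 1 when it is false, so `returns_ 1` demands that the slowdown stay within the 10x budget. A minimal sketch with invented timings (the `is_regression` helper and the numbers are assumptions for illustration):

```shell
# Succeeds when the second timing exceeds roughly ten times the first.
# expr applies '/' before '<', and the division is integer division.
is_regression() {
  expr "$1" '<' "$2" / 10 >/dev/null
}

# Second run faster than the first: within budget.
if is_regression 200 150; then fast_case=regression; else fast_case=ok; fi

# Second run 800 times slower, as reported for the unfixed versions.
if is_regression 200 160000; then slow_case=regression; else slow_case=ok; fi

printf '%s %s\n' "$fast_case" "$slow_case"
```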

@ -2,7 +2,7 @@
# Make sure all of these programs work properly
# when invoked with --help or --version.
# Copyright (C) 2000-2018 Free Software Foundation, Inc.
# Copyright (C) 2000-2026 Free Software Foundation, Inc.
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by

@ -1,7 +1,7 @@
#!/bin/sh
# Exercise high-bit-set unibyte-in-[...]-range bug.
# Copyright (C) 2011-2018 Free Software Foundation, Inc.
# Copyright (C) 2011-2026 Free Software Foundation, Inc.
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by

@ -17,13 +17,13 @@ echo "$v" > out || framework_failure_
for arg in out - ''; do
# Accommodate both 'out' and '(standard input)', as well as
# the multi-byte quoting we see on OS/X-based systems.
echo grep: input file ... is also the output > err.exp || framework_failure_
echo grep: ...: input file is also the output > err.exp || framework_failure_
# Require an exit status of 2.
# grep-2.8 and earlier would infloop with $arg = out.
# grep-2.10 and earlier would infloop with $arg = - or $arg = ''.
timeout 10 grep 0 $arg < out >> out 2> err; st=$?; test $st = 2 || fail=1
sed 's/file .* is/file ... is/' err > k && mv k err
sed 's/grep: .*: /grep: ...: /' err > k && mv k err
# Normalize the diagnostic prefix from e.g., "/mnt/dir/grep: " to "grep: "
sed 's/^[^:]*: /grep: /' err > k && mv k err
compare err.exp err || fail=1
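
To see how the two normalization steps combine under the new message format, here is a sketch that applies them to one sample diagnostic. The sample line is an assumption, built from the "/mnt/dir/grep: " prefix mentioned in the comment and the file name "out" used by the test.

```shell
# Hypothetical diagnostic in the new "grep: FILE: message" layout.
err='/mnt/dir/grep: out: input file is also the output'

# Step 1 hides the file name; step 2 strips any directory from the
# program-name prefix.  Same expressions and order as in the hunk.
normalized=$(printf '%s\n' "$err" |
  sed 's/grep: .*: /grep: ...: /' |
  sed 's/^[^:]*: /grep: /')
printf '%s\n' "$normalized"
```

The result is the same string the hunk writes to err.exp, which is why one expected file can serve all three values of $arg.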

@ -21,7 +21,6 @@ fi
vars_='
GREP_COLOR
GREP_COLORS
GREP_OPTIONS
TERM
'
envvar_check_fail=0
@ -43,14 +42,24 @@ require_timeout_()
|| skip_ your system lacks the timeout program
returns_ 1 timeout 10s false \
|| skip_ your system has a non-GNU timeout program
returns_ 124 timeout 0.01 sleep 0.02 \
|| skip_ "'timeout 0.01 sleep 0.02' did not time out"
}
require_pcre_()
{
echo . | grep -P . 2>err || {
test $? -eq 1 && fail_ PCRE available, but does not work.
skip_ no PCRE support
}
case $LC_ALL in
*.UTF-8)
printf '\303\241\n' | grep -P '^.$' 2>err || {
test $? -eq 1 && fail_ PCRE available, but does not work
skip_ no PCRE Unicode support
};;
*)
echo . | grep -P '^.$' 2>err || {
test $? -eq 1 && fail_ PCRE available, but does not work.
skip_ no PCRE support
};;
esac
compare /dev/null err || fail_ PCRE available, but stderr not empty.
}
@ -137,6 +146,13 @@ require_JP_EUC_locale_()
skip_ "$locale locale not found"
}
# Skip the current test if we lack Perl.
require_perl_()
{
test "$PERL" && $PERL -e 'use warnings' > /dev/null 2>&1 \
|| skip_ 'configure did not find a usable version of Perl'
}
expensive_()
{
if test "$RUN_EXPENSIVE_TESTS" != yes; then
@ -203,3 +219,17 @@ user_time_()
# yes is not portable, fake it with $AWK
yes() { line=${*-y} ${AWK-awk} 'BEGIN{for (;;) print ENVIRON["line"]}'; }
# Some systems lack seq.
# A limited replacement for seq: handle 1 or 2 args; increment must be 1
if ! type seq > /dev/null 2>&1; then
seq()
{
case $# in
1) start=1 final=$1;;
2) start=$1 final=$2;;
*) echo you lose 1>&2; exit 1;;
esac
awk 'BEGIN{for(i='$start';i<='$final';i++) print i}' < /dev/null
}
fi
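
A quick way to exercise the fallback logic without shadowing a real seq is to give it another name. The sketch below is an adaptation, not the code from the hunk: it is renamed `seq_fallback`, passes the bounds with `awk -v` rather than splicing them into the program text, and returns instead of exiting on bad usage.

```shell
# Limited seq replacement: one or two arguments, increment fixed at 1.
seq_fallback() {
  case $# in
    1) start=1 final=$1;;
    2) start=$1 final=$2;;
    *) echo 'usage: seq_fallback [FIRST] LAST' >&2; return 1;;
  esac
  # "+ 0" forces numeric comparison of the two bounds.
  ${AWK-awk} -v s="$start" -v f="$final" \
    'BEGIN { for (i = s + 0; i <= f + 0; i++) print i }' </dev/null
}

one_arg=$(seq_fallback 3 | tr '\n' ' ')
two_args=$(seq_fallback 4 6 | tr '\n' ' ')
printf '%s| %s|\n' "$one_arg" "$two_args"
```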

@ -1,618 +0,0 @@
# source this file; set up for tests
# Copyright (C) 2009-2018 Free Software Foundation, Inc.
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <https://www.gnu.org/licenses/>.
# Using this file in a test
# =========================
#
# The typical skeleton of a test looks like this:
#
# #!/bin/sh
# . "${srcdir=.}/init.sh"; path_prepend_ .
# Execute some commands.
# Note that these commands are executed in a subdirectory, therefore you
# need to prepend "../" to relative filenames in the build directory.
# Note that the "path_prepend_ ." is useful only if the body of your
# test invokes programs residing in the initial directory.
# For example, if the programs you want to test are in src/, and this test
# script is named tests/test-1, then you would use "path_prepend_ ../src",
# or perhaps export PATH='$(abs_top_builddir)/src$(PATH_SEPARATOR)'"$$PATH"
# to all tests via automake's TESTS_ENVIRONMENT.
# Set the exit code 0 for success, 77 for skipped, or 1 or other for failure.
# Use the skip_ and fail_ functions to print a diagnostic and then exit
# with the corresponding exit code.
# Exit $?
# Executing a test that uses this file
# ====================================
#
# Running a single test:
# $ make check TESTS=test-foo.sh
#
# Running a single test, with verbose output:
# $ make check TESTS=test-foo.sh VERBOSE=yes
#
# Running a single test, keeping the temporary directory:
# $ make check TESTS=test-foo.sh KEEP=yes
#
# Running a single test, with single-stepping:
# 1. Go into a sub-shell:
# $ bash
# 2. Set relevant environment variables from TESTS_ENVIRONMENT in the
# Makefile:
# $ export srcdir=../../tests # this is an example
# 3. Execute the commands from the test, copy&pasting them one by one:
# $ . "$srcdir/init.sh"; path_prepend_ .
# ...
# 4. Finally
# $ exit
ME_=`expr "./$0" : '.*/\(.*\)$'`
# Prepare PATH_SEPARATOR.
# The user is always right.
if test "${PATH_SEPARATOR+set}" != set; then
# Determine PATH_SEPARATOR by trying to find /bin/sh in a PATH which
# contains only /bin. Note that ksh looks also at the FPATH variable,
# so we have to set that as well for the test.
PATH_SEPARATOR=:
(PATH='/bin;/bin'; FPATH=$PATH; sh -c :) >/dev/null 2>&1 \
&& { (PATH='/bin:/bin'; FPATH=$PATH; sh -c :) >/dev/null 2>&1 \
|| PATH_SEPARATOR=';'
}
fi
# We use a trap below for cleanup. This requires us to go through
# hoops to get the right exit status transported through the handler.
# So use 'Exit STATUS' instead of 'exit STATUS' inside of the tests.
# Turn off errexit here so that we don't trip the bug with OSF1/Tru64
# sh inside this function.
Exit () { set +e; (exit $1); exit $1; }
# Print warnings (e.g., about skipped and failed tests) to this file number.
# Override by defining to say, 9, in init.cfg, and putting say,
# export ...ENVVAR_SETTINGS...; $(SHELL) 9>&2
# in the definition of TESTS_ENVIRONMENT in your tests/Makefile.am file.
# This is useful when using automake's parallel tests mode, to print
# the reason for skip/failure to console, rather than to the .log files.
: ${stderr_fileno_=2}
# Note that correct expansion of "$*" depends on IFS starting with ' '.
# Always write the full diagnostic to stderr.
# When stderr_fileno_ is not 2, also emit the first line of the
# diagnostic to that file descriptor.
warn_ ()
{
# If IFS does not start with ' ', set it and emit the warning in a subshell.
case $IFS in
' '*) printf '%s\n' "$*" >&2
test $stderr_fileno_ = 2 \
|| { printf '%s\n' "$*" | sed 1q >&$stderr_fileno_ ; } ;;
*) (IFS=' '; warn_ "$@");;
esac
}
fail_ () { warn_ "$ME_: failed test: $@"; Exit 1; }
skip_ () { warn_ "$ME_: skipped test: $@"; Exit 77; }
fatal_ () { warn_ "$ME_: hard error: $@"; Exit 99; }
framework_failure_ () { warn_ "$ME_: set-up failure: $@"; Exit 99; }
# This is used to simplify checking of the return value
# which is useful when ensuring a command fails as desired.
# I.e., just doing `command ... &&fail=1` will not catch
# a segfault in command for example. With this helper you
# instead check an explicit exit code like
# returns_ 1 command ... || fail
returns_ () {
# Disable tracing so it doesn't interfere with stderr of the wrapped command
{ set +x; } 2>/dev/null
local exp_exit="$1"
shift
"$@"
test $? -eq $exp_exit && ret_=0 || ret_=1
if test "$VERBOSE" = yes && test "$gl_set_x_corrupts_stderr_" = false; then
set -x
fi
{ return $ret_; } 2>/dev/null
}
# Sanitize this shell to POSIX mode, if possible.
DUALCASE=1; export DUALCASE
if test -n "${ZSH_VERSION+set}" && (emulate sh) >/dev/null 2>&1; then
emulate sh
NULLCMD=:
alias -g '${1+"$@"}'='"$@"'
setopt NO_GLOB_SUBST
else
case `(set -o) 2>/dev/null` in
*posix*) set -o posix ;;
esac
fi
# We require $(...) support unconditionally.
# We require non-surprising "local" semantics (this eliminates dash).
# This takes the admittedly draconian step of eliminating dash, because the
# assignment tab=$(printf '\t') works fine, yet preceding it with "local "
# transforms it into an assignment that sets the variable to the empty string.
# That is too counter-intuitive, and can lead to subtle run-time malfunction.
# The example below is less subtle in that with dash, it evokes the run-time
# exception "dash: 1: local: 1: bad variable name".
# We require a few additional shell features only when $EXEEXT is nonempty,
# in order to support automatic $EXEEXT emulation:
# - hyphen-containing alias names
# - we prefer to use ${var#...} substitution, rather than having
# to work around lack of support for that feature.
# The following code attempts to find a shell with support for these features.
# If the current shell passes the test, we're done. Otherwise, test other
# shells until we find one that passes. If one is found, re-exec it.
# If no acceptable shell is found, skip the current test.
#
# The "...set -x; P=1 true 2>err..." test is to disqualify any shell that
# emits "P=1" into err, as /bin/sh from SunOS 5.11 and OpenBSD 4.7 do.
#
# Use "9" to indicate success (rather than 0), in case some shell acts
# like Solaris 10's /bin/sh but exits successfully instead of with status 2.
# Eval this code in a subshell to determine a shell's suitability.
# 10 - passes all tests; ok to use
# 9 - ok, but enabling "set -x" corrupts app stderr; prefer higher score
# ? - not ok
gl_shell_test_script_='
test $(echo y) = y || exit 1
f_local_() { local v=1; }; f_local_ || exit 1
f_dash_local_fail_() { local t=$(printf " 1"); }; f_dash_local_fail_
score_=10
if test "$VERBOSE" = yes; then
test -n "$( (exec 3>&1; set -x; P=1 true 2>&3) 2> /dev/null)" && score_=9
fi
test -z "$EXEEXT" && exit $score_
shopt -s expand_aliases
alias a-b="echo zoo"
v=abx
test ${v%x} = ab \
&& test ${v#a} = bx \
&& test $(a-b) = zoo \
&& exit $score_
'
if test "x$1" = "x--no-reexec"; then
shift
else
# Assume a working shell. Export to subshells (setup_ needs this).
gl_set_x_corrupts_stderr_=false
export gl_set_x_corrupts_stderr_
# Record the first marginally acceptable shell.
marginal_=
# Search for a shell that meets our requirements.
for re_shell_ in __current__ "${CONFIG_SHELL:-no_shell}" \
/bin/sh bash dash zsh pdksh fail
do
test "$re_shell_" = no_shell && continue
# If we've made it all the way to the sentinel, "fail" without
# finding even a marginal shell, skip this test.
if test "$re_shell_" = fail; then
test -z "$marginal_" && skip_ failed to find an adequate shell
re_shell_=$marginal_
break
fi
# When testing the current shell, simply "eval" the test code.
# Otherwise, run it via $re_shell_ -c ...
if test "$re_shell_" = __current__; then
# 'eval'ing this code makes Solaris 10's /bin/sh exit with
# $? set to 2. It does not evaluate any of the code after the
# "unexpected" first '('. Thus, we must run it in a subshell.
( eval "$gl_shell_test_script_" ) > /dev/null 2>&1
else
"$re_shell_" -c "$gl_shell_test_script_" 2>/dev/null
fi
st_=$?
# $re_shell_ works just fine. Use it.
if test $st_ = 10; then
gl_set_x_corrupts_stderr_=false
break
fi
# If this is our first marginally acceptable shell, remember it.
if test "$st_:$marginal_" = 9: ; then
marginal_="$re_shell_"
gl_set_x_corrupts_stderr_=true
fi
done
if test "$re_shell_" != __current__; then
# Found a usable shell. Preserve -v and -x.
case $- in
*v*x* | *x*v*) opts_=-vx ;;
*v*) opts_=-v ;;
*x*) opts_=-x ;;
*) opts_= ;;
esac
re_shell=$re_shell_
export re_shell
exec "$re_shell_" $opts_ "$0" --no-reexec "$@"
echo "$ME_: exec failed" 1>&2
exit 127
fi
fi
# If this is bash, turn off all aliases.
test -n "$BASH_VERSION" && unalias -a
# Note that when supporting $EXEEXT (transparently mapping from PROG_NAME to
# PROG_NAME.exe), we want to support hyphen-containing names like test-acos.
# That is part of the shell-selection test above. Why use aliases rather
# than functions? Because support for hyphen-containing aliases is more
# widespread than that for hyphen-containing function names.
test -n "$EXEEXT" && test -n "$BASH_VERSION" && shopt -s expand_aliases
# Enable glibc's malloc-perturbing option.
# This is useful for exposing code that depends on the fact that
# malloc-related functions often return memory that is mostly zeroed.
# If you have the time and cycles, use valgrind to do an even better job.
: ${MALLOC_PERTURB_=87}
export MALLOC_PERTURB_
# This is a stub function that is run upon trap (upon regular exit and
# interrupt). Override it with a per-test function, e.g., to unmount
# a partition, or to undo any other global state changes.
cleanup_ () { :; }
# Emit a header similar to that from diff -u; Print the simulated "diff"
# command so that the order of arguments is clear. Don't bother with @@ lines.
emit_diff_u_header_ ()
{
printf '%s\n' "diff -u $*" \
"--- $1 1970-01-01" \
"+++ $2 1970-01-01"
}
# Arrange not to let diff or cmp operate on /dev/null,
# since on some systems (at least OSF/1 5.1), that doesn't work.
# When there are not two arguments, or no argument is /dev/null, return 2.
# When one argument is /dev/null and the other is not empty,
# cat the nonempty file to stderr and return 1.
# Otherwise, return 0.
compare_dev_null_ ()
{
test $# = 2 || return 2
if test "x$1" = x/dev/null; then
test -s "$2" || return 0
emit_diff_u_header_ "$@"; sed 's/^/+/' "$2"
return 1
fi
if test "x$2" = x/dev/null; then
test -s "$1" || return 0
emit_diff_u_header_ "$@"; sed 's/^/-/' "$1"
return 1
fi
return 2
}
for diff_opt_ in -u -U3 -c '' no; do
test "$diff_opt_" != no &&
diff_out_=`exec 2>/dev/null; diff $diff_opt_ "$0" "$0" < /dev/null` &&
break
done
if test "$diff_opt_" != no; then
if test -z "$diff_out_"; then
compare_ () { diff $diff_opt_ "$@"; }
else
compare_ ()
{
# If no differences were found, AIX and HP-UX 'diff' produce output
# like "No differences encountered". Hide this output.
diff $diff_opt_ "$@" > diff.out
diff_status_=$?
test $diff_status_ -eq 0 || cat diff.out || diff_status_=2
rm -f diff.out || diff_status_=2
return $diff_status_
}
fi
elif cmp -s /dev/null /dev/null 2>/dev/null; then
compare_ () { cmp -s "$@"; }
else
compare_ () { cmp "$@"; }
fi
# Usage: compare EXPECTED ACTUAL
#
# Given compare_dev_null_'s preprocessing, defer to compare_ if 2 or more.
# Otherwise, propagate $? to caller: any diffs have already been printed.
compare ()
{
# This looks like it can be factored to use a simple "case $?"
# after unchecked compare_dev_null_ invocation, but that would
# fail in a "set -e" environment.
if compare_dev_null_ "$@"; then
return 0
else
case $? in
1) return 1;;
*) compare_ "$@";;
esac
fi
}
# An arbitrary prefix to help distinguish test directories.
testdir_prefix_ () { printf gt; }
# Run the user-overridable cleanup_ function, remove the temporary
# directory and exit with the incoming value of $?.
remove_tmp_ ()
{
__st=$?
cleanup_
if test "$KEEP" = yes; then
echo "Not removing temporary directory $test_dir_"
else
# cd out of the directory we're about to remove
cd "$initial_cwd_" || cd / || cd /tmp
chmod -R u+rwx "$test_dir_"
# If removal fails and exit status was to be 0, then change it to 1.
rm -rf "$test_dir_" || { test $__st = 0 && __st=1; }
fi
exit $__st
}
# Given a directory name, DIR, if every entry in it that matches *.exe
# contains only the specified bytes (see the case stmt below), then print
# a space-separated list of those names and return 0. Otherwise, don't
# print anything and return 1. Naming constraints apply also to DIR.
find_exe_basenames_ ()
{
feb_dir_=$1
feb_fail_=0
feb_result_=
feb_sp_=
for feb_file_ in $feb_dir_/*.exe; do
# If there was no *.exe file, or there existed a file named "*.exe" that
# was deleted between the above glob expansion and the existence test
# below, just skip it.
test "x$feb_file_" = "x$feb_dir_/*.exe" && test ! -f "$feb_file_" \
&& continue
# Exempt [.exe, since we can't create a function by that name, yet
# we can't invoke [ by PATH search anyways due to shell builtins.
test "x$feb_file_" = "x$feb_dir_/[.exe" && continue
case $feb_file_ in
*[!-a-zA-Z/0-9_.+]*) feb_fail_=1; break;;
*) # Remove leading file name components as well as the .exe suffix.
feb_file_=${feb_file_##*/}
feb_file_=${feb_file_%.exe}
feb_result_="$feb_result_$feb_sp_$feb_file_";;
esac
feb_sp_=' '
done
test $feb_fail_ = 0 && printf %s "$feb_result_"
return $feb_fail_
}
# Consider the files in directory, $1.
# For each file name of the form PROG.exe, create an alias named
# PROG that simply invokes PROG.exe, then return 0. If any selected
# file name or the directory name, $1, contains an unexpected character,
# define no alias and return 1.
create_exe_shims_ ()
{
case $EXEEXT in
'') return 0 ;;
.exe) ;;
*) echo "$0: unexpected \$EXEEXT value: $EXEEXT" 1>&2; return 1 ;;
esac
base_names_=`find_exe_basenames_ $1` \
|| { echo "$0 (exe_shim): skipping directory: $1" 1>&2; return 0; }
if test -n "$base_names_"; then
for base_ in $base_names_; do
alias "$base_"="$base_$EXEEXT"
done
fi
return 0
}
# Use this function to prepend to PATH an absolute name for each
# specified, possibly-$initial_cwd_-relative, directory.
path_prepend_ ()
{
while test $# != 0; do
path_dir_=$1
case $path_dir_ in
'') fail_ "invalid path dir: '$1'";;
/* | ?:*) abs_path_dir_=$path_dir_;;
*) abs_path_dir_=$initial_cwd_/$path_dir_;;
esac
case $abs_path_dir_ in
*$PATH_SEPARATOR*) fail_ "invalid path dir: '$abs_path_dir_'";;
esac
PATH="$abs_path_dir_$PATH_SEPARATOR$PATH"
# Create an alias, FOO, for each FOO.exe in this directory.
create_exe_shims_ "$abs_path_dir_" \
|| fail_ "something failed (above): $abs_path_dir_"
shift
done
export PATH
}
setup_ ()
{
if test "$VERBOSE" = yes; then
# Test whether set -x may cause the selected shell to corrupt an
# application's stderr. Many do, including zsh-4.3.10 and the /bin/sh
# from SunOS 5.11, OpenBSD 4.7 and Irix 5.x and 6.5.
# If enabling verbose output this way would cause trouble, simply
# issue a warning and refrain.
if $gl_set_x_corrupts_stderr_; then
warn_ "using SHELL=$SHELL with 'set -x' corrupts stderr"
else
set -x
fi
fi
initial_cwd_=$PWD
pfx_=`testdir_prefix_`
test_dir_=`mktempd_ "$initial_cwd_" "$pfx_-$ME_.XXXX"` \
|| fail_ "failed to create temporary directory in $initial_cwd_"
cd "$test_dir_" || fail_ "failed to cd to temporary directory"
# As autoconf-generated configure scripts do, ensure that IFS
# is defined initially, so that saving and restoring $IFS works.
gl_init_sh_nl_='
'
IFS=" "" $gl_init_sh_nl_"
# This trap statement, along with a trap on 0 below, ensure that the
# temporary directory, $test_dir_, is removed upon exit as well as
# upon receipt of any of the listed signals.
for sig_ in 1 2 3 13 15; do
eval "trap 'Exit $(expr $sig_ + 128)' $sig_"
done
}
# Create a temporary directory, much like mktemp -d does.
# Written by Jim Meyering.
#
# Usage: mktempd_ /tmp phoey.XXXXXXXXXX
#
# First, try to use the mktemp program.
# Failing that, we'll roll our own mktemp-like function:
# - try to get random bytes from /dev/urandom
# - failing that, generate output from a combination of quickly-varying
# sources and gzip. Ignore non-varying gzip header, and extract
# "random" bits from there.
# - given those bits, map to file-name bytes using tr, and try to create
# the desired directory.
# - make only $MAX_TRIES_ attempts
# Helper function. Print $N pseudo-random bytes from a-zA-Z0-9.
rand_bytes_ ()
{
n_=$1
# Maybe try openssl rand -base64 $n_prime_|tr '+/=\012' abcd first?
# But if they have openssl, they probably have mktemp, too.
chars_=abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789
dev_rand_=/dev/urandom
if test -r "$dev_rand_"; then
# Note: 256-length($chars_) == 194; 3 copies of $chars_ is 186 + 8 = 194.
dd ibs=$n_ count=1 if=$dev_rand_ 2>/dev/null \
| LC_ALL=C tr -c $chars_ 01234567$chars_$chars_$chars_
return
fi
n_plus_50_=`expr $n_ + 50`
cmds_='date; date +%N; free; who -a; w; ps auxww; ps -ef'
data_=` (eval "$cmds_") 2>&1 | gzip `
# Ensure that $data_ has length at least 50+$n_
while :; do
len_=`echo "$data_"|wc -c`
test $n_plus_50_ -le $len_ && break;
data_=` (echo "$data_"; eval "$cmds_") 2>&1 | gzip `
done
echo "$data_" \
| dd bs=1 skip=50 count=$n_ 2>/dev/null \
| LC_ALL=C tr -c $chars_ 01234567$chars_$chars_$chars_
}
mktempd_ ()
{
case $# in
2);;
*) fail_ "Usage: mktempd_ DIR TEMPLATE";;
esac
destdir_=$1
template_=$2
MAX_TRIES_=4
# Disallow any trailing slash on specified destdir:
# it would subvert the post-mktemp "case"-based destdir test.
case $destdir_ in
/ | //) destdir_slash_=$destdir;;
*/) fail_ "invalid destination dir: remove trailing slash(es)";;
*) destdir_slash_=$destdir_/;;
esac
case $template_ in
*XXXX) ;;
*) fail_ \
"invalid template: $template_ (must have a suffix of at least 4 X's)";;
esac
# First, try to use mktemp.
d=`unset TMPDIR; { mktemp -d -t -p "$destdir_" "$template_"; } 2>/dev/null` &&
# The resulting name must be in the specified directory.
case $d in "$destdir_slash_"*) :;; *) false;; esac &&
# It must have created the directory.
test -d "$d" &&
# It must have 0700 permissions. Handle sticky "S" bits.
perms=`ls -dgo "$d" 2>/dev/null` &&
case $perms in drwx--[-S]---*) :;; *) false;; esac && {
echo "$d"
return
}
# If we reach this point, we'll have to create a directory manually.
# Get a copy of the template without its suffix of X's.
base_template_=`echo "$template_"|sed 's/XX*$//'`
# Calculate how many X's we've just removed.
template_length_=`echo "$template_" | wc -c`
nx_=`echo "$base_template_" | wc -c`
nx_=`expr $template_length_ - $nx_`
err_=
i_=1
while :; do
X_=`rand_bytes_ $nx_`
candidate_dir_="$destdir_slash_$base_template_$X_"
err_=`mkdir -m 0700 "$candidate_dir_" 2>&1` \
&& { echo "$candidate_dir_"; return; }
test $MAX_TRIES_ -le $i_ && break;
i_=`expr $i_ + 1`
done
fail_ "$err_"
}
# If you want to override the testdir_prefix_ function,
# or to add more utility functions, use this file.
test -f "$srcdir/init.cfg" \
&& . "$srcdir/init.cfg"
setup_ "$@"
# This trap is here, rather than in the setup_ function, because some
# shells run the exit trap at shell function exit, rather than script exit.
trap remove_tmp_ 0
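
The returns_ helper in the removed file exists because "command && fail=1" cannot tell an expected failure from a crash. A stripped-down sketch of the same idea follows, under the assumption that tracing control and `local` can be left out; `expect_status` is a hypothetical name, not a function from init.sh.

```shell
# Succeed only when the wrapped command exits with exactly the
# expected status.
expect_status() {
  want=$1
  shift
  got=0
  "$@" || got=$?
  test "$got" -eq "$want"
}

# grep exits 1 when nothing matches: the expected "failure".
if expect_status 1 grep -q needle /dev/null; then
  no_match=accepted
else
  no_match=rejected
fi

# An exit status of 2 (grep's error status) is not mistaken for "no match".
if expect_status 1 sh -c 'exit 2'; then
  error_case=accepted
else
  error_case=rejected
fi
printf '%s %s\n' "$no_match" "$error_case"
```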

@ -1,7 +1,7 @@
#!/bin/sh
# Exercise -T.
# Copyright 2016-2018 Free Software Foundation, Inc.
# Copyright 2016-2026 Free Software Foundation, Inc.
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by

@ -24,12 +24,10 @@ else
test $status -eq 2
fi || fail=1
echo 'Binary file input matches' >binary-file-matches
LC_ALL=en_US.UTF-8 timeout 10 grep -F $(encode A) input > out
status=$?
if test $status -eq 0; then
compare binary-file-matches out
compare /dev/null out
elif test $status -eq 1; then
compare_dev_null_ /dev/null out
else

@ -1,7 +1,7 @@
#! /bin/sh
# Regression test for GNU grep.
#
# Copyright (C) 2001, 2006, 2009-2018 Free Software Foundation, Inc.
# Copyright (C) 2001, 2006, 2009-2026 Free Software Foundation, Inc.
#
# Copying and distribution of this file, with or without modification,
# are permitted in any medium without royalty provided the copyright

@ -2,7 +2,7 @@
# Evoke a segfault in a hard-to-reach code path of kwset.c.
# This bug affected grep versions 2.19 through 2.21.
#
# Copyright (C) 2015-2018 Free Software Foundation, Inc.
# Copyright (C) 2015-2026 Free Software Foundation, Inc.
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by

@ -1,7 +1,7 @@
#!/bin/sh
# grep-2.21 would incur a 100x penalty for 10x increase in regexp length
# Copyright 2015-2018 Free Software Foundation, Inc.
# Copyright 2015-2026 Free Software Foundation, Inc.
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
@ -24,17 +24,30 @@ fail=0
# system load during the two test runs, so we'll mark it as
# "expensive", making it less likely to be run by regular users.
expensive_
require_perl_
echo x > in || framework_failure_
# We could use seq -s '' (avoiding the tr filter), but I
# suspect some version of seq does not honor that option.
# Note that we want 10x the byte count (not line count) in the larger file.
seq 10000 50000 | tr -d '\012' > r || framework_failure_
cat r r r r r r r r r r > re-10x || framework_failure_
mv r re || framework_failure_
base_ms=$(user_time_ 1 grep -f re in ) || fail=1
b10x_ms=$(user_time_ 1 grep -f re-10x in) || fail=1
returns_ 0 user_time_ 1 grep -f re in > base-ms \
|| framework_failure_ 'failed to compute baseline timing'
base_ms=$(cat base-ms)
# This test caused trouble on at least two types of fringe hosts: those
# with very little memory (a 1.5GB RAM Solaris host) and a Linux/s390x
# (emulated with qemu-system-s390x). The former became unusable due to
# mem requirements of the 2nd test, and the latter ended up taking >35x
# more time than the base case. Skipping this test for any system using
# more than this many milliseconds for the first case should avoid those
# false-positive failures while skipping the test on few other systems.
test 800 -lt "$base_ms" && skip_ "this base-case test took too long"
returns_ 0 user_time_ 1 grep -f re-10x in > b10x-ms \
|| framework_failure_ 'failed to compute 10x timing'
b10x_ms=$(cat b10x-ms)
# Increasing the length of the regular expression by a factor
# of 10 should cause no more than a 10x increase in duration.

tests/many-regex-performance (new executable file, 80 lines)

@@ -0,0 +1,80 @@
#!/bin/sh
# Test for this performance regression:
# grep-3.4 would require O(N^2) RSS for N regexps
# grep-3.5 requires O(N) in the most common cases.
# Copyright 2020-2026 Free Software Foundation, Inc.
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <https://www.gnu.org/licenses/>.
. "${srcdir=.}/init.sh"; path_prepend_ ../src
fail=0
# This test is susceptible to failure due to differences in
# system load during the two test runs, so we'll mark it as
# "expensive", making it less likely to be run by regular users.
expensive_
require_perl_
# Make the quick/small input large enough so that even on high-end
# systems this first invocation takes at least 10ms of user time.
word_list=/usr/share/dict/linux.words
# If $word_list does not exist, generate an input that exhibits
# similar performance characteristics.
if ! test -f $word_list; then
# Generate data comparable to that word list.
# Note how all "words" start with "a", and that there is
# a small percentage of lines with at least one "." metachar.
# This requires /dev/urandom, so if it's not present, skip
# this test. If desperate, we could fall back to using
# tar+compressed lib/*.c as the data source.
test -r /dev/urandom \
|| skip_ 'this system has neither word list nor working /dev/urandom'
word_list=word_list
( echo a; cat /dev/urandom \
| LC_ALL=C tr -dc 'a-zA-Z0-9_' \
| head -c500000 \
| sed 's/\(........\)/\1\n/g' \
| sed s/rs/./ \
| sed s/./a/ \
| sort \
) > $word_list
fi
n_lines=2000
while :; do
sed ${n_lines}q < $word_list > in || framework_failure_
small_ms=$(LC_ALL=C user_time_ 1 grep --file=in -v in) || fail=1
test $small_ms -ge 10 && break
n_lines=$(expr $n_lines + 2000)
done
# Now, run it again, but with 20 times as many lines.
n_lines=$(expr $n_lines \* 20)
sed ${n_lines}q < $word_list > in || framework_failure_
large_ms=$(LC_ALL=C user_time_ 1 grep --file=in -v in) || fail=1
# Deliberately recording in an unused variable so it
# shows up in set -x output, in case this test fails.
ratio=$(expr "$large_ms" / "$small_ms")
# The duration of the larger run must be no more than 60 times
# that of the small one. Using recent versions prior to this fix,
# this test would fail due to ratios larger than 300. Using the
# fixed version, it's common to see a ratio of 20-30.
returns_ 1 expr $small_ms '<' $large_ms / 60 || fail=1
Exit $fail
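
The final check in the test above relies on the exit status of `expr`: the comparison `small < large / 60` is true (exit status 0) only when the large run is too slow, so the test expects exit status 1. A minimal sketch of that logic, using a hypothetical helper name `ratio_ok` and made-up timings that are not part of the test suite:

```shell
# ratio_ok SMALL_MS LARGE_MS LIMIT  (hypothetical helper, for illustration only)
# Succeeds when LARGE_MS is within roughly LIMIT times SMALL_MS.
ratio_ok() {
  # expr evaluates "/" before "<", so this compares SMALL_MS with the
  # integer quotient LARGE_MS / LIMIT.  expr exits 0 when the comparison
  # is true, which here means the large run was too slow.
  if expr "$1" '<' "$2" / "$3" >/dev/null; then
    return 1
  fi
  return 0
}

# Made-up timings in milliseconds.
ratio_ok 12 300 60 && echo "within limit"
ratio_ok 10 7000 60 || echo "too slow"
```

Because `expr` uses integer division, the bound is slightly lenient: with 10 ms and 650 ms the quotient is 10, the comparison `10 < 10` is false, and the check still passes.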

@@ -3,7 +3,7 @@
# grep -F -x -o PAT print an extra newline for each match.
# This would fail for grep-2.19 and grep-2.20.
# Copyright 2014-2018 Free Software Foundation, Inc.
# Copyright 2014-2026 Free Software Foundation, Inc.
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by

@@ -12,4 +12,20 @@ echo x > exp || framework_failure_
yes x | timeout 10 grep -m1 x > out || fail=1
compare exp out || fail=1
# Make sure -m2 stops reading even when output is /dev/null.
# In grep 3.11, it would continue reading.
printf 'x\nx\nx\n' >in || framework_failure_
(grep -m2 x >/dev/null && head -n1) <in >out || fail=1
compare exp out || fail=1
# The following two tests would fail before v3.11-70
echo x > in || framework_failure_
echo in > exp || framework_failure_
grep -l -m1 . in > out || fail=1
compare exp out || fail=1
# Ensure that this prints nothing and exits successfully.
grep -q -m1 . in > out || fail=1
compare /dev/null out || fail=1
Exit $fail
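
The `-m2` check above depends on two commands in one subshell sharing a single file offset: whatever the first command leaves unread is what the second one sees. A small sketch of that mechanism, using the shell's `read` builtin in place of grep and a hypothetical scratch file name `demo-in`:

```shell
# Three lines of input in a regular (seekable) file.
printf '1\n2\n3\n' > demo-in

# Both commands inherit the same open file description, so cat
# continues from wherever read stopped consuming input.
rest=$( { read -r first; cat; } < demo-in )

echo "first line: $first"
echo "remaining lines:"
echo "$rest"

rm -f demo-in
```

In the test, grep plays the role of `read`: if it stops after the second match, `head -n1` finds exactly one line left.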

@@ -2,7 +2,7 @@
# Trigger a bug in the DFA matcher.
# This would fail for grep-2.20.
# Copyright 2014-2018 Free Software Foundation, Inc.
# Copyright 2014-2026 Free Software Foundation, Inc.
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by

@@ -2,7 +2,7 @@
# grep would sometimes read beyond end of input, when using a non-UTF8
# multibyte locale.
# Copyright 2014-2018 Free Software Foundation, Inc.
# Copyright 2014-2026 Free Software Foundation, Inc.
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by

Some files were not shown because too many files have changed in this diff.