mirror of
https://https.git.savannah.gnu.org/git/findutils.git
synced 2026-01-26 15:39:06 +00:00
updatedb: Remove support for the old pre-4.0 database format.
* locate/testsuite/Makefile.am (EXTRA_DIST_EXP): Remove locate.gnu/old_prefix.exp and locate.gnu/oldformat.exp.
(EXTRA_DIST_XO): Remove locate.gnu/old_prefix.xo and locate.gnu/oldformat.xo.
* doc/find.texi (Database Formats): Remove the warning about old versions of locate failing to read the LOCATE02 database format. Mention that the slocate database format is also supported.
(Old Database Format): Point out that updatedb will no longer produce the old format.
(Invoking updatedb): Remove mention of the --old-format option. Remove mention of --dbformat=old.
(Long File Name Bugs with Old-Format Databases): Remove this section.
* locate/updatedb.sh: remove support for --dbformat=old and --old-format.
(checkbinary): Don't look for the bigram and code binaries.
* locate/updatedb.1: Explain that support for the old database format has been removed from updatedb and will shortly be removed from locate also. Remove the documentation for the removed option --old-format and mention of --dbformat-old.
* locate/code.c: remove since this program was only used to generate old-format databases.
* locate/bigram.c: remove since this program was only used to generate old-format databases.
* po/POTFILES.in: Remove bigram.c and code.c.
* locate/word_io.c (putword): Remove this function, since it was only needed for making old-format databases.
* find/find.1 (NON-BUGS): Don't mention bigram.c and code.c in the example.
* locate/locatedb.h: Remove declaration of putword, which has been deleted.
* locate/Makefile.am (libexec_PROGRAMS): Remove bigram and code (since they were only used to generate old-format databases).
(updatedb): Don't substitute @bigram@ and @code@.
(code_SOURCES): Delete.
* locate/testsuite/locate.gnu/old_prefix.exp: delete test case for the old database format.
* locate/testsuite/locate.gnu/old_prefix.xo: Likewise.
* locate/testsuite/locate.gnu/oldformat.exp: Likewise.
* locate/testsuite/locate.gnu/oldformat.xo: Likewise.
* TODO: manpages for bigram and code are no longer needed.
* NEWS: Mention these changes.
parent
69e308b286
commit
89ec0211ce
6
NEWS
@ -4,6 +4,12 @@ GNU findutils NEWS - User visible changes. -*- outline -*- (allout)
** Changes to locate / updatedb

Support for generating old-format databases (with updatedb
--old-format or updatedb --dbformat=old) has been removed. The old
database format was deprecated in 2007 (and updatedb has warned about
this since that time). The locate program will will read old-format
databases, though this support also will be removed.

The updatedb script now operates in the C locale only. This means
that character encoding issues are now not likely to cause sort to
fail. It also honours the TMPDIR environment variable if that was
2
TODO
@ -2,7 +2,7 @@
* Internationalization
** updatedb.sh should be internationalized

* man pages for frcode, bigram, and code
* man page for frcode
Perhaps a better description in texi pages as well.

* Add option for find to sort output in lexical order for use for updatedb
123
doc/find.texi
@ -2912,18 +2912,17 @@ directory trees when the databases were last updated. The file name
|
||||
database format changed starting with GNU @code{locate} version 4.0 to
|
||||
allow machines with different byte orderings to share the databases.
|
||||
|
||||
GNU @code{locate} can read both the old and new database formats.
|
||||
However, old versions of @code{locate} (on other Unix systems, or GNU
|
||||
@code{locate} before version 4.0) produce incorrect results if run
|
||||
against a database in something other than the old format.
|
||||
|
||||
Support for the old database format will eventually be discontinued,
|
||||
first in @code{updatedb} and later in @code{locate}.
|
||||
GNU @code{locate} can read both the old pre-findutils-4.0 database
|
||||
format and the @samp{LOCATE02} database format. Support for the old
|
||||
database format will shortly be removed from @code{locate}. It has
|
||||
already been removed from @code{updatedb}.
|
||||
|
||||
If you run @samp{locate --statistics}, the resulting summary indicates
|
||||
the type of each @code{locate} database. You select which database
|
||||
format @code{updatedb} will use with the @samp{--dbformat} option.
|
||||
|
||||
The @samp{slocate} database format is very similar to @samp{LOCATE02}
|
||||
and is also supported (in both @code{updatedb} and @code{locate}).
|
||||
|
||||
@menu
|
||||
* LOCATE02 Database Format::
|
||||
@ -3024,21 +3023,20 @@ interpreted as for the GNU LOCATE02 format.
|
||||
@subsection Old Database Format
|
||||
|
||||
The old database format is used by Unix @code{locate} and @code{find}
|
||||
programs and earlier releases of the GNU ones. @code{updatedb}
|
||||
produces this format if given the @samp{--old-format} option.
|
||||
programs and pre-4.0 releases of GNU findutils. @code{locate}
|
||||
understands this format, though @code{updatedb} will no longer produce
|
||||
it.
|
||||
|
||||
@code{updatedb} runs programs called @code{bigram} and @code{code} to
|
||||
produce old-format databases. The old format differs from the new one
|
||||
in the following ways. Instead of each entry starting with an
|
||||
offset-differential count byte and ending with a null, byte values
|
||||
from 0 through 28 indicate offset-differential counts from -14 through
|
||||
14. The byte value indicating that a long offset-differential count
|
||||
follows is 0x1e (30), not 0x80. The long counts are stored in host
|
||||
byte order, which is not necessarily network byte order, and host
|
||||
integer word size, which is usually 4 bytes. They also represent a
|
||||
count 14 less than their value. The database lines have no
|
||||
termination byte; the start of the next line is indicated by its first
|
||||
byte having a value <= 30.
|
||||
The old format differs from @samp{LOCATE02} in the following ways.
|
||||
Instead of each entry starting with an offset-differential count byte
|
||||
and ending with a null, byte values from 0 through 28 indicate
|
||||
offset-differential counts from -14 through 14. The byte value
|
||||
indicating that a long offset-differential count follows is 0x1e (30),
|
||||
not 0x80. The long counts are stored in host byte order, which is not
|
||||
necessarily network byte order, and host integer word size, which is
|
||||
usually 4 bytes. They also represent a count 14 less than their
|
||||
value. The database lines have no termination byte; the start of the
|
||||
next line is indicated by its first byte having a value <= 30.
|
||||
|
||||
In addition, instead of starting with a dummy entry, the old database
|
||||
format starts with a 256 byte table containing the 128 most common
|
||||
@ -3049,17 +3047,13 @@ offset-differential count coding makes these databases 20-25% smaller
|
||||
than the new format, but makes them not 8-bit clean. Any byte in a
|
||||
file name that is in the ranges used for the special codes is replaced
|
||||
in the database by a question mark, which not coincidentally is the
|
||||
shell wildcard to match a single character.
|
||||
shell wildcard to match a single character. The old format therefore
|
||||
cannot faithfully store entries with non-ASCII characters.
|
||||
|
||||
The old format therefore cannot faithfully store entries with
|
||||
non-ASCII characters. It therefore should not be used in
|
||||
internationalised environments. That is, most installations should
|
||||
not use it.
|
||||
|
||||
Because the long counts are stored by the @code{code} program as
|
||||
Because the long counts are stored as
|
||||
native-order machine words, the database format is not easily used in
|
||||
environments which differ in terms of byte order. If locate databases
|
||||
are to be shared between machines, the LOCATE02 database format should
|
||||
are to be shared between machines, the @samp{LOCATE02} database format should
|
||||
be used. This has other benefits as discussed above. However, the
|
||||
length of the filename currently being processed can normally be used
|
||||
to place reasonable limits on the long counts and so this information
|
||||
@ -3098,16 +3092,6 @@ the newline character, meaning that parts of file names containing
|
||||
newlines will be incorrectly sorted. This can result in both
|
||||
incorrect matches and incorrect failures to match.
|
||||
|
||||
On the other hand, if you are using the old database format, file
|
||||
names with embedded newlines are not correctly handled. There is no
|
||||
technical limitation which enforces this, it's just that the
|
||||
@code{bigram} program has not been updated to support lists of file
|
||||
names separated by nulls.
|
||||
|
||||
So, if you are using the new database format (this is the default) and
|
||||
your system uses GNU @code{sort}, newlines will be correctly handled
|
||||
at all times. Otherwise, newlines may not be correctly handled.
|
||||
|
||||
@node File Permissions
|
||||
@chapter File Permissions
|
||||
|
||||
@ -3631,24 +3615,12 @@ The user to search network directories as, using @code{su}. Default
|
||||
@code{user} is @code{daemon}. You can also use the environment variable
|
||||
@code{NETUSER} to set this user.
|
||||
|
||||
@item --old-format
|
||||
Generate a @code{locate} database in the old format, for compatibility
|
||||
with versions of @code{locate} other than GNU @code{locate}. Using
|
||||
this option means that @code{locate} will not be able to properly
|
||||
handle non-ASCII characters in file names (that is, file names
|
||||
containing characters which have the eighth bit set, such as many of
|
||||
the characters from the ISO-8859-1 character set). @xref{Database
|
||||
Formats}, for a detailed description of the supported database
|
||||
formats.
|
||||
|
||||
@item --dbformat=@var{FORMAT}
|
||||
Generate the locate database in format @code{FORMAT}. Supported
|
||||
database formats include @code{LOCATE02} (which is the default),
|
||||
@code{old} and @code{slocate}. The @code{old} format exists for
|
||||
compatibility with implementations of @code{locate} on other Unix
|
||||
systems. The @code{slocate} format exists for compatibility with
|
||||
@code{slocate}. @xref{Database Formats}, for a detailed description
|
||||
of each format.
|
||||
database formats include @code{LOCATE02} (which is the default) and
|
||||
@code{slocate}. The @code{slocate} format exists for compatibility
|
||||
with @code{slocate}. @xref{Database Formats}, for a detailed
|
||||
description of each format.
|
||||
|
||||
@item --help
|
||||
Print a summary of the command line usage and exit.
|
||||
@ -5377,47 +5349,6 @@ resolved by using @code{locate}'s @samp{-0} option, this still leaves
|
||||
the race condition problems associated with @samp{find @dots{} -print0}.
|
||||
There is no way to avoid these problems in the case of @code{locate}.
|
||||
|
||||
@subsection Long File Name Bugs with Old-Format Databases
|
||||
Old versions of @code{locate} have a bug in the way that old-format
|
||||
databases are read. This bug affects the following versions of
|
||||
@code{locate}:
|
||||
|
||||
@enumerate
|
||||
@item All releases prior to 4.2.31
|
||||
@item All 4.3.x releases prior to 4.3.7
|
||||
@end enumerate
|
||||
|
||||
The affected versions of @code{locate} read file names into a
|
||||
fixed-length 1026 byte buffer, allocated on the heap. This buffer is
|
||||
not extended if file names are too long to fit into the buffer. No
|
||||
range checking on the length of the filename is performed. This could
|
||||
in theory lead to a privilege escalation attack. Findutils versions
|
||||
4.3.0 to 4.3.6 are also affected.
|
||||
|
||||
On systems using the old database format and affected versions of
|
||||
@code{locate}, carefully-chosen long file names could in theory allow
|
||||
malicious users to run code of their choice as any user invoking
|
||||
locate.
|
||||
|
||||
If remote users can choose the names of files stored on your system,
|
||||
and these files are indexed by @code{updatedb}, this may be a remote
|
||||
security vulnerability. Findutils version 4.2.31 and findutils
|
||||
version 4.3.7 include fixes for this problem. The @code{updatedb},
|
||||
@code{bigram} and @code{code} programs do no appear to be affected.
|
||||
|
||||
If you are also using GNU coreutils, you can use the following command
|
||||
to determine the length of the longest file name on a given system:
|
||||
|
||||
@example
|
||||
find / -print0 | tr -c '\0' 'x' | tr '\0' '\n' | wc -L
|
||||
@end example
|
||||
|
||||
Although this problem is significant, the old database format is not
|
||||
the default, and use of the old database format is not common. Most
|
||||
installations and most users will not be affected by this problem.
|
||||
|
||||
|
||||
|
||||
@node Security Summary
|
||||
@section Summary
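The byte layout that the doc/find.texi hunks above describe for the old format (count bytes 0 through 28 standing for -14 through 14, 0x1e as the escape before a long count, long counts stored 14 above the count they represent) can be sketched in C as follows. This is only an illustration, not code from findutils: the names `OLD_OFFSET`, `OLD_ESCAPE`, `old_classify` and the other helpers are hypothetical. The 32-127 literal range and the 128-255 bigram range are taken from the header comment of the removed locate/code.c, and the source assigns no meaning to bytes 29 and 31, so the sketch reports them as unspecified.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; the values 14 and 30 come from the description above. */
enum { OLD_OFFSET = 14, OLD_ESCAPE = 30 };

enum old_byte_kind
{
  OLD_BYTE_COUNT,   /* 0..28: short offset-differential count */
  OLD_BYTE_ESCAPE,  /* 30: a long count follows */
  OLD_BYTE_ASCII,   /* 32..127: literal character */
  OLD_BYTE_BIGRAM,  /* 128..255: index into the leading bigram table */
  OLD_BYTE_OTHER    /* 29 and 31: no meaning given in the source text */
};

/* Classify one byte of an old-format database entry. */
static enum old_byte_kind
old_classify (unsigned char b)
{
  if (b <= 2 * OLD_OFFSET)
    return OLD_BYTE_COUNT;
  if (b == OLD_ESCAPE)
    return OLD_BYTE_ESCAPE;
  if (b >= 32 && b <= 127)
    return OLD_BYTE_ASCII;
  if (b >= 128)
    return OLD_BYTE_BIGRAM;
  return OLD_BYTE_OTHER;
}

/* Short count byte -> differential count in the range -14..14. */
static int
old_short_count (unsigned char b)
{
  assert (old_classify (b) == OLD_BYTE_COUNT);
  return (int) b - OLD_OFFSET;
}

/* Long counts "represent a count 14 less than their value". */
static int
old_long_count (int stored)
{
  return stored - OLD_OFFSET;
}

/* Differential count -> single byte, or -1 when the escape form is needed. */
static int
old_encode_short (int diffcount)
{
  if (diffcount < -OLD_OFFSET || diffcount > OLD_OFFSET)
    return -1;
  return diffcount + OLD_OFFSET;
}

/* Bigram byte -> offset of its two-character pair in the 256-byte table. */
static size_t
old_bigram_offset (unsigned char b)
{
  assert (old_classify (b) == OLD_BYTE_BIGRAM);
  return (size_t) (b & 0x7f) * 2;
}
```

Because every byte value up to 30 is reserved for counts and the escape, a reader of this format can detect the start of the next entry without a terminator byte, which is also why such bytes cannot appear literally in a stored file name.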
|
||||
|
||||
|
||||
@ -2212,7 +2212,7 @@ resulting in
actually receiving a command line like this:
.nf
.
.B find . \-name bigram.c code.c frcode.c locate.c \-print
.B find . \-name frcode.c locate.c word_io.c \-print
.
.fi
That command is of course not going to work. Instead of doing things
@ -4,12 +4,9 @@ AM_CFLAGS = $(WARN_CFLAGS)
|
||||
LOCATE_DB = $(localstatedir)/locatedb
|
||||
localedir = $(datadir)/locale
|
||||
|
||||
AM_INSTALLCHECK_STD_OPTIONS_EXEMPT = \
|
||||
frcode$(EXEEXT) \
|
||||
code$(EXEEXT) \
|
||||
bigram$(EXEEXT)
|
||||
AM_INSTALLCHECK_STD_OPTIONS_EXEMPT = frcode$(EXEEXT)
|
||||
bin_PROGRAMS = locate
|
||||
libexec_PROGRAMS = frcode code bigram
|
||||
libexec_PROGRAMS = frcode
|
||||
bin_SCRIPTS = updatedb
|
||||
man_MANS = locate.1 updatedb.1 locatedb.5
|
||||
BUILT_SOURCES = dblocation.texi
|
||||
@ -18,7 +15,6 @@ CLEANFILES = updatedb
|
||||
|
||||
DISTCLEANFILES = dblocation.texi
|
||||
locate_SOURCES = locate.c word_io.c
|
||||
code_SOURCES = code.c word_io.c
|
||||
nodist_locate_TEXINFOS = dblocation.texi
|
||||
|
||||
AM_CPPFLAGS = -I$(top_srcdir)/lib -I../gl/lib -I$(top_srcdir)/gl/lib -DLOCATE_DB=\"$(LOCATE_DB)\" -DLOCALEDIR=\"$(localedir)\"
|
||||
@ -34,8 +30,6 @@ updatedb: updatedb.sh Makefile
|
||||
rm -f $@
|
||||
find=`echo find|sed '$(transform)'`; \
|
||||
frcode=`echo frcode|sed '$(transform)'`; \
|
||||
bigram=`echo bigram|sed '$(transform)'`; \
|
||||
code=`echo code|sed '$(transform)'`; \
|
||||
sed \
|
||||
-e "s,@""bindir""@,$(bindir)," \
|
||||
-e "s,@""libexecdir""@,$(libexecdir)," \
|
||||
@ -44,8 +38,6 @@ updatedb: updatedb.sh Makefile
|
||||
-e "s,@""PACKAGE_NAME""@,$(PACKAGE_NAME)," \
|
||||
-e "s,@""find""@,$${find}," \
|
||||
-e "s,@""frcode""@,$${frcode}," \
|
||||
-e "s,@""bigram""@,$${bigram}," \
|
||||
-e "s,@""code""@,$${code}," \
|
||||
-e "s,@""SORT""@,$(SORT)," \
|
||||
-e "s,@""SORT_SUPPORTS_Z""@,$(SORT_SUPPORTS_Z)," \
|
||||
$(srcdir)/updatedb.sh > $@
|
||||
|
||||
140
locate/bigram.c
@ -1,140 +0,0 @@
|
||||
/* bigram -- list bigrams for locate
|
||||
Copyright (C) 1994, 2007, 2009-2011, 2016 Free Software Foundation,
|
||||
Inc.
|
||||
|
||||
This program is free software: you can redistribute it and/or modify
|
||||
it under the terms of the GNU General Public License as published by
|
||||
the Free Software Foundation, either version 3 of the License, or
|
||||
(at your option) any later version.
|
||||
|
||||
This program is distributed in the hope that it will be useful,
|
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||
GNU General Public License for more details.
|
||||
|
||||
You should have received a copy of the GNU General Public License
|
||||
along with this program. If not, see <http://www.gnu.org/licenses/>.
|
||||
*/
|
||||
|
||||
/* Usage: bigram < text > bigrams
|
||||
Use `code' to encode a file using this output.
|
||||
|
||||
Read a file from stdin and write out the bigrams (pairs of
|
||||
adjacent characters), one bigram per line, to stdout. To reduce
|
||||
needless duplication in the output, it starts finding the
|
||||
bigrams on each input line at the character where that line
|
||||
first differs from the previous line (i.e., in the ASCII
|
||||
remainder). Therefore, the input should be sorted in order to
|
||||
get the least redundant output.
|
||||
|
||||
Written by James A. Woods <jwoods@adobe.com>.
|
||||
Modified by David MacKenzie <djm@gnu.ai.mit.edu>. */
|
||||
|
||||
/* config.h must always be included first. */
|
||||
#include <config.h>
|
||||
|
||||
/* system headers. */
|
||||
#include <errno.h>
|
||||
#include <stdio.h>
|
||||
#include <locale.h>
|
||||
#include <string.h>
|
||||
#include <stdlib.h>
|
||||
#include <sys/types.h>
|
||||
|
||||
/* gnulib headers. */
|
||||
#include "closeout.h"
|
||||
#include "gettext.h"
|
||||
#include "progname.h"
|
||||
#include "xalloc.h"
|
||||
#include "error.h"
|
||||
|
||||
/* find headers would go here but we don't need any. */
|
||||
|
||||
|
||||
/* We use gettext because for example xmalloc may issue an error message. */
|
||||
#if ENABLE_NLS
|
||||
# include <libintl.h>
|
||||
# define _(Text) gettext (Text)
|
||||
#else
|
||||
# define _(Text) Text
|
||||
#define textdomain(Domain)
|
||||
#define bindtextdomain(Package, Directory)
|
||||
#endif
|
||||
|
||||
|
||||
/* Return the length of the longest common prefix of strings S1 and S2. */
|
||||
|
||||
static int
|
||||
prefix_length (char *s1, char *s2)
|
||||
{
|
||||
register char *start;
|
||||
|
||||
for (start = s1; *s1 == *s2 && *s1 != '\0'; s1++, s2++)
|
||||
;
|
||||
return s1 - start;
|
||||
}
|
||||
|
||||
int
|
||||
main (int argc, char **argv)
|
||||
{
|
||||
char *path; /* The current input entry. */
|
||||
char *oldpath; /* The previous input entry. */
|
||||
size_t pathsize, oldpathsize; /* Amounts allocated for them. */
|
||||
int line_len; /* Length of input line. */
|
||||
|
||||
if (argv[0])
|
||||
set_program_name (argv[0]);
|
||||
else
|
||||
set_program_name ("bigram");
|
||||
|
||||
#ifdef HAVE_SETLOCALE
|
||||
setlocale (LC_ALL, "");
|
||||
#endif
|
||||
bindtextdomain (PACKAGE, LOCALEDIR);
|
||||
textdomain (PACKAGE);
|
||||
|
||||
(void) argc;
|
||||
if (atexit (close_stdout))
|
||||
{
|
||||
error (EXIT_FAILURE, errno, _("The atexit library function failed"));
|
||||
}
|
||||
|
||||
pathsize = oldpathsize = 1026; /* Increased as necessary by getline. */
|
||||
path = xmalloc (pathsize);
|
||||
oldpath = xmalloc (oldpathsize);
|
||||
|
||||
/* Set to empty string, to force the first prefix count to 0. */
|
||||
oldpath[0] = '\0';
|
||||
|
||||
while ((line_len = getline (&path, &pathsize, stdin)) > 0)
|
||||
{
|
||||
register int count; /* The prefix length. */
|
||||
register int j; /* Index into input line. */
|
||||
|
||||
path[line_len - 1] = '\0'; /* Remove the newline. */
|
||||
|
||||
/* Output bigrams in the remainder only. */
|
||||
count = prefix_length (oldpath, path);
|
||||
for (j = count; path[j] != '\0' && path[j + 1] != '\0'; j += 2)
|
||||
{
|
||||
putchar (path[j]);
|
||||
putchar (path[j + 1]);
|
||||
putchar ('\n');
|
||||
}
|
||||
|
||||
{
|
||||
/* Swap path and oldpath and their sizes. */
|
||||
char *tmppath = oldpath;
|
||||
size_t tmppathsize = oldpathsize;
|
||||
oldpath = path;
|
||||
oldpathsize = pathsize;
|
||||
path = tmppath;
|
||||
pathsize = tmppathsize;
|
||||
}
|
||||
}
|
||||
|
||||
free (path);
|
||||
free (oldpath);
|
||||
|
||||
return 0;
|
||||
}
|
||||
285
locate/code.c
@ -1,285 +0,0 @@
|
||||
/* code -- bigram- and front-encode filenames for locate
|
||||
Copyright (C) 1994, 2005, 2007-2008, 2010-2011, 2016 Free Software
|
||||
Foundation, Inc.
|
||||
|
||||
This program is free software: you can redistribute it and/or modify
|
||||
it under the terms of the GNU General Public License as published by
|
||||
the Free Software Foundation, either version 3 of the License, or
|
||||
(at your option) any later version.
|
||||
|
||||
This program is distributed in the hope that it will be useful,
|
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||
GNU General Public License for more details.
|
||||
|
||||
You should have received a copy of the GNU General Public License
|
||||
along with this program. If not, see <http://www.gnu.org/licenses/>.
|
||||
*/
|
||||
|
||||
/* Compress a sorted list.
|
||||
Works with `find' to encode a filename database to save space
|
||||
and search time.
|
||||
|
||||
Usage:
|
||||
|
||||
bigram < file_list > bigrams
|
||||
process-bigrams > most_common_bigrams
|
||||
code most_common_bigrams < file_list > squeezed_list
|
||||
|
||||
Uses `front compression' (see ";login:", March 1983, p. 8).
|
||||
The output begins with the 128 most common bigrams.
|
||||
After that, the output format is, for each line,
|
||||
an offset (from the previous line) differential count byte
|
||||
followed by a (partially bigram-encoded) ASCII remainder.
|
||||
The output lines have no terminating byte; the start of the next line
|
||||
is indicated by its first byte having a value <= 30.
|
||||
|
||||
The encoding of the output bytes is:
|
||||
|
||||
0-28 likeliest differential counts + offset (14) to make nonnegative
|
||||
30 escape code for out-of-range count to follow in next halfword
|
||||
128-255 bigram codes (the 128 most common, as determined by `updatedb')
|
||||
32-127 single character (printable) ASCII remainder
|
||||
|
||||
Written by James A. Woods <jwoods@adobe.com>.
|
||||
Modified by David MacKenzie <djm@gnu.org>. */
|
||||
|
||||
/* config.h should always be included first. */
|
||||
#include <config.h>
|
||||
|
||||
/* system headers. */
|
||||
#include <errno.h>
|
||||
#include <stdbool.h>
|
||||
#include <stdio.h>
|
||||
#include <stdlib.h>
|
||||
#include <string.h>
|
||||
#include <sys/types.h>
|
||||
|
||||
/* gnulib headers. */
|
||||
#include "closeout.h"
|
||||
#include "error.h"
|
||||
#include "gettext.h"
|
||||
#include "progname.h"
|
||||
#include "xalloc.h"
|
||||
|
||||
/* find headers. */
|
||||
#include "findutils-version.h"
|
||||
#include "locatedb.h"
|
||||
|
||||
#if ENABLE_NLS
|
||||
# include <libintl.h>
|
||||
# define _(Text) gettext (Text)
|
||||
#else
|
||||
# define _(Text) Text
|
||||
#define textdomain(Domain)
|
||||
#define bindtextdomain(Package, Directory)
|
||||
#endif
|
||||
|
||||
|
||||
#ifndef ATTRIBUTE_NORETURN
|
||||
# define ATTRIBUTE_NORETURN __attribute__ ((__noreturn__))
|
||||
#endif
|
||||
|
||||
|
||||
/* The 128 most common bigrams in the file list, padded with NULs
|
||||
if there are fewer. */
|
||||
static char bigrams[257] = {0};
|
||||
|
||||
/* Return the offset of PATTERN in STRING, or -1 if not found. */
|
||||
|
||||
static int
|
||||
strindex (char *string, char *pattern)
|
||||
{
|
||||
register char *s;
|
||||
|
||||
for (s = string; *s != '\0'; s++)
|
||||
/* Fast first char check. */
|
||||
if (*s == *pattern)
|
||||
{
|
||||
register char *p2 = pattern + 1, *s2 = s + 1;
|
||||
while (*p2 != '\0' && *p2 == *s2)
|
||||
p2++, s2++;
|
||||
if (*p2 == '\0')
|
||||
return s2 - strlen (pattern) - string;
|
||||
}
|
||||
return -1;
|
||||
}
|
||||
|
||||
/* Return the length of the longest common prefix of strings S1 and S2. */
|
||||
|
||||
static int
|
||||
prefix_length (char *s1, char *s2)
|
||||
{
|
||||
register char *start;
|
||||
|
||||
for (start = s1; *s1 == *s2 && *s1 != '\0'; s1++, s2++)
|
||||
;
|
||||
return s1 - start;
|
||||
}
|
||||
|
||||
extern char *version_string;
|
||||
|
||||
static void
|
||||
usage (FILE *stream)
|
||||
{
|
||||
fprintf (stream, _("\
|
||||
Usage: %s [--version | --help]\n\
|
||||
or %s most_common_bigrams < file-list > locate-database\n"),
|
||||
program_name, program_name);
|
||||
fputs (_("\nReport bugs to <bug-findutils@gnu.org>.\n"), stream);
|
||||
}
|
||||
|
||||
|
||||
static void inerr (const char *filename) ATTRIBUTE_NORETURN;
|
||||
static void outerr (void) ATTRIBUTE_NORETURN;
|
||||
|
||||
static void
|
||||
inerr (const char *filename)
|
||||
{
|
||||
error (EXIT_FAILURE, errno, "%s", filename);
|
||||
/*NOTREACHED*/
|
||||
abort ();
|
||||
}
|
||||
|
||||
static void
|
||||
outerr (void)
|
||||
{
|
||||
error (EXIT_FAILURE, errno, _("write error"));
|
||||
/*NOTREACHED*/
|
||||
abort ();
|
||||
}
|
||||
|
||||
|
||||
int
|
||||
main (int argc, char **argv)
|
||||
{
|
||||
char *path; /* The current input entry. */
|
||||
char *oldpath; /* The previous input entry. */
|
||||
size_t pathsize, oldpathsize; /* Amounts allocated for them. */
|
||||
int count, oldcount, diffcount; /* Their prefix lengths & the difference. */
|
||||
char bigram[3]; /* Bigram to search for in table. */
|
||||
int code; /* Index of `bigram' in bigrams table. */
|
||||
FILE *fp; /* Most common bigrams file. */
|
||||
int line_len; /* Length of input line. */
|
||||
|
||||
set_program_name (argv[0]);
|
||||
if (atexit (close_stdout))
|
||||
{
|
||||
error (EXIT_FAILURE, errno, _("The atexit library function failed"));
|
||||
}
|
||||
|
||||
bigram[2] = '\0';
|
||||
|
||||
if (argc != 2)
|
||||
{
|
||||
usage (stderr);
|
||||
return 2;
|
||||
}
|
||||
|
||||
if (0 == strcmp (argv[1], "--help"))
|
||||
{
|
||||
usage (stdout);
|
||||
return 0;
|
||||
}
|
||||
else if (0 == strcmp (argv[1], "--version"))
|
||||
{
|
||||
display_findutils_version ("code");
|
||||
return 0;
|
||||
}
|
||||
|
||||
fp = fopen (argv[1], "r");
|
||||
if (fp == NULL)
|
||||
{
|
||||
fprintf (stderr, "%s: ", argv[0]);
|
||||
perror (argv[1]);
|
||||
return 1;
|
||||
}
|
||||
|
||||
pathsize = oldpathsize = 1026; /* Increased as necessary by getline. */
|
||||
path = xmalloc (pathsize);
|
||||
oldpath = xmalloc (oldpathsize);
|
||||
|
||||
/* Set to empty string, to force the first prefix count to 0. */
|
||||
oldpath[0] = '\0';
|
||||
oldcount = 0;
|
||||
|
||||
/* Copy the list of most common bigrams to the output,
|
||||
padding with NULs if there are <128 of them. */
|
||||
if (NULL == fgets (bigrams, 257, fp))
|
||||
inerr (argv[1]);
|
||||
|
||||
if (256 != fwrite (bigrams, 1, 256, stdout))
|
||||
outerr ();
|
||||
|
||||
if (EOF == fclose (fp))
|
||||
inerr (argv[1]);
|
||||
|
||||
while ((line_len = getline (&path, &pathsize, stdin)) > 0)
|
||||
{
|
||||
char *pp;
|
||||
|
||||
path[line_len - 1] = '\0'; /* Remove newline. */
|
||||
|
||||
/* Squelch unprintable chars in path so as not to botch decoding. */
|
||||
for (pp = path; *pp != '\0'; pp++)
|
||||
{
|
||||
if (!(*pp >= 040 && *pp < 0177))
|
||||
*pp = '?';
|
||||
}
|
||||
|
||||
count = prefix_length (oldpath, path);
|
||||
diffcount = count - oldcount;
|
||||
oldcount = count;
|
||||
/* If the difference is small, it fits in one byte;
|
||||
otherwise, two bytes plus a marker noting that fact. */
|
||||
if (diffcount < -LOCATEDB_OLD_OFFSET || diffcount > LOCATEDB_OLD_OFFSET)
|
||||
{
|
||||
if (EOF ==- putc (LOCATEDB_OLD_ESCAPE, stdout))
|
||||
outerr ();
|
||||
|
||||
if (!putword (stdout,
|
||||
diffcount+LOCATEDB_OLD_OFFSET,
|
||||
GetwordEndianStateNative))
|
||||
outerr ();
|
||||
}
|
||||
else
|
||||
{
|
||||
if (EOF == putc (diffcount + LOCATEDB_OLD_OFFSET, stdout))
|
||||
outerr ();
|
||||
}
|
||||
|
||||
/* Look for bigrams in the remainder of the path. */
|
||||
for (pp = path + count; *pp != '\0'; pp += 2)
|
||||
{
|
||||
if (pp[1] == '\0')
|
||||
{
|
||||
/* No bigram is possible; only one char is left. */
|
||||
putchar (*pp);
|
||||
break;
|
||||
}
|
||||
bigram[0] = *pp;
|
||||
bigram[1] = pp[1];
|
||||
/* Linear search for specific bigram in string table. */
|
||||
code = strindex (bigrams, bigram);
|
||||
if (code % 2 == 0)
|
||||
putchar ((code / 2) | 0200); /* It's a common bigram. */
|
||||
else
|
||||
fputs (bigram, stdout); /* Write the text as printable ASCII. */
|
||||
}
|
||||
|
||||
{
|
||||
/* Swap path and oldpath and their sizes. */
|
||||
char *tmppath = oldpath;
|
||||
size_t tmppathsize = oldpathsize;
|
||||
oldpath = path;
|
||||
oldpathsize = pathsize;
|
||||
path = tmppath;
|
||||
pathsize = tmppathsize;
|
||||
}
|
||||
}
|
||||
|
||||
free (path);
|
||||
free (oldpath);
|
||||
|
||||
return 0;
|
||||
}
|
||||
@ -63,10 +63,6 @@ int getword (FILE *fp, const char *filename,
|
||||
size_t maxvalue,
|
||||
GetwordEndianState *endian_state_flag);
|
||||
|
||||
bool putword (FILE *fp, int word,
|
||||
GetwordEndianState endian_state_flag);
|
||||
|
||||
|
||||
#define SLOCATE_DB_MAGIC_LEN 2
|
||||
|
||||
#endif /* !INC_LOCATEDB_H */
|
||||
|
||||
@ -41,8 +41,6 @@ locate.gnu/slocate.exp \
|
||||
locate.gnu/notexists1.exp \
|
||||
locate.gnu/notexists2.exp \
|
||||
locate.gnu/notexists3.exp \
|
||||
locate.gnu/old_prefix.exp \
|
||||
locate.gnu/oldformat.exp \
|
||||
locate.gnu/space1st.exp \
|
||||
locate.gnu/sv-bug-14535.exp \
|
||||
locate.gnu/exceedshort.exp
|
||||
@ -63,9 +61,7 @@ locate.gnu/exists3.xo \
|
||||
locate.gnu/slocate.xo \
|
||||
locate.gnu/notexists1.xo \
|
||||
locate.gnu/notexists2.xo \
|
||||
locate.gnu/notexists3.xo \
|
||||
locate.gnu/old_prefix.xo \
|
||||
locate.gnu/oldformat.xo
|
||||
locate.gnu/notexists3.xo
|
||||
|
||||
EXTRA_DIST = $(EXTRA_DIST_EXP) $(EXTRA_DIST_XO) $(EXTRA_DIST_XI)
|
||||
|
||||
|
||||
@ -1,13 +0,0 @@
|
||||
set tmp "tmp"
|
||||
exec rm -rf $tmp
|
||||
exec mkdir $tmp
|
||||
exec mkdir $tmp/subdir
|
||||
exec touch $tmp/subdir/________________________________________________________________________________fred1
|
||||
exec touch $tmp/subdir/________________________________________________________________________________fred2
|
||||
exec touch $tmp/subdir/________________________________________________________________________________fred3
|
||||
exec touch $tmp/subdir/________________________________________________________________________________fred4
|
||||
|
||||
locate_start p "--changecwd=. --output=$tmp/locatedb --old-format --localpaths=tmp/subdir 2>/dev/null" "--database=$tmp/locatedb tmp" {}
|
||||
|
||||
|
||||
exec rm -rf $tmp
|
||||
@ -1,5 +0,0 @@
|
||||
tmp/subdir
|
||||
tmp/subdir/________________________________________________________________________________fred1
|
||||
tmp/subdir/________________________________________________________________________________fred2
|
||||
tmp/subdir/________________________________________________________________________________fred3
|
||||
tmp/subdir/________________________________________________________________________________fred4
|
||||
@ -1,12 +0,0 @@
|
||||
# A basic test for the old database format. We need this test because (among
|
||||
# other reasons) the updatedb script only uses our mktemp replacement when
|
||||
# it needs to run bigram/code.
|
||||
set tmp "tmp"
|
||||
exec rm -rf $tmp
|
||||
exec mkdir $tmp
|
||||
exec mkdir $tmp/subdir
|
||||
exec touch $tmp/subdir/fred
|
||||
# Redirect stderr to /dev/null to throw away the warning message about using
|
||||
# the old format, because otherwise the presence of the error message would
|
||||
# cause locate_start to signal a test case failure.
|
||||
locate_start p "--changecwd=. --output=$tmp/locatedb --old-format --localpaths=tmp/subdir/ 2>/dev/null" "--database=$tmp/locatedb -e fred" {}
|
||||
@ -1 +0,0 @@
|
||||
tmp/subdir/fred
|
||||
@ -26,19 +26,13 @@ Users can select which databases \fBlocate\fP searches using an
|
||||
environment variable or command line option; see \fBlocate\fP(1).
|
||||
Databases cannot be concatenated together.
|
||||
.P
|
||||
The file name database format changed starting with GNU
|
||||
.B find
|
||||
and
|
||||
The @samp{LOCATGE02} database format was introduced in GNU findutils
|
||||
version 4.0 in order to allow machines with different byte orderings
|
||||
to share the databases. GNU
|
||||
.B locate
|
||||
version 4.0 to allow machines with different byte orderings to share
|
||||
the databases. The new GNU
|
||||
.B locate
|
||||
can read both the old and new database formats.
|
||||
However, old versions of
|
||||
.B locate
|
||||
and
|
||||
.B find
|
||||
produce incorrect results if given a new-format database.
|
||||
can read both the old and @samp{LOCATE02} database formats, though
|
||||
support for the old pre-4.0 database format will be removed shortly.
|
||||
|
||||
.SH OPTIONS
|
||||
.TP
|
||||
.B \-\-findoptions='\fI\-option1 \-option2...\fP'
|
||||
@@ -88,16 +82,8 @@ The user to search network directories as, using \fBsu\fP(1).
|
||||
Default is \fBdaemon\fP.
|
||||
You can also use the environment variable \fBNETUSER\fP to set this user.
|
||||
.TP
|
||||
.B \-\-old\-format
|
||||
Create the database in the old format. This is a synonym for
|
||||
.BR \-\-dbformat=old .
|
||||
.TP
|
||||
.B \-\-dbformat=F
|
||||
Create the database in format F. The default format is called LOCATE02.
|
||||
F can be
|
||||
.B old
|
||||
to select the old database format (this is the same as specifying
|
||||
.BR \-\-old\-format ).
|
||||
Alternatively the
|
||||
.B slocate
|
||||
format is also supported. When the
|
||||
|
||||
@@ -50,11 +50,11 @@ Usage: $0 [--findoptions='-option1 -option2...']
|
||||
[--localpaths='dir1 dir2...'] [--netpaths='dir1 dir2...']
|
||||
[--prunepaths='dir1 dir2...'] [--prunefs='fs1 fs2...']
|
||||
[--output=dbfile] [--netuser=user] [--localuser=user]
|
||||
[--old-format] [--dbformat] [--version] [--help]
|
||||
[--dbformat] [--version] [--help]
|
||||
|
||||
Report bugs to <bug-findutils@gnu.org>."
|
||||
changeto=/
|
||||
old=no
|
||||
|
||||
for arg
|
||||
do
|
||||
# If we are unable to fork, the back-tick operator will
|
||||
@@ -72,7 +72,6 @@ do
|
||||
--output) LOCATE_DB="$val" ;;
|
||||
--netuser) NETUSER="$val" ;;
|
||||
--localuser) LOCALUSER="$val" ;;
|
||||
--old-format) old=yes ;;
|
||||
--changecwd) changeto="$val" ;;
|
||||
--dbformat) dbformat="$val" ;;
|
||||
--version) fail=0; echo "$version" || fail=1; exit $fail ;;
|
||||
@@ -83,51 +82,32 @@ $usage" >&2
|
||||
esac
|
||||
done
|
||||
|
||||
|
||||
|
||||
|
||||
case "${dbformat:+yes}_${old}" in
|
||||
yes_yes)
|
||||
echo "The --dbformat and --old-format cannot both be specified." >&2
|
||||
exit 1
|
||||
;;
|
||||
*)
|
||||
;;
|
||||
frcode_options=""
|
||||
case "$dbformat" in
|
||||
"")
|
||||
# Default, use LOCATE02
|
||||
;;
|
||||
LOCATE02)
|
||||
;;
|
||||
slocate)
|
||||
frcode_options="$frcode_options -S 1"
|
||||
;;
|
||||
*)
|
||||
# The "old" database format is no longer supported.
|
||||
echo "Unsupported locate database format ${dbformat}: Supported formats are:" >&2
|
||||
echo "LOCATE02, slocate" >&2
|
||||
exit 1
|
||||
esac
|
||||
|
||||
if test "$old" = yes || test "$dbformat" = "old" ; then
|
||||
echo "Warning: future versions of findutils will shortly discontinue support for the old locate database format." >&2
|
||||
old=yes
|
||||
|
||||
if @SORT_SUPPORTS_Z@
|
||||
then
|
||||
sort="@SORT@ -z"
|
||||
print_option="-print0"
|
||||
frcode_options="$frcode_options -0"
|
||||
else
|
||||
sort="@SORT@"
|
||||
print_option="-print"
|
||||
frcode_options=""
|
||||
else
|
||||
frcode_options=""
|
||||
case "$dbformat" in
|
||||
"")
|
||||
# Default, use LOCATE02
|
||||
;;
|
||||
LOCATE02)
|
||||
;;
|
||||
slocate)
|
||||
frcode_options="$frcode_options -S 1"
|
||||
;;
|
||||
*)
|
||||
echo "Unsupported locate database format ${dbformat}: Supported formats are:" >&2
|
||||
echo "LOCATE02, slocate, old" >&2
|
||||
exit 1
|
||||
esac
|
||||
|
||||
|
||||
if @SORT_SUPPORTS_Z@
|
||||
then
|
||||
sort="@SORT@ -z"
|
||||
print_option="-print0"
|
||||
frcode_options="$frcode_options -0"
|
||||
else
|
||||
sort="@SORT@"
|
||||
print_option="-print"
|
||||
fi
|
||||
fi
|
||||
|
||||
getuid() {
|
||||
@@ -230,8 +210,6 @@ fi
|
||||
# The names of the utilities to run to build the database.
|
||||
: ${find:=${BINDIR}/@find@}
|
||||
: ${frcode:=${LIBEXECDIR}/@frcode@}
|
||||
: ${bigram:=${LIBEXECDIR}/@bigram@}
|
||||
: ${code:=${LIBEXECDIR}/@code@}
|
||||
|
||||
make_tempdir () {
|
||||
# This implementation is adapted from the GNU Autoconf manual.
|
||||
@@ -263,7 +241,7 @@ checkbinary () {
|
||||
fi
|
||||
}
|
||||
|
||||
for binary in $find $frcode $bigram $code
|
||||
for binary in $find $frcode
|
||||
do
|
||||
checkbinary $binary
|
||||
done
|
||||
@@ -303,8 +281,6 @@ fi
|
||||
rm -f $LOCATE_DB.n
|
||||
trap 'rm -f $LOCATE_DB.n; exit' HUP TERM
|
||||
|
||||
if test $old = no; then
|
||||
# LOCATE02 or slocate format
|
||||
if {
|
||||
cd "$changeto"
|
||||
if test -n "$SEARCHPATHS"; then
|
||||
@@ -356,73 +332,4 @@ else
|
||||
rm -f $LOCATE_DB.n
|
||||
fi
|
||||
|
||||
else # old
|
||||
|
||||
if temp_directory="`make_tempdir`"; then
|
||||
bigrams="${temp_directory}"/bigrams
|
||||
filelist="${temp_directory}"/filelist
|
||||
else
|
||||
echo "failed to create temporary directory" >&2
|
||||
exit 1
|
||||
fi
|
||||
|
||||
rm -f $LOCATE_DB.n
|
||||
trap 'rm -f $LOCATE_DB.n; rm -rf "${temp_directory}"; exit' HUP TERM
|
||||
|
||||
# Alphabetize subdirectories before file entries using tr. James Woods says:
|
||||
# "to get everything in monotonic collating sequence, to avoid some
|
||||
# breakage i'll have to think about."
|
||||
{
|
||||
cd "$changeto"
|
||||
if test -n "$SEARCHPATHS"; then
|
||||
if [ "$LOCALUSER" != "" ]; then
|
||||
# : A5
|
||||
su $LOCALUSER `select_shell $LOCALUSER` -c \
|
||||
"$find $SEARCHPATHS $FINDOPTIONS \
|
||||
\( $prunefs_exp \
|
||||
-type d -regex '$PRUNEREGEX' \) -prune -o $print_option" || exit $?
|
||||
else
|
||||
# : A6
|
||||
$find $SEARCHPATHS $FINDOPTIONS \
|
||||
\( $prunefs_exp \
|
||||
-type d -regex "$PRUNEREGEX" \) -prune -o $print_option || exit $?
|
||||
fi
|
||||
fi
|
||||
|
||||
if test -n "$NETPATHS"; then
|
||||
myuid=`getuid`
|
||||
if [ "$myuid" = 0 ]; then
|
||||
# : A7
|
||||
su $NETUSER `select_shell $NETUSER` -c \
|
||||
"$find $NETPATHS $FINDOPTIONS \\( -type d -regex '$PRUNEREGEX' -prune \\) -o $print_option" ||
|
||||
exit $?
|
||||
else
|
||||
# : A8
|
||||
$find $NETPATHS $FINDOPTIONS \( -type d -regex "$PRUNEREGEX" -prune \) -o $print_option ||
|
||||
exit $?
|
||||
fi
|
||||
fi
|
||||
} | tr / '\001' | $sort | tr '\001' / > "$filelist"
|
||||
|
||||
# Compute the (at most 128) most common bigrams in the file list.
|
||||
$bigram $bigram_opts < $filelist | sort | uniq -c | sort -nr |
|
||||
awk '{ if (NR <= 128) print $2 }' | tr -d '\012' > "$bigrams"
|
||||
|
||||
# Code the file list.
|
||||
$code "$bigrams" < "$filelist" > $LOCATE_DB.n
|
||||
|
||||
rm -rf "${temp_directory}"
|
||||
|
||||
# To reduce the chances of breaking locate while this script is running,
|
||||
# put the results in a temp file, then rename it atomically.
|
||||
if test -s $LOCATE_DB.n; then
|
||||
chmod 644 ${LOCATE_DB}.n
|
||||
mv ${LOCATE_DB}.n $LOCATE_DB
|
||||
else
|
||||
echo "updatedb: new database would be empty" >&2
|
||||
rm -f $LOCATE_DB.n
|
||||
fi
|
||||
|
||||
fi
|
||||
|
||||
exit 0
|
||||
|
||||
@@ -140,26 +140,3 @@ getword (FILE *fp,
|
||||
return decode_value (data, maxvalue, endian_state_flag, filename);
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
bool
|
||||
putword (FILE *fp, int word,
|
||||
GetwordEndianState endian_state_flag)
|
||||
{
|
||||
size_t items_written;
|
||||
|
||||
/* You must decide before calling this function which
|
||||
* endianness you want to use.
|
||||
*/
|
||||
assert (endian_state_flag != GetwordEndianStateInitial);
|
||||
if (GetwordEndianStateSwab == endian_state_flag)
|
||||
{
|
||||
word = bswap_32(word);
|
||||
}
|
||||
|
||||
items_written = fwrite (&word, sizeof (word), 1, fp);
|
||||
if (1 == items_written)
|
||||
return true;
|
||||
else
|
||||
return false;
|
||||
}
|
||||
|
||||
@@ -22,8 +22,6 @@ lib/findutils-version.c
|
||||
lib/listfile.c
|
||||
lib/regextype.c
|
||||
lib/safe-atoi.c
|
||||
locate/bigram.c
|
||||
locate/code.c
|
||||
locate/frcode.c
|
||||
locate/locate.c
|
||||
locate/word_io.c
|
||||
|
||||