.TH LOCATEDB 5 \" -*- nroff -*-
.SH NAME
locatedb \- front-compressed file name database
.SH DESCRIPTION
This manual page documents the format of file name databases for the
GNU version of
.BR locate .
The file name databases contain lists of files that were in
particular directory trees when the databases were last updated.
.P
There can be multiple databases. Users can select which databases
\fBlocate\fP searches using an environment variable or command line
option; see \fBlocate\fP(1). The system administrator can choose the
file name of the default database, the frequency with which the
databases are updated, and the directories for which they contain
entries. Normally, file name databases are updated by running the
\fBupdatedb\fP program periodically, typically nightly; see
\fBupdatedb\fP(1).
.P
\fBupdatedb\fP runs a program called \fBfrcode\fP to compress the list
of file names using front-compression, which reduces
the database size by a factor of 4 to 5. Front-compression (also
known as incremental encoding) works as follows.
.P
The database entries are a sorted list (case-insensitively, for users'
convenience). Since the list is sorted, each entry is likely to share
a prefix (initial string) with the previous entry. Each database
entry begins with an offset-differential count byte: the number of
prefix characters this entry shares with the preceding entry, minus
the number that the preceding entry shares with its own predecessor.
(The counts can be negative.) Following the count is a
null-terminated ASCII remainder \(em the part of the name that follows
the shared prefix.
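.P
The scheme described above can be sketched in Python. This is an
illustration only, not the \fBfrcode\fP source: the function names
\fBcommon_prefix_len\fP and \fBfront_compress\fP are hypothetical, and
the input is assumed to be an already sorted list of byte strings.
.nf
```python
def common_prefix_len(a: bytes, b: bytes) -> int:
    """Return the number of leading bytes that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def front_compress(names):
    """Yield (differential_count, remainder) for each name in order."""
    prev_name = b""
    prev_shared = 0
    for name in names:
        shared = common_prefix_len(prev_name, name)
        # The stored count is the change in shared-prefix length,
        # not the shared-prefix length itself.
        yield shared - prev_shared, name[shared:]
        prev_name, prev_shared = name, shared


names = [
    b"/usr/src",
    b"/usr/src/cmd/aardvark.c",
    b"/usr/src/cmd/armadillo.c",
    b"/usr/tmp/zoo",
]
entries = list(front_compress(names))
```
.fi
.P
For the four names used in the EXAMPLE section, this sketch yields the
counts 0, 8, 6 and \-9.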
.P
If the offset-differential count does not fit in a single byte
(that is, it lies outside the range \-127 to +127), the byte has the
value 0x80 and the count follows in a 2-byte word, with the high byte
first (network byte order).
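.P
A minimal Python sketch of this count encoding follows. The names
\fBencode_count\fP and \fBdecode_count\fP are hypothetical, and the
sketch assumes that both the single byte and the 2-byte word are
two's-complement signed values, which the text implies (counts can be
negative) but does not state outright.
.nf
```python
ESCAPE = 0x80  # marker byte announcing that a 2-byte count follows


def encode_count(count: int) -> bytes:
    """Encode one offset-differential count as 1 or 3 bytes."""
    if -127 <= count <= 127:
        return count.to_bytes(1, "big", signed=True)
    # Assumption: the long count is a signed 16-bit big-endian word.
    return bytes([ESCAPE]) + count.to_bytes(2, "big", signed=True)


def decode_count(data: bytes, pos: int = 0):
    """Return (count, next_pos) for the count starting at data[pos]."""
    if data[pos] == ESCAPE:
        word = data[pos + 1:pos + 3]
        return int.from_bytes(word, "big", signed=True), pos + 3
    byte = data[pos:pos + 1]
    return int.from_bytes(byte, "big", signed=True), pos + 1
```
.fi
.P
In this sketch a count of 6 becomes the single byte 0x06, while a
count of 300 becomes the three bytes 0x80 0x01 0x2c.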
.P
Every database begins with a dummy entry for a file called `LOCATE02',
which \fBlocate\fP checks for to ensure that the database file has the
correct format; it ignores the entry in doing the search.
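.P
As a sketch of how a reader might use the dummy entry, the following
Python function decodes a whole database held in memory. It is not
the \fBlocate\fP implementation: the name \fBread_names\fP is
hypothetical, long counts are assumed to be signed, and the dummy
entry is simply decoded like any other entry and then dropped.
.nf
```python
MAGIC = b"LOCATE02"


def read_names(db: bytes):
    """Return the file names stored in a LOCATE02-format database."""
    names = []
    prev = b""
    shared = 0          # running shared-prefix length
    pos = 0
    seen_magic = False
    while pos < len(db):
        if db[pos] == 0x80:
            # Long count: assumed signed, high byte first.
            delta = int.from_bytes(db[pos + 1:pos + 3], "big", signed=True)
            pos += 3
        else:
            delta = int.from_bytes(db[pos:pos + 1], "big", signed=True)
            pos += 1
        shared += delta
        end = db.index(0, pos)      # remainder is null-terminated
        name = prev[:shared] + db[pos:end]
        pos = end + 1
        if not seen_magic:
            if name != MAGIC:
                raise ValueError("not a LOCATE02 database")
            seen_magic = True       # dummy entry: checked, then ignored
        else:
            names.append(name)
        prev = name
    if not seen_magic:
        raise ValueError("empty database")
    return names


# The database from the EXAMPLE section, built byte by byte.
sample = (
    bytes([0]) + MAGIC + bytes([0])
    + bytes([0]) + b"/usr/src" + bytes([0])
    + bytes([8]) + b"/cmd/aardvark.c" + bytes([0])
    + bytes([6]) + b"rmadillo.c" + bytes([0])
    + bytes([0xF7]) + b"tmp/zoo" + bytes([0])   # 0xF7 is -9
)
decoded = read_names(sample)
```
.fi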
.P
Databases cannot be concatenated together, even if the first
(dummy) entry is trimmed from all but the first database. This
is because the offset-differential count in the first entry of the
second and following databases will be wrong: it was computed
against an empty predecessor, not against the last entry of the
database before it.
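.P
The arithmetic behind this restriction can be illustrated with a few
lines of Python. The helper \fBrunning_prefix_lengths\fP and the
counts in \fBsecond_db\fP are hypothetical; the counts in
\fBfirst_db\fP are those of the EXAMPLE section.
.nf
```python
def running_prefix_lengths(counts):
    """Return the shared-prefix length a reader holds after each count."""
    lengths = []
    total = 0
    for count in counts:
        total += count
        lengths.append(total)
    return lengths


first_db = [0, 8, 6, -9]    # counts from the EXAMPLE section
second_db = [0, 9]          # hypothetical counts of a second database

# Prefix lengths the second database intends when read on its own.
alone = running_prefix_lengths(second_db)

# Prefix lengths a reader computes for the same entries after the
# two databases have been concatenated.
joined = running_prefix_lengths(first_db + second_db)[len(first_db):]
```
.fi
.P
Read alone, the second database yields prefix lengths 0 and 9; read
after the first, the same bytes yield 5 and 14, so every reconstructed
name would borrow the wrong prefix.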
.P
There is also an old database format, used by Unix
.B locate
and
.B find
programs and earlier releases of the GNU ones. \fBupdatedb\fP runs
programs called \fBbigram\fP and \fBcode\fP to produce old-format
databases. The old format differs from the above description in the
following ways. Instead of each entry starting with an
offset-differential count byte and ending with a null, byte values
from 0 through 28 indicate offset-differential counts from \-14 through
14. The byte value indicating that a long offset-differential count
follows is 0x1e (30), not 0x80. The long counts are stored in host
byte order, which is not necessarily network byte order, and in the
host integer word size, which is usually 4 bytes. They also represent a
count 14 less than their value. The database lines have no
termination byte; the start of the next line is indicated by its first
byte having a value <= 30.
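.P
The old-format count coding can be sketched as follows. The function
name \fBdecode_old_count\fP and its \fBword_size\fP and
\fBbyteorder\fP parameters are hypothetical; the sketch assumes the
long count is a signed integer and rejects byte value 29, whose meaning
is not described here.
.nf
```python
import sys

OLD_ESCAPE = 30     # 0x1e: a long count follows
BIAS = 14           # stored values are 14 more than the count


def decode_old_count(data, pos=0, word_size=4, byteorder=sys.byteorder):
    """Return (count, next_pos) for an old-format count at data[pos]."""
    code = data[pos]
    if code == OLD_ESCAPE:
        # Assumption: signed integer in host byte order and word size.
        word = data[pos + 1:pos + 1 + word_size]
        raw = int.from_bytes(word, byteorder, signed=True)
        return raw - BIAS, pos + 1 + word_size
    if code <= 28:
        return code - BIAS, pos + 1
    raise ValueError("byte does not start an entry")
```
.fi
.P
With these rules a stored byte of 0 stands for a count of \-14, 14
stands for 0, and 28 stands for 14.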
.P
In addition, instead of starting with a dummy entry, the old database
format starts with a 256-byte table containing the 128 most common
bigrams in the file list. A bigram is a pair of adjacent bytes.
Bytes in the database that have the high bit set are indexes (with the
high bit cleared) into the bigram table. The bigram and
offset-differential count coding makes these databases 20 to 25% smaller
than the new format, but makes them not 8-bit clean. Any byte in a
file name that is in the ranges used for the special codes is replaced
in the database by a question mark, which not coincidentally is the
shell wildcard to match a single character.
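.P
Bigram expansion can be sketched in Python as shown below. The
function name \fBexpand_bigrams\fP is hypothetical, and the sketch
assumes that the table stores the 128 bigrams back to back, so that
bigram number \fIi\fP occupies bytes 2\fIi\fP and 2\fIi\fP+1; the
sample table contents are invented for the illustration.
.nf
```python
def expand_bigrams(table: bytes, encoded: bytes) -> bytes:
    """Expand high-bit bytes in encoded using a 256-byte bigram table."""
    if len(table) != 256:
        raise ValueError("bigram table must be 256 bytes")
    out = bytearray()
    for byte in encoded:
        if byte & 0x80:
            index = byte & 0x7F     # clear the high bit
            # Assumed layout: bigram i lives at table[2*i : 2*i + 2].
            out += table[2 * index:2 * index + 2]
        else:
            out.append(byte)
    return bytes(out)


# Invented table: bigram 0 is "/u", bigram 1 is "sr", the rest unused.
table = b"/u" + b"sr" + bytes(252)
encoded = bytes([0x80, 0x81]) + b"/tmp"
expanded = expand_bigrams(table, encoded)
```
.fi
.P
Because every byte with the high bit set is taken to be a table index,
a file name containing such a byte cannot be stored literally, which is
why these databases are not 8-bit clean.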
.SH EXAMPLE
.nf

Input to \fBfrcode\fP:
.\" with nulls changed to newlines:
/usr/src
/usr/src/cmd/aardvark.c
/usr/src/cmd/armadillo.c
/usr/tmp/zoo

Length of the longest prefix of the preceding entry to share:
0 /usr/src
8 /cmd/aardvark.c
14 rmadillo.c
5 tmp/zoo

.fi
Output from \fBfrcode\fP, with trailing nulls changed to newlines
and count bytes made printable:
.nf
0 LOCATE02
0 /usr/src
8 /cmd/aardvark.c
6 rmadillo.c
\-9 tmp/zoo

(6 = 14 \- 8, and \-9 = 5 \- 14)
.fi
.SH "SEE ALSO"
\fBfind\fP(1), \fBlocate\fP(1), \fBupdatedb\fP(1), \fBxargs\fP(1),
\fBFinding Files\fP (on-line in Info, or printed)
.SH "BUGS"
.P
The best way to report a bug is to use the form at
http://savannah.gnu.org/bugs/?group=findutils.
The reason for this is that you will then be able to track progress in
fixing the problem. Other comments about \fBlocate\fP(1) and about
the findutils package in general can be sent to the
.I bug-findutils
mailing list. To join the list, send email to
.IR bug-findutils-request@gnu.org .