\input texinfo @c -*-texinfo-*-
@c %**start of header
@setfilename find.info
@settitle Finding Files
@c For double-sided printing, uncomment:
@c @setchapternewpage odd
@c %**end of header

@include version.texi
@include dblocation.texi

@iftex
@finalout
@end iftex

@dircategory Basics
@direntry
* Finding files: (find). Operating on files matching certain criteria.
@end direntry

@dircategory Individual utilities
@direntry
* find: (find)Invoking find. Finding and acting on files.
* locate: (find)Invoking locate. Finding files in a database.
* updatedb: (find)Invoking updatedb. Building the locate database.
* xargs: (find)Invoking xargs. Operating on many files.
@end direntry

@copying

This file documents the GNU utilities for finding files that match
certain criteria and performing various operations on them.

Copyright @copyright{} 1994, 1996, 1998, 2000, 2001, 2003-2015 Free
Software Foundation, Inc.

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with no
Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
A copy of the license is included in the section entitled
``GNU Free Documentation License''.
@end copying

@titlepage
@title Finding Files
@subtitle Edition @value{EDITION}, for GNU @code{find} version @value{VERSION}
@subtitle @value{UPDATED}
@author by David MacKenzie and James Youngman

@page
@vskip 0pt plus 1filll
@insertcopying
@end titlepage

@contents

@ifnottex
@node Top
@top GNU Findutils
@comment node-name, next, previous, up
@insertcopying

This file documents the GNU utilities for finding files that match
certain criteria and performing various actions on them.

This is edition @value{EDITION}, for @code{find} version @value{VERSION}.
@end ifnottex

@c The master menu, created with texinfo-master-menu, goes here.

@menu
* Introduction:: Summary of the tasks this manual describes.
* Finding Files:: Finding files that match certain criteria.
* Actions:: Doing things to files you have found.
* Databases:: Maintaining file name databases.
* File Permissions:: How to control access to files.
* Date input formats:: Specifying literal times.
* Configuration:: Options you can select at compile time.
* Reference:: Summary of how to invoke the programs.
* Common Tasks:: Solutions to common real-world problems.
* Worked Examples:: Examples demonstrating more complex points.
* Security Considerations:: Security issues relating to findutils.
* Error Messages:: Explanations of some messages you might see.
* GNU Free Documentation License:: Copying and sharing this manual.
* Primary Index:: The components of @code{find} expressions.
@end menu

@node Introduction
@chapter Introduction

This manual shows how to find files that meet criteria you specify,
and how to perform various actions on the files that you find. The
principal programs that you use to perform these tasks are
@code{find}, @code{locate}, and @code{xargs}. Some of the examples in
this manual use capabilities specific to the GNU versions of those
programs.

GNU @code{find} was originally written by Eric Decker, with
enhancements by David MacKenzie, Jay Plett, and Tim Wood. GNU
@code{xargs} was originally written by Mike Rendell, with enhancements
by David MacKenzie. GNU @code{locate} and its associated utilities
were originally written by James Woods, with enhancements by David
MacKenzie. The idea for @samp{find -print0} and @samp{xargs -0} came
from Dan Bernstein. The current maintainer of GNU findutils (and this
manual) is James Youngman. Many other people have contributed bug
fixes, small improvements, and helpful suggestions. Thanks!

To report a bug in GNU findutils, please use the form on the Savannah
web site at
@url{http://savannah.gnu.org/bugs/?group=findutils}. Reporting bugs
this way means that you will then be able to track progress in fixing
the problem.

If you don't have web access, you can also just send mail to the
mailing list. The mailing list @email{bug-findutils@@gnu.org} carries
discussion of bugs in findutils, questions and answers about the
software and discussion of the development of the programs. To join
the list, send email to @email{bug-findutils-request@@gnu.org}.

Please read any relevant sections of this manual before asking for
help on the mailing list. You may also find it helpful to read the
NON-BUGS section of the @code{find} manual page.

If you ask for help on the mailing list, people will be able to help
you much more effectively if you include the following things:

@itemize @bullet
@item The version of the software you are running. You can find this
out by running @samp{locate --version}.
@item What you were trying to do
@item The @emph{exact} command line you used
@item The @emph{exact} output you got (if this is very long, try to
find a smaller example which exhibits the same problem)
@item The output you expected to get
@end itemize

It may also be the case that the bug you are describing has already
been fixed, if it is a bug. Please check the most recent findutils
releases at @url{ftp://ftp.gnu.org/gnu/findutils} and, if possible,
the development branch at @url{ftp://alpha.gnu.org/gnu/findutils}.
If you take the time to check that your bug still exists in current
releases, this will greatly help people who want to help you solve
your problem. Please also be aware that if you obtained findutils as
part of a GNU/Linux distribution, your copy may lag seriously behind
the current findutils releases, even the stable release. Please
check the GNU FTP site.

@menu
* Scope::
* Overview::
* find Expressions::
@end menu

@node Scope
@section Scope

For brevity, the word @dfn{file} in this manual means a regular file,
a directory, a symbolic link, or any other kind of node that has a
directory entry. A directory entry is also called a @dfn{file name}.
A file name may contain some, all, or none of the directories in a
path that leads to the file. These are all examples of what this
manual calls ``file names'':

@example
parser.c
README
./budget/may-94.sc
fred/.cshrc
/usr/local/include/termcap.h
@end example

A @dfn{directory tree} is a directory and the files it contains, all
of its subdirectories and the files they contain, etc. It can also be
a single non-directory file.

These programs enable you to find the files in one or more directory
trees that:

@itemize @bullet
@item
have names that contain certain text or match a certain pattern;
@item
are links to certain files;
@item
were last used during a certain period of time;
@item
are within a certain size range;
@item
are of a certain type (regular file, directory, symbolic link, etc.);
@item
are owned by a certain user or group;
@item
have certain access permissions or special mode bits;
@item
contain text that matches a certain pattern;
@item
are within a certain depth in the directory tree;
@item
or some combination of the above.
@end itemize

Once you have found the files you're looking for (or files that are
potentially the ones you're looking for), you can do more to them than
simply list their names. You can get any combination of the files'
attributes, or process the files in many ways, either individually or
in groups of various sizes. Actions that you might want to perform on
the files you have found include, but are not limited to:

@itemize @bullet
@item
view or edit
@item
store in an archive
@item
remove or rename
@item
change access permissions
@item
classify into groups
@end itemize

This manual describes how to perform each of those tasks, and more.

@node Overview
@section Overview

The principal programs used for making lists of files that match given
criteria and running commands on them are @code{find}, @code{locate},
and @code{xargs}. An additional command, @code{updatedb}, is used by
system administrators to create databases for @code{locate} to use.

@code{find} searches for files in a directory hierarchy and prints
information about the files it finds. It is run like this:

@example
find @r{[}@var{file}@dots{}@r{]} @r{[}@var{expression}@r{]}
@end example

@noindent
Here is a typical use of @code{find}. This example prints the names
of all files in the directory tree rooted in @file{/usr/src} whose
name ends with @samp{.c} and that are larger than 100 KiB (units of
1024 bytes).
@example
find /usr/src -name '*.c' -size +100k -print
@end example

Notice that the wildcard must be enclosed in quotes in order to
protect it from expansion by the shell.

@code{locate} searches special file name databases for file names that
match patterns. The system administrator runs the @code{updatedb}
program to create the databases. @code{locate} is run like this:

@example
locate @r{[}@var{option}@dots{}@r{]} @var{pattern}@dots{}
@end example

@noindent
This example prints the names of all files in the default file name
database whose name ends with @samp{Makefile} or @samp{makefile}.
Which file names are stored in the database depends on how the system
administrator ran @code{updatedb}.
@example
locate '*[Mm]akefile'
@end example

The name @code{xargs}, pronounced EX-args, means ``combine
arguments.'' @code{xargs} builds and executes command lines by
gathering together arguments it reads on the standard input. Most
often, these arguments are lists of file names generated by
@code{find}. @code{xargs} is run like this:

@example
xargs @r{[}@var{option}@dots{}@r{]} @r{[}@var{command} @r{[}@var{initial-arguments}@r{]}@r{]}
@end example

@noindent
The following command searches the files listed in the file
@file{file-list} and prints all of the lines in them that contain the
word @samp{typedef}.
@example
xargs grep typedef < file-list
@end example

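In practice @code{find} and @code{xargs} are usually combined in a pipeline. The following is a minimal sketch. It assumes a POSIX shell, GNU @code{find} and @code{xargs} (for @samp{-print0} and @samp{-0}), and the common @code{mktemp -d} utility. The directory and file names are invented for illustration. The script lists the files in a throwaway tree that contain the word @samp{typedef}. Because the names are separated by NUL bytes, a file name containing a space stays intact:

```shell
# Throwaway tree; the directory and file names are made up for this sketch.
dir=$(mktemp -d)
mkdir -p "$dir/src"
printf 'typedef int count_t;\n' > "$dir/src/types.h"
printf 'typedef long offset_t;\n' > "$dir/src/my header.h"
printf 'int x;\n' > "$dir/src/main.c"

# -print0 and -0 delimit names with NUL bytes, so the space in
# "my header.h" does not split that name into two arguments.
# grep -l prints each matching file once; sort fixes the order.
matches=$(find "$dir/src" -type f -print0 | xargs -0 grep -l typedef | LC_ALL=C sort)
printf '%s\n' "$matches"
rm -rf -- "$dir"
```
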
@node find Expressions
@section @code{find} Expressions

The expression that @code{find} uses to select files consists of one
or more @dfn{primaries}, each of which is a separate command line
argument to @code{find}. @code{find} evaluates the expression each
time it processes a file. An expression can contain any of the
following types of primaries:

@table @dfn
@item options
affect overall operation rather than the processing of a specific
file;
@item tests
return a true or false value, depending on the file's attributes;
@item actions
have side effects and return a true or false value; and
@item operators
connect the other arguments and affect when and whether they are
evaluated.
@end table

You can omit the operator between two primaries; it defaults to
@samp{-and}. @xref{Combining Primaries With Operators}, for ways to
connect primaries into more complex expressions. If the expression
contains no actions other than @samp{-prune}, @samp{-print} is
performed on all files for which the entire expression is true
(@pxref{Print File Name}).

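Here is a small sketch that makes these defaults concrete. It assumes GNU @code{find}, a POSIX shell and @code{mktemp -d}, and the file names are invented. A command that relies on the implicit @samp{-and} and the implicit @samp{-print} produces the same list as the fully spelled-out expression:

```shell
dir=$(mktemp -d)
mkdir "$dir/sub"
: > "$dir/a.c"
: > "$dir/sub/b.c"
: > "$dir/notes.txt"

# No action is given, so -print is applied, and the two tests are
# joined by an implicit -and.
short=$(cd "$dir" && find . -type f -name '*.c' | LC_ALL=C sort)
# The same expression with the operator and the action spelled out.
long=$(cd "$dir" && find . -type f -and -name '*.c' -print | LC_ALL=C sort)
printf '%s\n' "$short"
rm -rf -- "$dir"
```
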
Options take effect immediately, rather than being evaluated for each
file when their place in the expression is reached. Therefore, for
clarity, it is best to place them at the beginning of the expression.
There are two exceptions to this: @samp{-daystart} and @samp{-follow}
have different effects depending on where in the command line they
appear. This can be confusing, so it's best to keep them at the
beginning, too.

Many of the primaries take arguments, which immediately follow them in
the next command line argument to @code{find}. Some arguments are
file names, patterns, or other strings; others are numbers. Numeric
arguments can be specified as

@table @code
@item +@var{n}
for greater than @var{n},
@item -@var{n}
for less than @var{n},
@item @var{n}
for exactly @var{n}.
@end table

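The following sketch illustrates the three numeric forms with @samp{-size} and the @samp{c} suffix. That suffix counts bytes, so no rounding to larger units takes place. The sketch assumes GNU @code{find} (for @samp{-printf}) and GNU @code{head -c}, and the file names are invented:

```shell
dir=$(mktemp -d)
# Three files of exactly 50, 100 and 150 bytes.
head -c 50 /dev/zero > "$dir/small"
head -c 100 /dev/zero > "$dir/exact"
head -c 150 /dev/zero > "$dir/large"

# +n means more than n, -n means less than n, n means exactly n.
more=$(find "$dir" -type f -size +100c -printf '%f\n')
less=$(find "$dir" -type f -size -100c -printf '%f\n')
same=$(find "$dir" -type f -size 100c -printf '%f\n')
printf 'greater: %s\nless: %s\nexactly: %s\n' "$more" "$less" "$same"
rm -rf -- "$dir"
```
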
@node Finding Files
@chapter Finding Files

By default, @code{find} prints to the standard output the names of the
files that match the given criteria. @xref{Actions}, for how to get
more information about the matching files.

@menu
* Name::
* Links::
* Time::
* Size::
* Type::
* Owner::
* Mode Bits::
* Contents::
* Directories::
* Filesystems::
* Combining Primaries With Operators::
@end menu

@node Name
@section Name

Here are ways to search for files whose name matches a certain
pattern. @xref{Shell Pattern Matching}, for a description of the
@var{pattern} arguments to these tests.

Each of these tests has a case-sensitive version and a
case-insensitive version, whose name begins with @samp{i}. In a
case-insensitive comparison, the patterns @samp{fo*} and @samp{F??}
match the file names @file{Foo}, @file{FOO}, @file{foo}, @file{fOo},
etc.

@menu
* Base Name Patterns::
* Full Name Patterns::
* Fast Full Name Search::
* Shell Pattern Matching:: Wildcards used by these programs.
@end menu

@node Base Name Patterns
@subsection Base Name Patterns

@deffn Test -name pattern
@deffnx Test -iname pattern
True if the base of the file name (the path with the leading
directories removed) matches shell pattern @var{pattern}. For
@samp{-iname}, the match is case-insensitive.@footnote{Because we
need to perform case-insensitive matching, the GNU fnmatch
implementation is always used; if the C library includes the GNU
implementation, we use that and otherwise we use the one from gnulib.}
To ignore a whole directory tree, use @samp{-prune}
(@pxref{Directories}). As an example, to find Texinfo source files in
@file{/usr/local/doc}:

@example
find /usr/local/doc -name '*.texi'
@end example

Notice that the wildcard must be enclosed in quotes in order to
protect it from expansion by the shell.

As of findutils version 4.2.2, patterns for @samp{-name} and
@samp{-iname} will match a file name with a leading @samp{.}. For
example, the command @samp{find /tmp -name \*bar} will match the file
@file{/tmp/.foobar}. Braces within the pattern (@samp{@{@}}) are not
considered to be special (that is, @code{find . -name 'foo@{1,2@}'}
matches a file named @file{foo@{1,2@}}, not the files @file{foo1} and
@file{foo2}).

Because the leading directories are removed, the file names considered
for a match with @samp{-name} will never include a slash, so
@samp{-name a/b} will never match anything (you probably need to use
@samp{-path} instead).
@end deffn

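The following short sketch demonstrates three of these points. It assumes GNU @code{find}, a POSIX shell, and a throwaway directory with invented file names. @samp{-name} is case-sensitive but still matches a leading @samp{.}, @samp{-iname} ignores case, and a @samp{-name} pattern containing a slash matches nothing:

```shell
dir=$(mktemp -d)
mkdir "$dir/doc"
: > "$dir/doc/intro.texi"
: > "$dir/doc/.draft.texi"
: > "$dir/doc/NOTES.TEXI"

# Case-sensitive: the leading-dot file matches, NOTES.TEXI does not.
exact=$(cd "$dir" && find . -name '*.texi' | LC_ALL=C sort)
# Case-insensitive: all three files match.
any_case=$(cd "$dir" && find . -iname '*.texi' | LC_ALL=C sort)
# -name never sees a slash, so this matches nothing.  Recent GNU find
# releases may also print a warning, hence the stderr redirection.
with_slash=$(cd "$dir" && find . -name 'doc/intro.texi' 2>/dev/null)
printf '%s\n' "$exact" "$any_case"
rm -rf -- "$dir"
```
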
@node Full Name Patterns
@subsection Full Name Patterns

@deffn Test -path pattern
@deffnx Test -wholename pattern
True if the entire file name, starting with the command line argument
under which the file was found, matches shell pattern @var{pattern}.
To ignore a whole directory tree, use @samp{-prune} rather than
checking every file in the tree (@pxref{Directories}). The ``entire
file name'' as used by @code{find} starts with the starting-point
specified on the command line, and is not converted to an absolute
pathname, so for example @code{cd /; find tmp -wholename /tmp} will
never match anything.

@code{find} compares the @samp{-path} argument with the concatenation
of a directory name and the base name of the file it's considering.
Since the concatenation will never end with a slash, @samp{-path}
arguments ending in @samp{/} will match nothing (except perhaps a
start point specified on the command line).

The name @samp{-wholename} is GNU-specific, but @samp{-path} is more
portable; it is supported by HP-UX @code{find} and is part of the
POSIX 2008 standard.

@end deffn

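The following sketch assumes GNU @code{find} and an invented directory layout. It shows three things. First, the candidate name begins with the start point exactly as given on the command line. Second, @samp{-path} and @samp{-wholename} behave identically. Third, a pattern ending in @samp{/} matches nothing:

```shell
dir=$(mktemp -d)
mkdir -p "$dir/foo/bar/baz"

# The candidate name starts with the start point exactly as given,
# and -path and -wholename are two names for the same test.
p=$(cd "$dir" && find foo -path foo/bar)
w=$(cd "$dir" && find foo -wholename foo/bar)
# Candidate names never end in a slash, so this matches nothing.
slash=$(cd "$dir" && find foo -path 'foo/bar/' 2>/dev/null)
printf '%s\n' "$p" "$w"
rm -rf -- "$dir"
```
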
@deffn Test -ipath pattern
@deffnx Test -iwholename pattern
These tests are like @samp{-wholename} and @samp{-path}, but the match
is case-insensitive.
@end deffn

In the context of the tests @samp{-path}, @samp{-wholename},
@samp{-ipath} and @samp{-iwholename}, a ``full path'' is the name of
all the directories traversed from @code{find}'s start point to the
file being tested, followed by the base name of the file itself.
These paths are often not absolute paths; for example

@example
$ cd /tmp
$ mkdir -p foo/bar/baz
$ find foo -path foo/bar -print
foo/bar
$ find foo -path /tmp/foo/bar -print
$ find /tmp/foo -path /tmp/foo/bar -print
/tmp/foo/bar
@end example

Notice that the second @code{find} command prints nothing, even though
@file{/tmp/foo/bar} exists and was examined by @code{find}.

Unlike file name expansion on the command line, a @samp{*} in the pattern
will match both @samp{/} and leading dots in file names:

@example
$ find . -path '*f'
./quux/bar/baz/f
$ find . -path '*/*config'
./quux/bar/baz/.config
@end example

@deffn Test -regex expr
@deffnx Test -iregex expr
True if the entire file name matches regular expression @var{expr}.
This is a match on the whole path, not a search. For example, to
match a file named @file{./fubar3}, you can use the regular expression
@samp{.*bar.} or @samp{.*b.*3}, but not @samp{f.*r3}. @xref{Regexps,
, Syntax of Regular Expressions, emacs, The GNU Emacs Manual}, for a
description of the syntax of regular expressions. For @samp{-iregex},
the match is case-insensitive.

As for @samp{-path}, the candidate file name never ends with a slash,
so regular expressions which only match something that ends in slash
will always fail.

There are several varieties of regular expressions; by default this
test uses GNU Emacs regular expressions, but this can be changed
with the option @samp{-regextype}.
@end deffn

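The @file{./fubar3} example above can be checked with a sketch like this one. It assumes GNU @code{find} and its default regular expression syntax. The file name comes from the text, and the directory is a throwaway:

```shell
dir=$(mktemp -d)
: > "$dir/fubar3"

# -regex must match the whole name that find reports ("./fubar3"),
# so a pattern covering only part of it fails.
a=$(cd "$dir" && find . -regex '.*bar.')
b=$(cd "$dir" && find . -regex '.*b.*3')
c=$(cd "$dir" && find . -regex 'f.*r3')
printf '%s\n' "$a" "$b"
rm -rf -- "$dir"
```
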
@deffn Option -regextype name
This option controls the variety of regular expression syntax
understood by the @samp{-regex} and @samp{-iregex} tests. This option
is positional; that is, it only affects regular expressions which
occur later in the command line. If this option is not given, GNU
Emacs regular expressions are assumed. The currently implemented
types are:

@table @samp
@item emacs
Regular expressions compatible with GNU Emacs; this is also the
default behaviour if this option is not used.
@item posix-awk
Regular expressions compatible with the POSIX awk command (not GNU awk).
@item posix-basic
POSIX Basic Regular Expressions.
@item posix-egrep
Regular expressions compatible with the POSIX egrep command.
@item posix-extended
POSIX Extended Regular Expressions.
@end table

@xref{Regular Expressions}, for more information on the regular
expression dialects understood by GNU findutils.

@end deffn

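Grouping and alternation, for instance, are written differently in the default Emacs syntax and in POSIX extended syntax. This sketch matches the same two files both ways. It assumes GNU @code{find}, and the file names are invented. Note that @samp{-regextype} comes before the @samp{-regex} it affects:

```shell
dir=$(mktemp -d)
: > "$dir/foo.c"
: > "$dir/bar.c"
: > "$dir/baz.h"

# POSIX extended syntax: ( ) group and | separates alternatives.
ere=$(cd "$dir" && find . -regextype posix-extended -regex '\./(foo|bar)\.c' | LC_ALL=C sort)
# Default (Emacs) syntax: the same operators are written \( \| \).
emacs=$(cd "$dir" && find . -regex '\./\(foo\|bar\)\.c' | LC_ALL=C sort)
printf '%s\n' "$ere"
rm -rf -- "$dir"
```
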
@node Fast Full Name Search
@subsection Fast Full Name Search

To search for files by name without having to actually scan the
directories on the disk (which can be slow), you can use the
@code{locate} program. For each shell pattern you give it,
@code{locate} searches one or more databases of file names and
displays the file names that contain the pattern. @xref{Shell Pattern
Matching}, for details about shell patterns.

If a pattern is a plain string (that is, it contains no
metacharacters), @code{locate} displays all file names in the database
that contain that string. If a pattern contains
metacharacters, @code{locate} only displays file names that match the
pattern exactly. As a result, patterns that contain metacharacters
should usually begin with a @samp{*}, and will most often end with one
as well. The exceptions are patterns that are intended to explicitly
match the beginning or end of a file name.

If you only want @code{locate} to match against the last component of
the file names (the ``base name'' of the files) you can use the
@samp{--basename} option. The opposite behaviour is the default, but
can be selected explicitly by using the option @samp{--wholename}.

The command
@example
locate @var{pattern}
@end example

is almost equivalent to
@example
find @var{directories} -name @var{pattern}
@end example

where @var{directories} are the directories for which the file name
databases contain information. The differences are that the
@code{locate} information might be out of date, and that @code{locate}
handles wildcards in the pattern slightly differently than @code{find}
(@pxref{Shell Pattern Matching}).

The file name databases contain lists of files that were on the system
when the databases were last updated. The system administrator can
choose the file name of the default database, the frequency with which
the databases are updated, and the directories for which they contain
entries.

Here is how to select which file name databases @code{locate}
searches. The default is system-dependent. At the time this document
was generated, the default was @file{@value{LOCATE_DB}}.

@table @code
@item --database=@var{path}
@itemx -d @var{path}
Instead of searching the default file name database, search the file
name databases in @var{path}, which is a colon-separated list of
database file names. You can also use the environment variable
@code{LOCATE_PATH} to set the list of database files to search. The
option overrides the environment variable if both are used.
@end table

GNU @code{locate} can read file name databases generated by the
@code{slocate} package. However, these generally contain a list of
all the files on the system, and so when using this database,
@code{locate} will produce output only for files which are accessible
to you. @xref{Invoking locate}, for a description of the
@samp{--existing} option which is used to do this.

The @code{updatedb} program can also generate a database in a format
compatible with @code{slocate}. @xref{Invoking updatedb}, for a
description of its @samp{--dbformat} and @samp{--output} options.

@node Shell Pattern Matching
@subsection Shell Pattern Matching

@code{find} and @code{locate} can compare file names, or parts of file
names, to shell patterns. A @dfn{shell pattern} is a string that may
contain the following special characters, which are known as
@dfn{wildcards} or @dfn{metacharacters}.

You must quote patterns that contain metacharacters to prevent the
shell from expanding them itself. Double and single quotes both work;
so does escaping with a backslash.

@table @code
@item *
Matches any sequence of zero or more characters.

@item ?
Matches any one character.

@item [@var{string}]
Matches exactly one character that is a member of the string
@var{string}. This is called a @dfn{character class}. As a
shorthand, @var{string} may contain ranges, which consist of two
characters with a dash between them. For example, the class
@samp{[a-z0-9_]} matches a lowercase letter, a digit, or an
underscore. You can negate a class by placing a @samp{!} or @samp{^}
immediately after the opening bracket. Thus, @samp{[^A-Z@@]} matches
any character except an uppercase letter or an at sign.

@item \
Removes the special meaning of the character that follows it. This
works even in character classes.
@end table

In the @code{find} tests that do shell pattern matching (@samp{-name},
@samp{-wholename}, etc.), wildcards in the pattern will match a
@samp{.} at the beginning of a file name. This is also the case for
@code{locate}. Thus, @samp{find -name '*macs'} will match a file
named @file{.emacs}, as will @samp{locate '*macs'}.

Slash characters have no special significance in the shell pattern
matching that @code{find} and @code{locate} do, unlike in the shell,
in which wildcards do not match them. Therefore, a pattern
@samp{foo*bar} can match a file name @samp{foo3/bar}, and a pattern
@samp{./sr*sc} can match a file name @samp{./src/misc}.

If you want to locate some files with the @samp{locate} command but
don't need to see the full list you can use the @samp{--limit} option
to see just a small number of results, or the @samp{--count} option to
display only the total number of matches.

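The following sketch exercises these rules with @code{find}. It assumes GNU @code{find}, a POSIX shell, and a throwaway directory with invented file names. It sets @code{LC_ALL=C} so that the range @samp{a-z} does not depend on the locale's collation order:

```shell
dir=$(mktemp -d)
mkdir -p "$dir/src/misc"
: > "$dir/.emacs"
: > "$dir/Readme"
: > "$dir/start"
: > "$dir/star*"

# A negated class; '.' and 'R' are both outside a-z, so the
# leading-dot file and Readme match.
neg=$(cd "$dir" && LC_ALL=C find . -type f -name '[!a-z]*' | LC_ALL=C sort)
# A backslash makes the '*' literal, so only "star*" matches.
lit=$(cd "$dir" && find . -name 'star\*')
# Unescaped, the '*' is a wildcard and "start" matches too.
wild=$(cd "$dir" && find . -name 'star*' | LC_ALL=C sort)
# Slashes are not special, so the '*' here matches "c/mi".
slash=$(cd "$dir" && find . -path './sr*sc')
printf '%s\n' "$neg" "$lit" "$wild" "$slash"
rm -rf -- "$dir"
```
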
@node Links
@section Links

There are two ways that files can be linked together. @dfn{Symbolic
links} are a special type of file whose contents are a portion of the
name of another file. @dfn{Hard links} are multiple directory entries
for one file; the file names all have the same index node
(@dfn{inode}) number on the disk.

@menu
* Symbolic Links::
* Hard Links::
@end menu

@node Symbolic Links
@subsection Symbolic Links

Symbolic links are names that reference other files. GNU @code{find}
will handle symbolic links in one of two ways. Firstly, it can
dereference the links for you; this means that if it comes across a
symbolic link, it examines the file that the link points to, in order
to see if it matches the criteria you have specified. Secondly, it
can check the link itself in case you might be looking for the actual
link. If the file that the symbolic link points to is also within the
directory hierarchy you are searching with the @code{find} command,
you may not see a great deal of difference between these two
alternatives.

By default, @code{find} examines symbolic links themselves when it
finds them (and, if it later comes across the linked-to file, it will
examine that, too). If you would prefer @code{find} to dereference
the links and examine the file that each link points to, specify the
@samp{-L} option to @code{find}. You can explicitly specify the
default behaviour by using the @samp{-P} option. The @samp{-H}
option is a half-way-between option which ensures that any symbolic
links listed on the command line are dereferenced, but other symbolic
links are not.

Symbolic links are different from ``hard links'' in the sense that you
need permission to search the directories in the linked-to file name
to dereference the link. This can mean that even if you specify the
@samp{-L} option, @code{find} may not be able to determine the
properties of the file that the link points to (because you don't have
sufficient permission). In this situation, @code{find} uses the
properties of the link itself. This also occurs if a symbolic link
exists but points to a file that is missing.

The options controlling the behaviour of @code{find} with respect to
links are as follows:

@table @samp
@item -P
@code{find} does not dereference symbolic links at all. This is the
default behaviour. This option must be specified before any of the
file names on the command line.
@item -H
@code{find} does not dereference symbolic links (except in the case of
file names on the command line, which are dereferenced). If a
symbolic link cannot be dereferenced, the information for the symbolic
link itself is used. This option must be specified before any of the
file names on the command line.
@item -L
@code{find} dereferences symbolic links where possible, and where this
is not possible it uses the properties of the symbolic link itself.
This option must be specified before any of the file names on the
command line. Use of this option also implies the same behaviour as
the @samp{-noleaf} option. If you later use the @samp{-H} or
@samp{-P} options, this does not turn off @samp{-noleaf}.

Actions that can cause symbolic links to become broken while
@code{find} is executing (for example @samp{-delete}) can give rise to
confusing behaviour. Take for example the command line
@samp{find -L . -type d -delete}. This will delete empty
directories. If a subtree includes only directories and symbolic
links to directories, this command may still not successfully delete
it, since deletion of the target of the symbolic link will cause the
symbolic link to become broken and @samp{-type d} is false for broken
symbolic links.

@item -follow
This option forms part of the ``expression'' and must be specified
after the file names, but it is otherwise equivalent to @samp{-L}.
The @samp{-follow} option affects only those tests which appear after
it on the command line. This option is deprecated. Where possible,
you should use @samp{-L} instead.
@end table

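The difference between @samp{-P}, @samp{-H} and @samp{-L} shows up when a directory is reached through a symbolic link. This is a minimal sketch that assumes GNU @code{find} and an invented directory layout:

```shell
dir=$(mktemp -d)
mkdir -p "$dir/real/inner"
: > "$dir/real/inner/file"
ln -s real "$dir/link"

# -P (the default): the start point "link" is examined as a link
# and not followed, so nothing named "file" is found.
p=$(cd "$dir" && find -P link -name file)
# -H: a start point that is a symbolic link is dereferenced.
h=$(cd "$dir" && find -H link -name file)
# -H does not follow links met during the search itself...
h_dot=$(cd "$dir" && find -H . -name file)
# ...whereas -L follows every symbolic link it meets.
l=$(cd "$dir" && find -L . -name file | LC_ALL=C sort)
printf '%s\n' "$h" "$h_dot" "$l"
rm -rf -- "$dir"
```
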
The following differences in behaviour occur when the @samp{-L} option
is used:

@itemize @bullet
@item
@code{find} follows symbolic links to directories when searching
directory trees.
@item
@samp{-lname} and @samp{-ilname} always return false (unless they
happen to match broken symbolic links).
@item
@samp{-type} reports the types of the files that symbolic links point
to. This means that in combination with @samp{-L}, @samp{-type l}
will be true only for broken symbolic links. To check for symbolic
links when @samp{-L} has been specified, use @samp{-xtype l}.
@item
Implies @samp{-noleaf} (@pxref{Directories}).
@end itemize

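This sketch shows how @samp{-L} changes the results of @samp{-type} and @samp{-xtype}. It assumes GNU @code{find}, and it creates one working and one broken symbolic link with invented names:

```shell
dir=$(mktemp -d)
: > "$dir/target"
ln -s target "$dir/good"
ln -s missing "$dir/broken"

# Without -L, -type l matches both symbolic links.
plain=$(cd "$dir" && find . -type l | LC_ALL=C sort)
# With -L, -type l describes the pointed-to file, so only the
# broken link (which cannot be followed) still counts as a link.
with_l=$(cd "$dir" && find -L . -type l)
# -xtype l checks the links themselves when -L is in effect.
with_x=$(cd "$dir" && find -L . -xtype l | LC_ALL=C sort)
printf '%s\n' "$plain" "$with_l" "$with_x"
rm -rf -- "$dir"
```
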
If the @samp{-L} option or the @samp{-H} option is used,
the file names used as arguments to @samp{-newer}, @samp{-anewer}, and
@samp{-cnewer} are dereferenced and the timestamp from the pointed-to
file is used instead (if possible; otherwise the timestamp from the
symbolic link is used).

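The following sketch illustrates this for @samp{-newer}. It assumes GNU @code{find} and GNU @code{touch -d}, and the file names and dates are invented. It uses @samp{-L}; according to the text above, @samp{-H} dereferences these arguments in the same way:

```shell
dir=$(mktemp -d)
# An old target, a newer regular file, and a symbolic link to the
# target; the link's own timestamp is the moment it is created.
touch -d '2000-01-01' "$dir/target"
touch -d '2010-01-01' "$dir/middle"
ln -s target "$dir/ref"

# Default (-P): the reference is the link itself, which is newer
# than "middle", so nothing is printed.
p=$(cd "$dir" && find . -name middle -newer ref)
# -L: the reference is dereferenced and the target's 2000 timestamp
# is used, so "middle" counts as newer.
l=$(cd "$dir" && find -L . -name middle -newer ref)
printf '%s\n' "$l"
rm -rf -- "$dir"
```
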
@deffn Test -lname pattern
@deffnx Test -ilname pattern
True if the file is a symbolic link whose contents match shell pattern
@var{pattern}. For @samp{-ilname}, the match is case-insensitive.
@xref{Shell Pattern Matching}, for details about the @var{pattern}
argument. If the @samp{-L} option is in effect, this test will always
return false for symbolic links unless they are broken. For example,
to list any symbolic links to @file{sysdep.c} in the current directory
and its subdirectories, you can do:

@example
find . -lname '*sysdep.c'
@end example
@end deffn

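This sketch assumes GNU @code{find} and uses invented file names. It shows that @samp{-lname} matches the text stored in each link (its target), not the link's own name. It also shows that under @samp{-L} the same test finds nothing, because the working links are followed:

```shell
dir=$(mktemp -d)
mkdir "$dir/lib"
: > "$dir/sysdep.c"
ln -s ../sysdep.c "$dir/lib/sysdep-link.c"
ln -s sysdep.c "$dir/alias.c"
ln -s other.c "$dir/unrelated.c"

# -lname matches the link contents; '*' may also match a '/'.
links=$(cd "$dir" && find . -lname '*sysdep.c' | LC_ALL=C sort)
# Under -L, only broken links are still links, and the broken one
# here points to "other.c", so nothing matches.
none=$(cd "$dir" && find -L . -lname '*sysdep.c')
printf '%s\n' "$links"
rm -rf -- "$dir"
```
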
@node Hard Links
@subsection Hard Links

Hard links allow more than one name to refer to the same file. To
find all the names which refer to the same file as @var{name}, use
@samp{-samefile @var{name}}. If you are not using the @samp{-L} option, you
can confine your search to one filesystem using the @samp{-xdev}
option. This is useful because hard links cannot point outside a
single filesystem, so this can cut down on needless searching.

If the @samp{-L} option is in effect, and @var{name} is in fact a symbolic
link, the symbolic link will be dereferenced. Hence you are searching
for other links (hard or symbolic) to the file pointed to by @var{name}. If
@samp{-L} is in effect but @var{name} is not itself a symbolic link, other
symbolic links to the file @var{name} will be matched.

You can also search for files by inode number. This can occasionally
|
|
be useful in diagnosing problems with filesystems for example, because
|
|
@code{fsck} tends to print inode numbers. Inode numbers also
|
|
occasionally turn up in log messages for some types of software, and
|
|
are used to support the @code{ftok()} library function.
|
|
|
|
You can learn a file's inode number and the number of links to it by
|
|
running @samp{ls -li} or @samp{find -ls}.
|
|
|
|
You can search for hard links to inode number NUM by using @samp{-inum
|
|
NUM}. If there are any filesystem mount points below the directory
|
|
where you are starting the search, use the @samp{-xdev} option unless
|
|
you are also using the @samp{-L} option. Using @samp{-xdev} this
|
|
saves needless searching, since hard links to a file must be on the
|
|
same filesystem. @xref{Filesystems}.
|
|
|
|
@deffn Test -samefile NAME
|
|
File is a hard link to the same inode as @var{name}. If the @samp{-L}
|
|
option is in effect, symbolic links to the same file as @var{name} points to
|
|
are also matched.
|
|
@end deffn
|
|
|
|
@deffn Test -inum n
|
|
File has inode number @var{n}. The @samp{+} and @samp{-} qualifiers
|
|
also work, though these are rarely useful. Much of the time it is
|
|
easier to use @samp{-samefile} rather than this option.
|
|
@end deffn
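
As a sketch, the script below creates two hard links and one symbolic
link to the same file, then finds the hard links both by name and by
inode number. It assumes GNU @code{find}, whose @samp{-printf} action
supports the @samp{%i} directive. The file names and shell variables
are illustrative only:

@example
dir=$(mktemp -d)
cd "$dir" || exit 1
echo data > a
ln a b                          # b is a second hard link to a's inode
ln -s a c                       # c is a symbolic link, not a hard link
inum=$(find a -printf '%i')     # inode number of a
by_name=$(find . -samefile a | LC_ALL=C sort)
by_inum=$(find . -inum "$inum" | LC_ALL=C sort)
# With -L, c is followed and therefore matches as well.
followed=$(find -L . -samefile a | LC_ALL=C sort)
printf '%s\n' "$by_name"        # ./a and ./b
cd / && rm -rf "$dir"
@end example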

You can also search for files that have a certain number of links,
with @samp{-links}. Directories normally have at least two hard
links; their @file{.} entry is the second one. If they have
subdirectories, each of those also has a hard link called @file{..} to
its parent directory. The @file{.} and @file{..} directory entries
are not normally searched unless they are mentioned on the @code{find}
command line.

@deffn Test -links n
File has @var{n} hard links.
@end deffn

@deffn Test -links +n
File has more than @var{n} hard links.
@end deffn

@deffn Test -links -n
File has fewer than @var{n} hard links.
@end deffn
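
For instance, the following sketch lists the regular files that have
more than one name. It works in a scratch directory created with
@code{mktemp -d}, and the file names are illustrative:

@example
dir=$(mktemp -d)
cd "$dir" || exit 1
touch single shared
ln shared shared-too            # shared now has two hard links
multi=$(find . -type f -links +1 | LC_ALL=C sort)
printf '%s\n' "$multi"          # ./shared and ./shared-too
cd / && rm -rf "$dir"
@end example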

@node Time
@section Time

Each file has three time stamps, which record the last time that
certain operations were performed on the file:

@enumerate
@item
access (read the file's contents)
@item
change the status (modify the file or its attributes)
@item
modify (change the file's contents)
@end enumerate

Some systems also provide a timestamp that indicates when a file was
@emph{created}. For example, the UFS2 filesystem under NetBSD-3.1
records the @emph{birth time} of each file. This information is also
available under other versions of BSD and some versions of Cygwin.
However, even on systems which support file birth time, files may
exist for which this information was not recorded (for example, UFS1
file systems simply do not contain this information).

You can search for files whose time stamps are within a certain age
range, or compare them to other time stamps.

@menu
* Age Ranges::
* Comparing Timestamps::
@end menu

@node Age Ranges
@subsection Age Ranges

These tests are mainly useful with ranges (@samp{+@var{n}} and
@samp{-@var{n}}).

@deffn Test -atime n
@deffnx Test -ctime n
@deffnx Test -mtime n
True if the file was last accessed (or its status changed, or it was
modified) @var{n}*24 hours ago. The number of 24-hour periods since
the file's timestamp is always rounded down; therefore 0 means ``less
than 24 hours ago'', 1 means ``between 24 and 48 hours ago'', and so
forth. Fractional values are supported but this only really makes
sense for the case where ranges (@samp{+@var{n}} and @samp{-@var{n}})
are used.
@end deffn

@deffn Test -amin n
@deffnx Test -cmin n
@deffnx Test -mmin n
True if the file was last accessed (or its status changed, or it was
modified) @var{n} minutes ago. These tests provide finer granularity
of measurement than @samp{-atime} et al., but rounding is done in a
similar way (again, fractions are supported). For example, to list
files in @file{/u/bill} that were last read from 2 to 6 minutes ago:

@example
find /u/bill -amin +2 -amin -6
@end example
@end deffn
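
The following sketch shows how these ranges behave. It assumes GNU
@code{touch}, whose @samp{-d} option accepts relative dates such as
@samp{90 minutes ago`. The file names are illustrative:

@example
dir=$(mktemp -d)
cd "$dir" || exit 1
touch -d '90 minutes ago' old
touch new
# Modified more than 60 but less than 120 minutes ago: only ./old.
window=$(find . -type f -mmin +60 -mmin -120)
# Modified less than 24 hours ago (-mtime 0): both files.
today=$(find . -type f -mtime 0 | LC_ALL=C sort)
printf '%s\n' "$window" "$today"
cd / && rm -rf "$dir"
@end example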

@deffn Option -daystart
Measure times from the beginning of today rather than from 24 hours
ago. So, to list the regular files in your home directory that were
modified yesterday, do

@example
find ~/ -daystart -type f -mtime 1
@end example

The @samp{-daystart} option is unlike most other options in that it
has an effect on the way that other tests are performed. The affected
tests are @samp{-amin}, @samp{-cmin}, @samp{-mmin}, @samp{-atime},
@samp{-ctime} and @samp{-mtime}. The @samp{-daystart} option affects
only the tests which appear after it on the command line.
@end deffn

@node Comparing Timestamps
@subsection Comparing Timestamps

@deffn Test -newerXY reference
Succeeds if timestamp @samp{X} of the file being considered is newer
than timestamp @samp{Y} of the file @file{reference}. The letters
@samp{X} and @samp{Y} can be any of the following letters:

@table @samp
@item a
Last-access time of @file{reference}
@item B
Birth time of @file{reference} (when this is not known, the test cannot succeed)
@item c
Last-change time of @file{reference}
@item m
Last-modification time of @file{reference}
@item t
The @file{reference} argument is interpreted as a literal time, rather
than the name of a file. @xref{Date input formats}, for a description
of how the timestamp is understood. Tests of the form @samp{-newerXt}
are valid but tests of the form @samp{-newertY} are not.
@end table

For example, the test @code{-newerac /tmp/foo} succeeds for all files
which have been accessed more recently than @file{/tmp/foo} was
changed. Here @samp{X} is @samp{a} and @samp{Y} is @samp{c}.

Not all files have a known birth time. If @samp{Y} is @samp{B} and
the birth time of @file{reference} is not available, @code{find} exits
with an explanatory error message. If @samp{X} is @samp{B} and we do
not know the birth time of the file currently being considered, the
test simply fails (that is, it behaves like @code{-false} does).

Some operating systems (for example, most implementations of Unix) do
not support file birth times. Some others, for example NetBSD-3.1,
do. Even on operating systems which support file birth times, the
information may not be available for specific files. For example,
under NetBSD, file birth times are supported on UFS2 file systems, but
not UFS1 file systems.
@end deffn

There are two ways to list files in @file{/usr} modified after
February 1 of the current year. One uses @samp{-newermt}:

@example
find /usr -newermt "Feb 1"
@end example

The other way of doing this also works with versions of @code{find}
before 4.3.3:

@c Idea from Rick Sladkey.
@example
touch -t 02010000 /tmp/stamp$$
find /usr -newer /tmp/stamp$$
rm -f /tmp/stamp$$
@end example

@deffn Test -anewer file
@deffnx Test -cnewer file
@deffnx Test -newer file
True if the file was last accessed (or its status changed, or it was
modified) more recently than @var{file} was modified. These tests are
affected by @samp{-follow} only if @samp{-follow} comes before them on
the command line. @xref{Symbolic Links}, for more information on
@samp{-follow}. As an example, to list any files modified since
@file{/bin/sh} was last modified:

@example
find . -newer /bin/sh
@end example
@end deffn
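
The following sketch compares a reference file with a literal date.
It assumes GNU @code{touch} and GNU @code{find}, and the file names
and dates are made up:

@example
dir=$(mktemp -d)
cd "$dir" || exit 1
touch -d '2020-01-01 00:00' old
touch -d '2020-06-01 00:00' ref
touch -d '2021-01-01 00:00' new
# Newer than the modification time of ref: only ./new.
by_file=$(find . -type f -newer ref)
# Newer than a literal date: ./new and ./ref.
by_date=$(find . -type f -newermt '2020-03-01' | LC_ALL=C sort)
printf '%s\n' "$by_file" "$by_date"
cd / && rm -rf "$dir"
@end example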

@deffn Test -used n
True if the file was last accessed @var{n} days after its status was
last changed. Useful for finding files that are not being used, and
could perhaps be archived or removed to save disk space.
@end deffn

@node Size
@section Size

@deffn Test -size n@r{[}bckwMG@r{]}
True if the file uses @var{n} units of space, rounding up. The units
are 512-byte blocks by default, but they can be changed by adding a
one-character suffix to @var{n}:

@table @code
@item b
512-byte blocks (never 1024)
@item c
bytes
@item k
kilobytes (1024 bytes)
@item w
2-byte words
@item M
Megabytes (units of 1048576 bytes)
@item G
Gigabytes (units of 1073741824 bytes)
@end table

The @samp{b} suffix always considers blocks to be 512 bytes. This is
not affected by the setting (or non-setting) of the
@code{POSIXLY_CORRECT} environment variable, and differs from the
behaviour of the @samp{-ls} action. If you want to use 1024-byte
units, use the @samp{k} suffix instead.

The number can be prefixed with a @samp{+} or a @samp{-}. A plus sign
indicates that the test should succeed if the file uses more than
@var{n} units of storage (a common use of this test) and a minus sign
indicates that the test should succeed if the file uses fewer than
@var{n} units of storage. Because the size is rounded up to a whole
number of units, @samp{-size -1k} matches only empty files. There is
no @samp{=} prefix, because that's the default anyway.

The size does not count indirect blocks, but it does count blocks in
sparse files that are not actually allocated. In other words, it's
consistent with the result you get for @samp{ls -l} or @samp{wc -c}.
This handling of sparse files differs from the output of the @samp{%k}
and @samp{%b} format specifiers for the @samp{-printf} action.
@end deffn
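
The rounding can be surprising. This sketch creates files of 0, 1 and
2048 bytes and checks them against 1-kilobyte units. It assumes that
@code{head -c} is available, and the file names are illustrative:

@example
dir=$(mktemp -d)
cd "$dir" || exit 1
: > empty                           # 0 bytes, i.e. 0 units
printf 'x' > one-byte               # 1 byte, rounded up to 1 unit of 1k
head -c 2048 /dev/zero > two-k      # exactly 2 units of 1k
fewer=$(find . -type f -size -1k)   # fewer than 1 unit: ./empty
exact=$(find . -type f -size 1k)    # exactly 1 unit: ./one-byte
more=$(find . -type f -size +1k)    # more than 1 unit: ./two-k
printf '%s\n' "$fewer" "$exact" "$more"
cd / && rm -rf "$dir"
@end example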

@deffn Test -empty
True if the file is empty and is either a regular file or a directory.
This might help determine good candidates for deletion. This test is
useful with @samp{-depth} (@pxref{Directories}) and @samp{-delete}
(@pxref{Single File}).
@end deffn

@node Type
@section Type

@deffn Test -type c
True if the file is of type @var{c}:

@table @code
@item b
block (buffered) special
@item c
character (unbuffered) special
@item d
directory
@item p
named pipe (FIFO)
@item f
regular file
@item l
symbolic link; if @samp{-L} is in effect, this is true only for broken
symbolic links. If you want to search for symbolic links when
@samp{-L} is in effect, use @samp{-xtype} instead of @samp{-type}.
@item s
socket
@item D
door (Solaris)
@end table
@end deffn

@deffn Test -xtype c
This test behaves the same as @samp{-type} unless the file is a
symbolic link. If the file is a symbolic link, the result is as
follows (in the table below, @samp{X} should be understood to
represent any letter except @samp{l}):

@table @samp
@item -P -xtype l
True if the symbolic link is broken
@item -P -xtype X
True if the (ultimate) target file is of type @samp{X}.
@item -L -xtype l
Always true
@item -L -xtype X
False unless the symbolic link is broken
@end table

In other words, for symbolic links, @samp{-xtype} checks the type of
the file that @samp{-type} does not check.

The @samp{-H} option also affects the behaviour of @samp{-xtype}.
When @samp{-H} is in effect, @samp{-xtype} behaves as if @samp{-L} had
been specified when examining files listed on the command line, and as
if @samp{-P} had been specified otherwise. If neither @samp{-H} nor
@samp{-L} was specified, @samp{-xtype} behaves as if @samp{-P} had
been specified.

@xref{Symbolic Links}, for more information on @samp{-follow} and
@samp{-L}.
@end deffn
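
Either of the forms described above can be used to find broken
symbolic links. The following sketch shows both. It uses a scratch
directory, and the names are illustrative:

@example
dir=$(mktemp -d)
cd "$dir" || exit 1
touch target
ln -s target good               # resolves to a regular file
ln -s missing broken            # points at a file that does not exist
by_xtype=$(find . -xtype l)     # default -P: only broken links match
by_type=$(find -L . -type l)    # with -L: only broken links match
printf '%s\n' "$by_xtype" "$by_type"
cd / && rm -rf "$dir"
@end example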

@node Owner
@section Owner

@deffn Test -user uname
@deffnx Test -group gname
True if the file is owned by user @var{uname} (belongs to group
@var{gname}). A numeric ID is allowed.
@end deffn

@deffn Test -uid n
@deffnx Test -gid n
True if the file's numeric user ID (group ID) is @var{n}. These tests
support ranges (@samp{+@var{n}} and @samp{-@var{n}}), unlike
@samp{-user} and @samp{-group}.
@end deffn

@deffn Test -nouser
@deffnx Test -nogroup
True if no user corresponds to the file's numeric user ID (no group
corresponds to the numeric group ID). These cases usually mean that
the files belonged to users who have since been removed from the
system. You probably should change the ownership of such files to an
existing user or group, using the @code{chown} or @code{chgrp}
program.
@end deffn

@node Mode Bits
@section File Mode Bits

@xref{File Permissions}, for information on how file mode bits are
structured and how to specify them.

Four tests determine what users can do with files. These are
@samp{-readable}, @samp{-writable}, @samp{-executable} and
@samp{-perm}. The first three tests ask the operating system if the
current user can perform the relevant operation on a file, while
@samp{-perm} just examines the file's mode. The file mode may give
a misleading impression of what the user can actually do, because the
file may have an access control list, or exist on a read-only
filesystem, for example. Of these four tests though, only
@samp{-perm} is specified by the POSIX standard.

The @samp{-readable}, @samp{-writable} and @samp{-executable} tests
are implemented via the @code{access} system call, so the decision is
made by the operating system itself. If the file being
considered is on an NFS filesystem, the remote system may allow or
forbid read or write operations for reasons of which the NFS client
cannot take account. This includes user-ID mapping, either in the
general sense or the more restricted sense in which remote superusers
are treated by the NFS server as if they are the local user
@samp{nobody} on the NFS server.

None of the tests in this section should be used to verify that a user
is authorised to perform any operation (on the file being tested or
any other file) because of the possibility of a race condition. That
is, the situation may change between the test and an action being
taken on the basis of the result of that test.

@deffn Test -readable
True if the file can be read by the invoking user.
@end deffn

@deffn Test -writable
True if the file can be written by the invoking user. This is an
in-principle check, and other things may prevent a successful write
operation; for example, the filesystem might be full.
@end deffn

@deffn Test -executable
True if the file can be executed/searched by the invoking user.
@end deffn

@deffn Test -perm pmode

True if the file's mode bits match @var{pmode}, which can be
either a symbolic or numeric @var{mode} (@pxref{File Permissions})
optionally prefixed by @samp{-} or @samp{/}.

A @var{pmode} that starts with neither @samp{-} nor @samp{/} matches
if @var{mode} exactly matches the file mode bits.
(To avoid confusion with an obsolete GNU extension, @var{mode}
must not start with a @samp{+} immediately followed by an octal digit.)

A @var{pmode} that starts with @samp{-} matches if
@emph{all} the file mode bits set in @var{mode} are set for the file;
bits not set in @var{mode} are ignored.

A @var{pmode} that starts with @samp{/} matches if
@emph{any} of the file mode bits set in @var{mode} are set for the file;
bits not set in @var{mode} are ignored.
This is a GNU extension.

If you don't use the @samp{/} or @samp{-} form with a symbolic mode
string, you may have to specify a rather complex mode string. For
example @samp{-perm g=w} will only match files that have mode 0020
(that is, ones for which group write permission is the only file mode bit
set). It is more likely that you will want to use the @samp{/} or
@samp{-} forms, for example @samp{-perm -g=w}, which matches any file
with group write permission.

@table @samp
@item -perm 664
Match files that are readable and writable by their owner and group,
and readable but not writable by other users. Do not match files
that meet these criteria but have other file mode bits set (for
example if someone can execute/search the file).

@item -perm -664
Match files that are readable and writable by their owner and group,
and readable by other users, without regard to the presence of any
extra file mode bits (for example the execute bits, or write
permission for other users). This matches a file with mode 0777,
for example.

@item -perm /222
Match files that are writable by somebody (their owner, or
their group, or anybody else).

@item -perm /022
Match files that are writable by either their group or other users.
The files don't have to be writable by both the group and other
users to be matched; either will do.

@item -perm /g+w,o+w
As above.

@item -perm /g=w,o=w
As above.

@item -perm -022
Match files that are writable by both their group and other users.

@item -perm -g+w,o+w
As above.

@item -perm -444 -perm /222 ! -perm /111
Match files that are readable for everybody, have at least one
write bit set (i.e., somebody can write to them), but that cannot be
executed/searched by anybody. Note that in some shells the @samp{!} must be
escaped.

@item -perm -a+r -perm /a+w ! -perm /a+x
As above.
@end table

@quotation Warning
If you specify @samp{-perm /000} or @samp{-perm /mode} where the
symbolic mode @samp{mode} has no bits set, the test matches all files.
Versions of GNU @code{find} prior to 4.3.3 matched no files in this
situation.
@end quotation
@end deffn
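
The three forms of @var{pmode} can be compared on files with known
modes. This sketch uses illustrative file names. It sets the modes
with numeric @code{chmod} arguments, so the umask does not affect the
result:

@example
dir=$(mktemp -d)
cd "$dir" || exit 1
touch a b c
chmod 644 a; chmod 664 b; chmod 775 c
exact=$(find . -type f -perm 664)                 # ./b
all=$(find . -type f -perm -664 | LC_ALL=C sort)  # ./b and ./c
any=$(find . -type f -perm /022 | LC_ALL=C sort)  # ./b and ./c
printf '%s\n' "$exact" "$all" "$any"
cd / && rm -rf "$dir"
@end example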

@deffn Test -context pattern
True if the file's SELinux context matches the pattern @var{pattern}.
The pattern uses shell glob matching.

This predicate is supported only on @code{find} versions compiled with
SELinux support and only when SELinux is enabled.
@end deffn

@node Contents
@section Contents

To search for files based on their contents, you can use the
@code{grep} program. For example, to find out which C source and
header files in the current directory contain the string @samp{thing},
you can do:

@example
grep -l thing *.[ch]
@end example

If you also want to search for the string in files in subdirectories,
you can combine @code{grep} with @code{find} and @code{xargs}, like
this:

@example
find . -name '*.[ch]' | xargs grep -l thing
@end example

The @samp{-l} option causes @code{grep} to print only the names of
files that contain the string, rather than the lines that contain it.
The string argument (@samp{thing}) is actually a regular expression,
so it can contain metacharacters. This method can be refined a little
by using the @samp{-r} option to make @code{xargs} not run @code{grep}
if @code{find} produces no output, and using the @code{find} action
@samp{-print0} and the @code{xargs} option @samp{-0} to avoid
misinterpreting files whose names contain spaces, quotes or newlines:

@example
find . -name '*.[ch]' -print0 | xargs -r -0 grep -l thing
@end example

For a fuller treatment of finding files whose contents match a
pattern, see the manual page for @code{grep}.

@node Directories
@section Directories

Here is how to control which directories @code{find} searches, and how
it searches them. The @samp{-maxdepth} and @samp{-mindepth} options
together allow you to process a horizontal slice of a directory tree.

@deffn Option -maxdepth levels
Descend at most @var{levels} (a non-negative integer) levels of
directories below the command line arguments. @samp{-maxdepth 0}
means only apply the tests and actions to the command line arguments.
@end deffn

@deffn Option -mindepth levels
Do not apply any tests or actions at levels less than @var{levels} (a
non-negative integer). @samp{-mindepth 1} means process all files
except the command line arguments.
@end deffn
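
For example, this sketch builds a small tree three levels deep. It
then selects a single level of the tree, and after that everything at
most one level down. The directory names are illustrative:

@example
dir=$(mktemp -d)
cd "$dir" || exit 1
mkdir -p a/b/c                            # depths: ./a 1, ./a/b 2, ./a/b/c 3
slice=$(find . -mindepth 2 -maxdepth 2)   # only depth 2: ./a/b
top=$(find . -maxdepth 1 | LC_ALL=C sort) # . and ./a
printf '%s\n' "$slice" "$top"
cd / && rm -rf "$dir"
@end example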

@deffn Option -depth
Process each directory's contents before the directory itself. Doing
this is a good idea when producing lists of files to archive with
@code{cpio} or @code{tar}. If a directory does not have write
permission for its owner, its contents can still be restored from the
archive since the directory's permissions are restored after its
contents.
@end deffn

@deffn Option -d
This is a deprecated synonym for @samp{-depth}, for compatibility with
Mac OS X, FreeBSD and OpenBSD. The @samp{-depth} option is a POSIX
feature, so it is better to use that.
@end deffn

@deffn Action -prune
If the file is a directory, do not descend into it. The result is
true. For example, to skip the directory @file{src/emacs} and all
files and directories under it, and print the names of the other files
found:

@example
find . -wholename './src/emacs' -prune -o -print
@end example

The above command will not print @file{./src/emacs} among its list of
results. This, however, is not due to the effect of the @samp{-prune}
action, which only prevents further descent and does not cause the
directory itself to be ignored. Instead, this effect is due to the use of
@samp{-o}. Since the left-hand side of the ``or'' condition has
succeeded for @file{./src/emacs}, it is not necessary to evaluate the
right-hand side (@samp{-print}) at all for this particular file. If
you wanted to print that directory name you could use either an extra
@samp{-print} action:

@example
find . -wholename './src/emacs' -prune -print -o -print
@end example

or use the comma operator:

@example
find . -wholename './src/emacs' -prune , -print
@end example

If the @samp{-depth} option is in effect, the subdirectories will have
already been visited in any case. Hence @samp{-prune} has no effect
in this case.

Because @samp{-delete} implies @samp{-depth}, using @samp{-prune} in
combination with @samp{-delete} may well result in the deletion of
more files than you intended.
@end deffn
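
The sketch below runs the first command above against a small,
made-up tree in a scratch directory. It adds @samp{-type f} so that
only regular files are listed:

@example
dir=$(mktemp -d)
cd "$dir" || exit 1
mkdir -p src/emacs src/other
touch src/emacs/x.c src/other/y.c
# ./src/emacs is pruned, so ./src/emacs/x.c is never reached.
kept=$(find . -wholename './src/emacs' -prune -o -type f -print)
printf '%s\n' "$kept"           # ./src/other/y.c
cd / && rm -rf "$dir"
@end example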

@deffn Action -quit
Exit immediately (with return value zero if no errors have occurred).
This is different from @samp{-prune} because @samp{-prune} only applies
to the contents of pruned directories, while @samp{-quit} simply makes
@code{find} stop immediately. No child processes will be left
running, but no more files specified on the command line will be
processed. For example, @code{find /tmp/foo /tmp/bar -print -quit}
will print only @samp{/tmp/foo}. Any command lines which have been
built by @samp{-exec ... \+} or @samp{-execdir ... \+} are invoked
before the program is exited.
@end deffn
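
A common use of @samp{-quit} is to stop at the first match, for
example to test whether any matching file exists at all. The
following sketch shows this. The directory names and the
@samp{*.log} pattern are illustrative:

@example
dir=$(mktemp -d)
cd "$dir" || exit 1
mkdir foo bar
first=$(find foo bar -print -quit)     # only the first start point: foo
# Non-empty output means at least one *.log file exists.
if [ -n "$(find . -name '*.log' -print -quit)" ]; then
  has_log=yes
else
  has_log=no                           # no *.log files were created above
fi
printf '%s %s\n' "$first" "$has_log"
cd / && rm -rf "$dir"
@end example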

@deffn Option -noleaf
Do not optimize by assuming that directories contain 2 fewer
subdirectories than their hard link count. This option is needed when
searching filesystems that do not follow the Unix directory-link
convention, such as CD-ROM or MS-DOS filesystems or AFS volume mount
points. Each directory on a normal Unix filesystem has at least 2
hard links: its name and its @file{.} entry. Additionally, its
subdirectories (if any) each have a @file{..} entry linked to that
directory. When @code{find} is examining a directory, after it has
statted 2 fewer subdirectories than the directory's link count, it
knows that the rest of the entries in the directory are
non-directories (@dfn{leaf} files in the directory tree). If only the
files' names need to be examined, there is no need to stat them; this
gives a significant increase in search speed.
@end deffn

@deffn Option -ignore_readdir_race
If a file disappears after its name has been read from a directory but
before @code{find} gets around to examining the file with @code{stat},
don't issue an error message. If you don't specify this option, an
error message will be issued. This option can be useful in system
scripts (cron scripts, for example) that examine areas of the
filesystem that change frequently (mail queues, temporary directories,
and so forth), because this scenario is common for those sorts of
directories. Completely silencing error messages from @code{find} is
undesirable, so this option neatly solves the problem. There is no
way to search one part of the filesystem with this option on and part
of it with this option off, though. When this option is turned on and
@code{find} discovers that one of the start-point files specified on the
command line does not exist, no error message will be issued.
@end deffn

@deffn Option -noignore_readdir_race
This option reverses the effect of the @samp{-ignore_readdir_race}
option.
@end deffn

@node Filesystems
@section Filesystems

A @dfn{filesystem} is a section of a disk, either on the local host or
mounted from a remote host over a network. Searching network
filesystems can be slow, so it is common to make @code{find} avoid
them.

There are two ways to avoid searching certain filesystems. One way is
to tell @code{find} to only search one filesystem:

@deffn Option -xdev
@deffnx Option -mount
Don't descend directories on other filesystems. These options are
synonyms.
@end deffn

The other way is to check the type of filesystem each file is on, and
not descend directories that are on undesirable filesystem types:

@deffn Test -fstype type
True if the file is on a filesystem of type @var{type}. The valid
filesystem types vary among different versions of Unix; an incomplete
list of filesystem types that are accepted on some version of Unix or
another is:

@example
ext2 ext3 proc sysfs ufs 4.2 4.3 nfs tmp mfs S51K S52K
@end example

You can use @samp{-printf} with the @samp{%F} directive to see the
types of your filesystems. The @samp{%D} directive shows the device
number. @xref{Print File Information}. @samp{-fstype} is usually
used with @samp{-prune} to avoid searching remote filesystems
(@pxref{Directories}).
@end deffn

@node Combining Primaries With Operators
@section Combining Primaries With Operators

Operators build a complex expression from tests and actions.
The operators are, in order of decreasing precedence:

@table @code
@item @asis{( @var{expr} )}
@findex ()
Force precedence. True if @var{expr} is true.

@item @asis{! @var{expr}}
@itemx @asis{-not @var{expr}}
@findex !
@findex -not
True if @var{expr} is false. In some shells, it is necessary to
protect the @samp{!} from shell interpretation by quoting it.

@item @asis{@var{expr1 expr2}}
@itemx @asis{@var{expr1} -a @var{expr2}}
@itemx @asis{@var{expr1} -and @var{expr2}}
@findex -and
@findex -a
And; @var{expr2} is not evaluated if @var{expr1} is false.

@item @asis{@var{expr1} -o @var{expr2}}
@itemx @asis{@var{expr1} -or @var{expr2}}
@findex -or
@findex -o
Or; @var{expr2} is not evaluated if @var{expr1} is true.

@item @asis{@var{expr1} , @var{expr2}}
@findex ,
List; both @var{expr1} and @var{expr2} are always evaluated. True if
@var{expr2} is true. The value of @var{expr1} is discarded. This
operator lets you do multiple independent operations on one traversal,
without depending on whether other operations succeeded. The two
operations @var{expr1} and @var{expr2} are not always fully
independent, since @var{expr1} might have side effects like touching
or deleting files, or it might use @samp{-prune} which would also
affect @var{expr2}.
@end table
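
Because @samp{-and} binds more tightly than @samp{-or}, the relative
precedence matters when an action is involved. The following sketch
uses illustrative file names in a scratch directory. It shows that,
without parentheses, @samp{-print} applies only to the right-hand
@samp{-name} test:

@example
dir=$(mktemp -d)
cd "$dir" || exit 1
touch a.c b.h
# Parsed as: -name '*.c' -o ( -name '*.h' -a -print )
unparen=$(find . -name '*.c' -o -name '*.h' -print)   # ./b.h
# Parentheses (quoted for the shell) group the two tests first.
paren=$(find . \( -name '*.c' -o -name '*.h' \) -print | LC_ALL=C sort)
printf '%s\n' "$unparen" "$paren"
cd / && rm -rf "$dir"
@end example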

@code{find} searches the directory tree rooted at each file name by
evaluating the expression from left to right, according to the rules
of precedence, until the outcome is known (the left-hand side is false
for @samp{-and}, true for @samp{-or}), at which point @code{find}
moves on to the next file name.

There are two other tests that can be useful in complex expressions:

@deffn Test -true
Always true.
@end deffn

@deffn Test -false
Always false.
@end deffn

@node Actions
@chapter Actions

There are several ways you can print information about the files that
match the criteria you gave in the @code{find} expression. You can
print the information either to the standard output or to a file that
you name. You can also execute commands that have the file names as
arguments. You can use those commands as further filters to select
files.

@menu
* Print File Name::
* Print File Information::
* Run Commands::
* Delete Files::
* Adding Tests::
@end menu

@node Print File Name
@section Print File Name

@deffn Action -print
True; print the entire file name on the standard output, followed by a
newline. If there is the faintest possibility that the name of one of
the files for which you are searching might contain a newline, you
should use @samp{-print0} instead.
@end deffn

@deffn Action -fprint file
True; print the entire file name into file @var{file}, followed by a
newline. If @var{file} does not exist when @code{find} is run, it is
created; if it does exist, it is truncated to 0 bytes. The named
output file is always created, even if no output is sent to it. The
file names @file{/dev/stdout} and @file{/dev/stderr} are handled
specially; they refer to the standard output and standard error
output, respectively.

If there is the faintest possibility that the name of one of the files
for which you are searching might contain a newline, you should use
@samp{-fprint0} instead.
@end deffn
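
The output file is created even if nothing is written to it, so it is
safest to place it outside the directory tree being searched.
Otherwise the output file itself may show up in the list. A sketch
with illustrative names follows:

@example
dir=$(mktemp -d)
list=$(mktemp)                  # output file outside the searched tree
cd "$dir" || exit 1
touch a b
find . -type f -fprint "$list"
names=$(LC_ALL=C sort "$list")
printf '%s\n' "$names"          # ./a and ./b
cd / && rm -rf "$dir" "$list"
@end example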
|
|
|
|
|
|
@c @deffn Option -show-control-chars how
@c This option affects how some of @code{find}'s actions treat
@c unprintable characters in file names. If @samp{how} is
@c @samp{literal}, any subsequent actions (i.e., actions further on in the
@c command line) print file names as-is.
@c
@c If this option is not specified, it currently defaults to @samp{safe}.
@c If @samp{how} is @samp{safe}, C-like backslash escapes are used to
@c indicate the non-printable characters for @samp{-ls} and @samp{-fls}.
@c On the other hand, @samp{-print}, @samp{-fprint}, @samp{-fprintf} and
@c @code{-printf} all quote unprintable characters if the data is going
@c to a tty, and otherwise the data is emitted literally.
@c
@c @table @code
@c @item -ls
@c Escaped if @samp{how} is @samp{safe}
@c @item -fls
@c Escaped if @samp{how} is @samp{safe}
@c @item -print
@c Always quoted if stdout is a tty,
@c @samp{-show-control-chars} is ignored
@c @item -print0
@c Always literal, never escaped
@c @item -fprint
@c Always quoted if the destination is a tty;
@c @samp{-show-control-chars} is ignored
@c @item -fprint0
@c Always literal, never escaped
@c @item -fprintf
@c If the destination is a tty, the @samp{%f},
@c @samp{%F}, @samp{%h}, @samp{%l}, @samp{%p},
@c and @samp{%P} directives produce quoted
@c strings if stdout is a tty and are treated
@c literally otherwise.
@c @item -printf
@c As for @code{-fprintf}.
@c @end table
@c @end deffn

@node Print File Information
@section Print File Information

@deffn Action -ls
True; list the current file in @samp{ls -dils} format on the standard
output. The output looks like this:

@smallexample
204744 17 -rw-r--r-- 1 djm staff 17337 Nov 2 1992 ./lwall-quotes
@end smallexample

The fields are:

@enumerate
@item
The inode number of the file. @xref{Hard Links}, for how to find
files based on their inode number.

@item
The number of blocks in the file. The block counts are of 1K blocks,
unless the environment variable @code{POSIXLY_CORRECT} is set, in
which case 512-byte blocks are used. @xref{Size}, for how to find
files based on their size.

@item
The file's type and file mode bits. The type is shown as a dash for a
regular file; for other file types, the same letter as for @samp{-type} is
used (@pxref{Type}). The file mode bits are read, write, and execute/search for
the file's owner, its group, and other users, respectively; a dash
means the permission is not granted. @xref{File Permissions}, for
more details about file permissions. @xref{Mode Bits}, for how to
find files based on their file mode bits.

@item
The number of hard links to the file.

@item
The user who owns the file.

@item
The file's group.

@item
The file's size in bytes.

@item
The date the file was last modified.

@item
The file's name. @samp{-ls} quotes non-printable characters in the
file names using C-like backslash escapes. This may change soon, as
the treatment of unprintable characters is harmonised for @samp{-ls},
@samp{-fls}, @samp{-print}, @samp{-fprint}, @samp{-printf} and
@samp{-fprintf}.
@end enumerate
@end deffn

@deffn Action -fls file
True; like @samp{-ls} but write to @var{file} like @samp{-fprint}
(@pxref{Print File Name}). The named output file is always created,
even if no output is sent to it.
@end deffn

@deffn Action -printf format
True; print @var{format} on the standard output, interpreting @samp{\}
escapes and @samp{%} directives (more details in the following
sections).

Field widths and precisions can be specified as with the @code{printf} C
function. Format flags (like @samp{#} for example) may not work as you
expect because many of the fields, even numeric ones, are printed with
@samp{%s}. Numeric directives which are affected in this way include
@samp{G}, @samp{U}, @samp{b}, @samp{D}, @samp{k} and @samp{n}. However,
this also means that the format flag @samp{-} works; it forces
left-alignment of the field. Unlike @samp{-print}, @samp{-printf} does
not add a newline at the end of the string. If you want a newline at
the end of the string, add a @samp{\n}.

As an example, an approximate equivalent of @samp{-ls} with
null-terminated filenames can be achieved with this @code{-printf}
format:

@example
find -printf "%i %4k %M %3n %-8u %-8g %8s %T+ %p\n->%l\0" | cat
@end example

A practical reason for doing this would be to get literal filenames in
the output, instead of @samp{-ls}'s backslash-escaped names. (Piping
the output through @code{cat} means that it is not going to a
terminal, so the @samp{%p} directive does not quote the names either;
@pxref{Unusual Characters in File Names}.) This format also produces
timestamps in a uniform format.

As for symlinks, the format above outputs the symlink target on a second
line, following @samp{\n->}. There is nothing following the arrow for
non-symlinks. Another approach, for complete consistency, would be to
@code{-fprintf} the symlinks into a separate file, so they too can be
null-terminated.
@end deffn

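Here is a small sketch of field widths and the @samp{-} flag. It assumes GNU @code{find} and a POSIX shell, and @file{apple} and @file{kiwi} are made-up names. The format ends in @samp{\n} because @samp{-printf} adds no newline itself.

@example
dir=$(mktemp -d)
cd "$dir" || exit 1
touch apple kiwi
# %-8f: base name, left-aligned in 8 columns.
# %5s: size in bytes, right-aligned in 5 columns.
out=$(find . -type f -printf '[%-8f][%5s]\n' | LC_ALL=C sort)
printf '%s\n' "$out"
@end example

Each name is padded to eight columns and each size to five, giving lines such as @samp{[apple   ][    0]}.
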
@deffn Action -fprintf file format
True; like @samp{-printf} but write to @var{file} like @samp{-fprint}
(@pxref{Print File Name}). The output file is always created, even if
no output is ever sent to it.
@end deffn

@menu
* Escapes::
* Format Directives::
* Time Formats::
* Formatting Flags::
@end menu

@node Escapes
@subsection Escapes

The escapes that @samp{-printf} and @samp{-fprintf} recognise are:

@table @code
@item \a
Alarm bell.
@item \b
Backspace.
@item \c
Stop printing from this format immediately and flush the output.
@item \f
Form feed.
@item \n
Newline.
@item \r
Carriage return.
@item \t
Horizontal tab.
@item \v
Vertical tab.
@item \\
A literal backslash (@samp{\}).
@item \0
ASCII NUL.
@item \NNN
The character whose ASCII code is NNN (octal).
@end table

A @samp{\} character followed by any other character is treated as an
ordinary character, so both characters are printed, and a warning message is
printed to the standard error output (because it was probably a typo).

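For instance, @samp{\t} separates two fields with a tab, and @samp{\c} ends the output early so that the text after it is never printed. This sketch assumes GNU @code{find}, and @file{one} is a made-up name:

@example
dir=$(mktemp -d)
cd "$dir" || exit 1
touch one
# Depth, a tab, then the base name.
tabbed=$(find . -name one -printf '%d\t%f\n')
# Output stops at \c, so "after" is not printed.
cut_short=$(find . -name one -printf '%f\cafter')
printf '%s\n' "$tabbed" "$cut_short"
@end example
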
@node Format Directives
@subsection Format Directives

@samp{-printf} and @samp{-fprintf} support the following format
directives to print information about the file being processed. The
field width and precision specifiers of the C @code{printf} function
are supported, as they apply to string (@samp{%s}) conversions. That
is, for each directive you can specify a minimum field width and,
after a @samp{.}, a maximum field width.
Format flags (like @samp{#} for example) may not work as you expect
because many of the fields, even numeric ones, are printed with @samp{%s}.
The format flag @samp{-} does work; it forces left-alignment of the
field.

@samp{%%} is a literal percent sign. @xref{Reserved and Unknown
Directives}, for a description of how format directives not mentioned
below are handled.

A @samp{%} at the end of the format argument causes undefined
behaviour since there is no following character. In some locales, it
may hide your door keys, while in others it may remove the final page
from the novel you are reading.

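This sketch illustrates the minimum and maximum field widths described above. It assumes GNU @code{find}, and @file{abcdefgh} is a made-up name. @samp{%5.3f} keeps at most three characters of the name and pads the result to five columns:

@example
dir=$(mktemp -d)
cd "$dir" || exit 1
touch abcdefgh
# Precision 3 keeps "abc"; width 5 right-aligns it.
padded=$(find . -name abcdefgh -printf '[%5.3f]\n')
# The - flag left-aligns the same field instead.
left=$(find . -name abcdefgh -printf '[%-5.3f]\n')
printf '%s\n' "$padded" "$left"
@end example

The two lines printed are @samp{[  abc]} and @samp{[abc  ]}.
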
@menu
* Name Directives::
* Ownership Directives::
* Size Directives::
* Location Directives::
* Time Directives::
* Other Directives::
* Reserved and Unknown Directives::
@end menu

@node Name Directives
@subsubsection Name Directives

@table @code
@item %p
@c supports %-X.Yp
File's name (not the absolute path name, but the name of the file as
it was encountered by @code{find} - that is, as a relative path from
one of the starting points).
@item %f
File's name with any leading directories removed (only the last
element).
@c supports %-X.Yf
@item %h
Leading directories of file's name (all but the last element and the
slash before it). If the file's name contains no slashes (for example
because it was named on the command line and is in the current working
directory), then ``%h'' expands to ``.''. This prevents ``%h/%f''
expanding to ``/foo'', which would be surprising and probably not
desirable.
@c supports %-X.Yh
@item %P
File's name with the name of the command line argument under which
it was found removed from the beginning.
@c supports %-X.YP
@item %H
Command line argument under which file was found.
@c supports %-X.YH
@end table

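The sketch below prints each of these directives for one file in a made-up tree. It then shows @samp{%h} for a file named on the command line without any slash. It assumes GNU @code{find} and a POSIX shell:

@example
dir=$(mktemp -d)
cd "$dir" || exit 1
mkdir -p top/sub
touch top/sub/file.txt
# One directive per line; "top" is the starting point.
names=$(find top -name file.txt -printf 'p=%p\nh=%h\nf=%f\nH=%H\nP=%P\n')
printf '%s\n' "$names"
# Named on the command line with no slash, so %h expands to ".".
cd top/sub || exit 1
here=$(find file.txt -printf '%h\n')
printf '%s\n' "$here"
@end example
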
@node Ownership Directives
@subsubsection Ownership Directives

@table @code
@item %g
@c supports %-X.Yg
File's group name, or numeric group ID if the group has no name.
@item %G
@c supports %-X.Yg
@c TODO: Needs to support # flag and 0 flag
File's numeric group ID.
@item %u
@c supports %-X.Yu
File's user name, or numeric user ID if the user has no name.
@item %U
@c supports %-X.Yu
@c TODO: Needs to support # flag
File's numeric user ID.
@item %m
@c full support, including # and 0.
File's mode bits (in octal). If you always want to have a leading
zero on the number, use the @samp{#} format flag, for example @samp{%#m}.

The file mode bit numbers used are the traditional Unix
numbers, which will be as expected on most systems, but if your
system's file mode bit layout differs from the traditional Unix
semantics, you will see a difference between the mode as printed by
@samp{%m} and the mode as it appears in @code{struct stat}.

@item %M
File's type and mode bits (in symbolic form, as for @code{ls}). This
directive is supported in findutils 4.2.5 and later.
@end table

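This short sketch compares the three ways of printing the mode. It assumes GNU @code{find}, and @file{script} is a made-up name:

@example
dir=$(mktemp -d)
cd "$dir" || exit 1
touch script
chmod 750 script
# %m octal, %#m octal with a leading zero, %M symbolic.
modes=$(find . -name script -printf '%m %#m %M\n')
printf '%s\n' "$modes"
@end example

With the mode set to 750, this prints @samp{750 0750 -rwxr-x---}.
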
@node Size Directives
@subsubsection Size Directives

@table @code
@item %k
The amount of disk space used for this file in 1K blocks. Since disk
space is allocated in multiples of the filesystem block size this is
usually greater than %s/1024, but it can also be smaller if the file
is a sparse file (that is, it has ``holes'').
@item %b
The amount of disk space used for this file in 512-byte blocks. Since
disk space is allocated in multiples of the filesystem block size this
is usually greater than %s/512, but it can also be smaller if the
file is a sparse file (that is, it has ``holes'').
@item %s
File's size in bytes.
@item %S
File's sparseness. This is calculated as @code{(BLOCKSIZE*st_blocks /
st_size)}. The exact value you will get for an ordinary file of a
certain length is system-dependent. However, normally sparse files
will have values less than 1.0, and files which use indirect blocks
and have few holes may have a value which is greater than 1.0. The
value used for BLOCKSIZE is system-dependent, but is usually 512
bytes. If the file size is zero, the value printed is undefined. On
systems which lack support for st_blocks, a file's sparseness is
assumed to be 1.0.
@end table

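This sketch shows how @samp{%s} and @samp{%k} can differ by creating a file that consists of a single hole. It assumes GNU @code{find} and GNU @code{truncate}, and @file{sparse} is a made-up name. The block count depends on the file system and is typically 0 where holes are supported.

@example
dir=$(mktemp -d)
cd "$dir" || exit 1
# A 1 MiB file with no data blocks written.
truncate -s 1M sparse
size=$(find . -name sparse -printf '%s')
kblocks=$(find . -name sparse -printf '%k')
echo "bytes: $size, 1K blocks used: $kblocks"
@end example
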
@node Location Directives
@subsubsection Location Directives

@table @code
@item %d
File's depth in the directory tree (depth below a file named on the
command line, not depth below the root directory). Files named on the
command line have a depth of 0. Subdirectories immediately below them
have a depth of 1, and so on.
@item %D
The device number on which the file exists (the @code{st_dev} field of
@code{struct stat}), in decimal.
@item %F
Type of the filesystem the file is on; this value can be used for
@samp{-fstype} (@pxref{Directories}).
@item %l
Object of symbolic link (empty string if file is not a symbolic link).
@item %i
File's inode number (in decimal).
@item %n
Number of hard links to file.
@item %y
Type of the file as used with @samp{-type}. If the file is a symbolic
link, @samp{l} will be printed.
@item %Y
Type of the file as used with @samp{-type}. If the file is a symbolic
link, it is dereferenced. If the file is a broken symbolic link,
@samp{N} is printed.
@end table

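The following sketch prints the depth and both type directives. It assumes GNU @code{find}, and the tree and the dangling link are made up. For the broken link @file{a/dangling}, @samp{%y} gives @samp{l} while @samp{%Y} gives @samp{N}:

@example
dir=$(mktemp -d)
cd "$dir" || exit 1
mkdir -p a/b
touch a/b/f
# A symbolic link whose target does not exist.
ln -s missing a/dangling
# Fields: depth, %y type, %Y type, name (sorted by name).
listing=$(find a -printf '%d %y %Y %p\n' | LC_ALL=C sort -k4)
printf '%s\n' "$listing"
@end example
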
@node Time Directives
@subsubsection Time Directives

Some of these directives use the C @code{ctime} function. Its output
depends on the current locale, but it typically looks like

@example
Wed Nov 2 00:42:36 1994
@end example

@table @code
@item %a
File's last access time in the format returned by the C @code{ctime}
function.
@item %A@var{k}
File's last access time in the format specified by @var{k}
(@pxref{Time Formats}).
@item %c
File's last status change time in the format returned by the C
@code{ctime} function.
@item %C@var{k}
File's last status change time in the format specified by @var{k}
(@pxref{Time Formats}).
@item %t
File's last modification time in the format returned by the C
@code{ctime} function.
@item %T@var{k}
File's last modification time in the format specified by @var{k}
(@pxref{Time Formats}).
@end table

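This sketch builds @samp{%T@var{k}} from single components (@pxref{Time Components}). It assumes GNU @code{find} and GNU @code{touch} (for its @samp{-d} option). @file{old} is a made-up name, and @env{TZ} is set to UTC so that the result does not depend on the local time zone:

@example
dir=$(mktemp -d)
cd "$dir" || exit 1
export TZ=UTC
# Give the file a known modification time.
touch -d '2004-04-28 22:22:05' old
stamp=$(find . -name old -printf '%TY-%Tm-%Td %TH:%TM\n')
printf '%s\n' "$stamp"
@end example

This prints @samp{2004-04-28 22:22}.
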
@node Other Directives
@subsubsection Other Directives

@table @code
@item %Z
File's SELinux context, or empty string if the file has no SELinux context.
@end table

@node Reserved and Unknown Directives
@subsubsection Reserved and Unknown Directives

The @samp{%(}, @samp{%@{} and @samp{%[} format directives, with or
without field width and precision specifications, are reserved for
future use. Don't use them and don't rely on current experimentation to
predict future behaviour. To print @samp{(}, simply use @samp{(}
rather than @samp{%(}. Likewise for @samp{@{} and @samp{[}.

Similarly, a @samp{%} character followed by any other unrecognised
character (i.e., not a known directive or @code{printf} field width
and precision specifier) is discarded (but the unrecognised character
is printed), and a warning message is printed to the standard error
output (because it was probably a typo). Don't rely on this
behaviour, because other directives may be added in the future.

@node Time Formats
@subsection Time Formats

Below are the formats for the directives @samp{%A}, @samp{%C}, and
@samp{%T}, which print the file's timestamps. Some of these formats
might not be available on all systems, due to differences in the C
@code{strftime} function between systems.

@menu
* Time Components::
* Date Components::
* Combined Time Formats::
@end menu

@node Time Components
@subsubsection Time Components

The following format directives print single components of the time.

@table @code
@item H
hour (00..23)
@item I
hour (01..12)
@item k
hour ( 0..23)
@item l
hour ( 1..12)
@item p
locale's AM or PM
@item Z
time zone (e.g., EDT), or nothing if no time zone is determinable
@item M
minute (00..59)
@item S
second (00..61). There is a fractional part.
@item @@
seconds since Jan. 1, 1970, 00:00 GMT, with fractional part.
@end table

The fractional part of the seconds field is of indeterminate length
and precision. That is, the length of the fractional part of the
seconds field will in general vary between findutils releases and
between systems. This means that it is unwise to assume that field
has any specific length. The length of this field is not usually a
guide to the precision of timestamps in the underlying file system.

@node Date Components
@subsubsection Date Components

The following format directives print single components of the date.

@table @code
@item a
locale's abbreviated weekday name (Sun..Sat)
@item A
locale's full weekday name, variable length (Sunday..Saturday)
@item b
@itemx h
locale's abbreviated month name (Jan..Dec)
@item B
locale's full month name, variable length (January..December)
@item m
month (01..12)
@item d
day of month (01..31)
@item w
day of week (0..6)
@item j
day of year (001..366)
@item U
week number of year with Sunday as first day of week (00..53)
@item W
week number of year with Monday as first day of week (00..53)
@item Y
year (1970@dots{})
@item y
last two digits of year (00..99)
@end table

@node Combined Time Formats
@subsubsection Combined Time Formats

The following format directives print combinations of time and date
components.

@table @code
@item r
time, 12-hour (hh:mm:ss [AP]M)
@item T
time, 24-hour (hh:mm:ss)
@item X
locale's time representation (H:M:S)
@item c
locale's date and time in ctime format (Sat Nov 04 12:02:33 EST
1989). This format does not include any fractional part in the
seconds field.
@item D
date (mm/dd/yy)
@item x
locale's date representation (mm/dd/yy)
@item +
Date and time, separated by @samp{+}, for example
@samp{2004-04-28+22:22:05.0000000000}.
The time is given in the current timezone (which may be affected by
setting the TZ environment variable). This is a GNU extension. The
seconds field includes a fractional part.
@end table

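Since @samp{%T+} puts the most significant part of the time first, sorting its output as text orders files by modification time. This sketch assumes GNU @code{find} and GNU @code{touch}, and the file names are made up. The @code{sed} command strips the fractional seconds, whose length is not fixed (@pxref{Time Components}):

@example
dir=$(mktemp -d)
cd "$dir" || exit 1
export TZ=UTC
touch -d '2010-01-01 00:00:00' newer
touch -d '2004-04-28 22:22:05' older
# Oldest first; sed removes the "." and digits before the space.
byage=$(find . -type f -printf '%T+ %p\n' | LC_ALL=C sort | sed 's/\.[0-9]* / /')
printf '%s\n' "$byage"
@end example
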
@node Formatting Flags
@subsection Formatting Flags

The @samp{%m} and @samp{%d} directives support the @samp{#}, @samp{0}
and @samp{+} flags, but the other directives do not, even if they
print numbers. Numeric directives that do not support these flags
include @samp{G}, @samp{U}, @samp{b}, @samp{D}, @samp{k} and
@samp{n}.

All fields support the format flag @samp{-}, which makes fields
left-aligned. That is, if the field width is greater than the actual
contents of the field, the requisite number of spaces are printed
after the field content instead of before it.

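In the following example, the @samp{0} flag pads the depth printed by @samp{%d} with zeros, and @samp{-} left-aligns the name. It assumes GNU @code{find}, and @file{x/y/z} is a made-up file:

@example
dir=$(mktemp -d)
cd "$dir" || exit 1
mkdir -p x/y
touch x/y/z
# %03d: depth zero-padded to 3 digits; %-4f: name left-aligned.
flags=$(find x -name z -printf '[%03d][%-4f]\n')
printf '%s\n' "$flags"
@end example
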
@node Run Commands
@section Run Commands

You can use the list of file names created by @code{find} or
@code{locate} as arguments to other commands. In this way you can
perform arbitrary actions on the files.

@menu
* Single File::
* Multiple Files::
* Querying::
@end menu

@node Single File
@subsection Single File

Here is how to run a command on one file at a time.

@deffn Action -execdir command ;
Execute @var{command}; true if @var{command} returns zero. @code{find}
takes all arguments after @samp{-execdir} to be part of the command until
an argument consisting of @samp{;} is reached. It replaces the string
@samp{@{@}} by the current file name being processed everywhere it
occurs in the command. Both of these constructions need to be escaped
(with a @samp{\}) or quoted to protect them from expansion by the
shell. The command is executed in the directory which @code{find}
was searching at the time the action was executed (that is, @{@} will
expand to a file in the local directory).

For example, to compare each C header file in or below the current
directory with the file @file{/tmp/master}:

@example
find . -name '*.h' -execdir diff -u '@{@}' /tmp/master ';'
@end example
@end deffn

If you use @samp{-execdir}, you must ensure that the @samp{$PATH}
variable contains only absolute directory names. Having an empty
element in @samp{$PATH} or explicitly including @samp{.} (or any other
non-absolute name) is insecure. GNU @code{find} will refuse to run if you
use @samp{-execdir} and it thinks your @samp{$PATH} setting is
insecure. For example:

@table @samp
@item /bin:/usr/bin:
Insecure; empty path element (at the end)
@item :/bin:/usr/bin:/usr/local/bin
Insecure; empty path element (at the start)
@item /bin:/usr/bin::/usr/local/bin
Insecure; empty path element (two colons in a row)
@item /bin:/usr/bin:.:/usr/local/bin
Insecure; @samp{.} is a path element (@file{.} is not an absolute file name)
@item /bin:/usr/bin:sbin:/usr/local/bin
Insecure; @samp{sbin} is not an absolute file name
@item /bin:/usr/bin:/sbin:/usr/local/bin
Secure (if you control the contents of those directories and any access to them)
@end table

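The rule above can be approximated in a few lines of shell. The function @code{check} below is hypothetical and only illustrates the rule; it is not how @code{find} itself performs the test. It reports a @env{PATH} value as insecure if any element is empty or does not start with @samp{/}:

@example
# Hypothetical helper: an approximation of the rule, not find's code.
check() (
  # An empty element shows up as "::" once colons are added at both ends.
  case ":$1:" in
    *::*) echo insecure; exit ;;
  esac
  IFS=:
  set -f
  for elem in $1; do
    case $elem in
      /*) ;;
      *) echo insecure; exit ;;
    esac
  done
  echo secure
)
for p in /bin:/usr/bin: /bin:/usr/bin:sbin /bin:/usr/bin:/sbin; do
  printf '%s -> %s\n' "$p" "$(check "$p")"
done
@end example
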
A similar action, @samp{-exec}, is also supported, but it is less secure.
@xref{Security Considerations}, for a discussion of the security
problems surrounding @samp{-exec}.

@deffn Action -exec command ;
This insecure variant of the @samp{-execdir} action is specified by
POSIX. Like @samp{-execdir command ;}, it is true if zero is
returned by @var{command}. The main difference is that the command is
executed in the directory from which @code{find} was invoked, meaning
that @samp{@{@}} is expanded to a relative path starting with the name
of one of the starting directories, rather than just the basename of
the matched file.

While some implementations of @code{find} replace the @samp{@{@}} only
where it appears on its own in an argument, GNU @code{find} replaces
@samp{@{@}} wherever it appears.
@end deffn

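Running @code{pwd}, which needs no file name argument, shows where each action runs its command. This sketch assumes GNU @code{find}, a @env{PATH} that @samp{-execdir} accepts, and a made-up tree @file{top/sub/f}:

@example
dir=$(mktemp -d)
cd "$dir" || exit 1
base=$(pwd -P)
mkdir -p top/sub
touch top/sub/f
# -execdir runs pwd in the directory that contains the match.
in_dir=$(find top -name f -execdir pwd ';')
# -exec runs pwd in the directory where find was started.
from_start=$(find top -name f -exec pwd ';')
printf '%s\n' "$in_dir" "$from_start"
@end example
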
@node Multiple Files
@subsection Multiple Files

Sometimes you need to process files one at a time. But usually this
is not necessary, and it is faster to run a command on as many files
as possible at a time, rather than once per file. Doing this saves on
the time it takes to start up the command each time.

The @samp{-execdir} and @samp{-exec} actions have variants that build
command lines containing as many matched files as possible.

@deffn Action -execdir command @{@} +
This works as for @samp{-execdir command ;}, except that the result is
always true, and the @samp{@{@}} at the end of the command is expanded
to a list of names of matching files. This expansion is done in such
a way as to avoid exceeding the maximum command line length available
on the system. Only one @samp{@{@}} is allowed within the command,
and it must appear at the end, immediately before the @samp{+}. A
@samp{+} appearing in any position other than immediately after
@samp{@{@}} is not considered to be special (that is, it does not
terminate the command).
@end deffn

@deffn Action -exec command @{@} +
This insecure variant of the @samp{-execdir} action is specified by
POSIX. The main difference is that the command is executed in the
directory from which @code{find} was invoked, meaning that @samp{@{@}}
is expanded to a relative path starting with the name of one of the
starting directories, rather than just the basename of the matched
file. The result is always true.
@end deffn

Before @code{find} exits, any partially-built command lines are
executed. This happens even if the exit was caused by the
@samp{-quit} action. However, some types of error (for example, not
being able to invoke @code{stat()} on the current directory) can cause
an immediate fatal exit. In this situation, any partially-built
command lines will not be invoked (this prevents possible infinite
loops).

At first sight, it looks like the list of filenames to be processed
can only be at the end of the command line, and that this might be a
problem for some commands (@code{cp} and @code{rsync} for example).

However, there is a slightly obscure but powerful workaround for this
problem which takes advantage of the behaviour of @code{sh -c}:

@example
find startpoint -tests @dots{} -exec sh -c 'scp "$@@" remote:/dest' sh @{@} +
@end example

In the example above, the filenames we want to work on need to occur
on the @code{scp} command line before the name of the destination. We
use the shell to invoke the command @code{scp "$@@" remote:/dest} and
the shell expands @code{"$@@"} to the list of filenames we want to
process.

Another, but less secure, way to run a command on more than one file
at once is to use the @code{xargs} command, which is invoked like
this:

@example
xargs @r{[}@var{option}@dots{}@r{]} @r{[}@var{command} @r{[}@var{initial-arguments}@r{]}@r{]}
@end example

@code{xargs} normally reads arguments from the standard input. These
arguments are delimited by blanks (which can be protected with double
or single quotes or a backslash) or newlines. It executes the
@var{command} (the default is @file{echo}) one or more times with any
@var{initial-arguments} followed by arguments read from standard
input. Blank lines on the standard input are ignored. If the
@samp{-L} option is in use, trailing blanks indicate that @code{xargs}
should consider the following line to be part of this one.

Instead of blank-delimited names, it is safer to use @samp{find
-print0} or @samp{find -fprint0} and process the output by giving the
@samp{-0} or @samp{--null} option to GNU @code{xargs}, GNU @code{tar},
GNU @code{cpio}, or @code{perl}. The @code{locate} command also has a
@samp{-0} or @samp{--null} option which does the same thing.

You can use shell command substitution (backquotes) to process a list
of arguments, like this:

@example
grep -l sprintf `find $HOME -name '*.c' -print`
@end example

However, that method produces an error if the length of the @samp{.c}
file names exceeds the operating system's command line length limit.
@code{xargs} avoids that problem by running the command as many times
as necessary without exceeding the limit:

@example
find $HOME -name '*.c' -print | xargs grep -l sprintf
@end example

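The following sketch runs the @code{xargs} pipeline above on two made-up files. It then uses the @samp{-n} option, which limits the number of arguments per command line, to make the repeated invocations visible. It assumes GNU @code{find} and GNU @code{xargs}:

@example
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'sprintf(buf, "x");\n' > uses.c
printf 'puts("x");\n' > plain.c
matches=$(find . -name '*.c' -print | xargs grep -l sprintf)
printf '%s\n' "$matches"
# With at most two arguments per run, echo is run three times.
batches=$(printf '%s\n' a b c d e | xargs -n 2 echo)
printf '%s\n' "$batches"
@end example

The second pipeline prints @samp{a b}, @samp{c d} and @samp{e} on separate lines.
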
However, if the command needs to have its standard input be a terminal
(@code{less}, for example), you have to use the shell command
substitution method or use the @samp{--arg-file} option of
@code{xargs}.

The @code{xargs} command will process all its input, building command
lines and executing them, unless one of the commands exits with a
status of 255 (this will cause @code{xargs} to issue an error message and
stop) or it reads a line that contains the end of file string specified
with the @samp{--eof} option.

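This sketch shows the exit-status rule, assuming GNU @code{xargs}. The inline @code{sh} script below is made up for illustration and exits with status 255 when given @samp{b}. As a result, @code{xargs} never runs the command for @samp{c} and itself exits with status 124:

@example
# Each argument reaches the inline script as $0.
out=$(printf '%s\n' a b c |
  xargs -n 1 sh -c 'echo "$0"; if [ "$0" = b ]; then exit 255; fi' 2>/dev/null)
status=$?
printf '%s\n' "$out"
echo "xargs exit status: $status"
@end example
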
@menu
* Unsafe File Name Handling::
* Safe File Name Handling::
* Unusual Characters in File Names::
* Limiting Command Size::
* Controlling Parallelism::
* Interspersing File Names::
@end menu

@node Unsafe File Name Handling
@subsubsection Unsafe File Name Handling

Because file names can contain quotes, backslashes, blank characters,
and even newlines, it is not safe to process them using @code{xargs}
in its default mode of operation. But since most files' names do not
contain blanks, this problem occurs only infrequently. If you are
only searching through files that you know have safe names, then you
need not be concerned about it.

Error messages issued by @code{find} and @code{locate} quote unusual
characters in file names in order to prevent unwanted changes in the
terminal's state.

@c This example is adapted from:
@c From: pfalstad@stone.Princeton.EDU (Paul John Falstad)
@c Newsgroups: comp.unix.shell
@c Subject: Re: Beware xargs security holes
@c Date: 16 Oct 90 19:12:06 GMT
@c
In many applications, if @code{xargs} botches processing a file
because its name contains special characters, some data might be lost.
The importance of this problem depends on the importance of the data
and whether anyone notices the loss soon enough to correct it.
However, here is an extreme example of the problems that using
blank-delimited names can cause. If the following command is run
daily from @code{cron}, then any user can remove any file on the
system:

@example
find / -name '#*' -atime +7 -print | xargs rm
@end example

For example, you could do something like this:

@example
eg$ echo > '#
vmunix'
@end example

@noindent
and then @code{cron} would delete @file{/vmunix}, if it ran
@code{xargs} with @file{/} as its current directory.

To delete other files, for example @file{/u/joeuser/.plan}, you could
do this:

@example
eg$ mkdir '#
'
eg$ cd '#
'
eg$ mkdir u u/joeuser u/joeuser/.plan'
'
eg$ echo > u/joeuser/.plan'
/#foo'
eg$ cd ..
eg$ find . -name '#*' -print | xargs echo
./# ./# /u/joeuser/.plan /#foo
@end example

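The splitting problem can be demonstrated without deleting anything. In this sketch, @code{sh -c 'echo $#'} reports how many arguments it received. It assumes GNU @code{find} and @code{xargs}, and @file{my file} is a made-up name:

@example
dir=$(mktemp -d)
cd "$dir" || exit 1
touch 'my file'
# One file, but its name is split at the blank into two arguments.
nargs=$(find . -type f -print | xargs sh -c 'echo $#' sh)
echo "arguments received: $nargs"
@end example
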
@node Safe File Name Handling
@subsubsection Safe File Name Handling

Here is how to make @code{find} output file names so that they can be
used by other programs without being mangled or misinterpreted. You
can process file names generated this way by giving the @samp{-0} or
@samp{--null} option to GNU @code{xargs}, GNU @code{tar}, GNU
@code{cpio}, or @code{perl}.

@deffn Action -print0
True; print the entire file name on the standard output, followed by a
null character.
@end deffn

@deffn Action -fprint0 file
True; like @samp{-print0} but write to @var{file} like @samp{-fprint}
(@pxref{Print File Name}). The output file is always created.
@end deffn

As of findutils version 4.2.4, the @code{locate} program also has a
@samp{--null} option which does the same thing. For similarity with
@code{xargs}, the short form of the option @samp{-0} can also be used.

If you want to be able to handle file names safely but need to run
commands which want to be connected to a terminal on their input, you
can use the @samp{--arg-file} option to @code{xargs} like this:

@example
find / -name xyzzy -print0 > list
xargs --null --arg-file=list munge
@end example

The example above runs the @code{munge} program on all the files named
@file{xyzzy} that we can find, but @code{munge}'s input will still be
the terminal (or whatever the shell was using as standard input). If
your shell has the ``process substitution'' feature @samp{<(...)}, you
can do this in just one step:

@example
xargs --null --arg-file=<(find / -name xyzzy -print0) munge
@end example

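This sketch compares newline-terminated and null-terminated output for two made-up file names. One name contains a newline and the other contains a blank. It assumes GNU @code{find} and GNU @code{xargs}:

@example
dir=$(mktemp -d)
cd "$dir" || exit 1
touch "$(printf 'two\nlines')" 'has space'
# Null-terminated names arrive intact: two files, two arguments.
safe=$(find . -type f -print0 | xargs -0 sh -c 'echo $#' sh)
# Newline-terminated names are split on blanks and newlines.
unsafe=$(find . -type f -print | xargs sh -c 'echo $#' sh)
echo "with -print0: $safe, with -print: $unsafe"
@end example

With @samp{-print0} the command receives two arguments; with @samp{-print} it receives four.
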
@node Unusual Characters in File Names
@subsubsection Unusual Characters in File Names
As discussed above, you often need to be careful about how the names
of files are handled by @code{find} and other programs. If the output
of @code{find} is not going to another program but instead is being
shown on a terminal, this can still be a problem. For example, some
character sequences can reprogram the function keys on some terminals.
@xref{Security Considerations}, for a discussion of other security
problems relating to @code{find}.

Unusual characters are handled differently by various
actions, as described below.

@table @samp
@item -print0
@itemx -fprint0
Always print the exact file name, unchanged, even if the output is
going to a terminal.
@item -ok
@itemx -okdir
Always print the exact file name, unchanged. This will probably
change in a future release.
@item -ls
@itemx -fls
Unusual characters are always escaped. White space, backslash, and
double quote characters are printed using C-style escaping (for
example @samp{\f}, @samp{\"}). Other unusual characters are printed
using an octal escape. Other printable characters (for @samp{-ls} and
@samp{-fls} these are the characters between octal 041 and 0176) are
printed as-is.
@item -printf
@itemx -fprintf
If the output is not going to a terminal, it is printed as-is.
Otherwise, the result depends on which directive is in use:

@table @asis
@item %D, %F, %H, %Y, %y
These expand to values which are not under the control of files' owners,
and so are printed as-is.
@item %a, %b, %c, %d, %g, %G, %i, %k, %m, %M, %n, %s, %t, %u, %U
These have values which are under the control of files' owners but
which cannot be used to send arbitrary data to the terminal, and so
these are printed as-is.
@item %f, %h, %l, %p, %P
The output of these directives is quoted if the output is going to a
terminal. The setting of the @code{LC_CTYPE} environment
variable is used to determine which characters need to be quoted.

This quoting is performed in the same way as for GNU @code{ls}. This
is not the same quoting mechanism as the one used for @samp{-ls} and
@samp{-fls}. If you are able to decide what format to use for the
output of @code{find} then it is normally better to use @samp{\0} as a
terminator than to use newline, as file names can contain white space
and newline characters.
@end table
@item -print
@itemx -fprint
Quoting is handled in the same way as for the @samp{%p} directive of
@samp{-printf} and @samp{-fprintf}. If you are using @code{find} in a
script or in a situation where the matched files might have arbitrary
names, you should consider using @samp{-print0} instead of
@samp{-print}.
@end table

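As an illustration of the advice above, the following pipeline is a minimal sketch that counts the regular files below the current directory by counting the NUL bytes written by @samp{-print0}, rather than counting lines. A file whose name contains a newline is therefore still counted once. It assumes a @code{tr} that handles NUL bytes, as GNU @code{tr} does.

@example
# Each name is terminated by a NUL byte; tr deletes all other
# bytes, so wc counts one byte per file.
find . -type f -print0 | tr -dc '\0' | wc -c
@end example

By contrast, @samp{find . -type f | wc -l} would count such a file once for each line its name spans.
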
The @code{locate} program quotes and escapes unusual characters in
file names in the same way as @code{find}'s @samp{-print} action.

The behaviours described above may change soon, as the treatment of
unprintable characters is harmonised for @samp{-ls}, @samp{-fls},
@samp{-print}, @samp{-fprint}, @samp{-printf} and @samp{-fprintf}.

@node Limiting Command Size
@subsubsection Limiting Command Size

@code{xargs} gives you control over how many arguments it passes to
the command each time it executes it. By default, it uses up to
@code{ARG_MAX} - 2k, or 128k, whichever is smaller, characters per
command. It uses as many lines and arguments as fit within that
limit. The following options modify those values.

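To see the values that actually apply on a particular system, one quick check is to ask @code{xargs} to report them with @samp{--show-limits} (described under @samp{-s} below). In this sketch @samp{-r} is added so that, with no input at all, no command is run:

@example
# Report the command-line limits.  </dev/null supplies no input,
# and -r stops xargs running the command once anyway.
xargs --show-limits -r </dev/null
@end example

The figures reported depend on the system and on the size of your environment.
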
@table @code
@item --no-run-if-empty
@itemx -r
If the standard input does not contain any nonblanks, do not run the
command. By default, the command is run once even if there is no
input. This option is a GNU extension.

@item --max-lines@r{[}=@var{max-lines}@r{]}
@itemx -L @var{max-lines}
@itemx -l@r{[}@var{max-lines}@r{]}
Use at most @var{max-lines} nonblank input lines per command line;
@var{max-lines} defaults to 1 if omitted; omitting the argument is not
allowed in the case of the @samp{-L} option. Trailing blanks cause an
input line to be logically continued on the next input line, for the
purpose of counting the lines. Implies @samp{-x}. The preferred name
for this option is @samp{-L} as this is specified by POSIX.

@item --max-args=@var{max-args}
@itemx -n @var{max-args}
Use at most @var{max-args} arguments per command line. Fewer than
@var{max-args} arguments will be used if the size (see the @samp{-s}
option) is exceeded, unless the @samp{-x} option is given, in which
case @code{xargs} will exit.

@item --max-chars=@var{max-chars}
@itemx -s @var{max-chars}
Use at most @var{max-chars} characters per command line, including the
command and initial arguments and the terminating nulls at the ends of the
argument strings. If you specify a value for this option which is too
large or small, a warning message is printed and the appropriate upper
or lower limit is used instead. You can use the @samp{--show-limits}
option to understand the command-line limits applying to @code{xargs}
and how they are affected by any other options. The POSIX limits shown
when you do this have already been adjusted to take into account the
size of your environment variables.

The largest allowed value is system-dependent, and is calculated as
the argument length limit for exec, less the size of your environment,
less 2048 bytes of headroom. If this value is more than 128KiB,
128KiB is used as the default value; otherwise, the default value is
the maximum.
@end table

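The effect of @samp{-n}, @samp{-L} and @samp{-r} is easy to see with @code{echo} as the command. The input words below are arbitrary; each comment shows the output GNU @code{xargs} produces:

@example
printf '%s\n' a b c d e | xargs -n 2 echo
# a b
# c d
# e

printf 'a b\nc\nd e\n' | xargs -L 2 echo
# a b c
# d e

printf '' | xargs echo ran      # prints "ran": run once anyway
printf '' | xargs -r echo ran   # prints nothing
@end example
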
@node Controlling Parallelism
@subsubsection Controlling Parallelism

Normally, @code{xargs} runs one command at a time. This is called
``serial'' execution; the commands happen in a series, one after another.
If you'd like @code{xargs} to do things in ``parallel'', you can ask it
to do so, either when you invoke it, or later while it is running.
Running several commands at one time can make the entire operation
go more quickly, if the commands are independent, and if your system
has enough resources to handle the load. When parallelism works in
your application, @code{xargs} provides an easy way to get your work
done faster.

@table @code
@item --max-procs=@var{max-procs}
@itemx -P @var{max-procs}
Run up to @var{max-procs} processes at a time; the default is 1. If
@var{max-procs} is 0, @code{xargs} will run as many processes as
possible at a time. Use the @samp{-n}, @samp{-s}, or @samp{-L} option
with @samp{-P}; otherwise chances are that the command will be run
only once.
@end table

For example, suppose you have a directory tree of large image files
and a @code{makeallsizes} script that takes a single file name and
creates various sized images from it (thumbnail-sized, web-page-sized,
printer-sized, and the original large file). The script is doing enough
work that it takes significant time to run, even on a single image.
You could run:

@example
find originals -name '*.jpg' | xargs -L 1 makeallsizes
@end example

This will run @code{makeallsizes @var{filename}} once for each @code{.jpg}
file in the @code{originals} directory tree. However, if your system has
two central processors, this script will only keep one of them busy.
Instead, you could probably finish in about half the time by running:

@example
find originals -name '*.jpg' | xargs -L 1 -P 2 makeallsizes
@end example

@code{xargs} will run the first two commands in parallel, and then
whenever one of them terminates, it will start another one, until
the entire job is done.

The same idea can be generalized to as many processors as you have handy.
It also generalizes to other resources besides processors. For example,
if @code{xargs} is running commands that are waiting for a response from a
distant network connection, running a few in parallel may reduce the
overall latency by overlapping their waiting time.

If you are running commands in parallel, you need to think about how
they should arbitrate access to any resources that they share. For
example, if more than one of them tries to print to stdout, the output
will be produced in an indeterminate order (and very likely mixed up)
unless the processes collaborate in some way to prevent this. Using
some kind of locking scheme is one way to prevent such problems. In
general, using a locking scheme will help ensure correct output but
reduce performance. If you don't want to tolerate the performance
difference, simply arrange for each process to produce a separate output
file (or otherwise use separate resources).

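One way to give each process its own output file is to wrap the command in a small inline shell script. In this sketch @code{makeallsizes} is the hypothetical script from the earlier example, and each run writes its messages to a log file named after its input; @samp{-print0} and @samp{-0} keep unusual file names intact:

@example
# Run up to 4 jobs at once; each job's output goes to its own
# FILE.log, so the output of parallel jobs cannot interleave.
find originals -name '*.jpg' -print0 |
  xargs -0 -n 1 -P 4 sh -c 'makeallsizes "$1" >"$1.log" 2>&1' sh
@end example

The trailing @samp{sh} becomes @code{$0} of the inline script, so each file name arrives as @code{$1}.
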
@code{xargs} also allows you to ``turn up'' or ``turn down'' its parallelism
in the middle of a run. Suppose you are keeping your four-processor
system busy for hours, processing thousands of images using @code{-P 4}.
Now, in the middle of the run, you or someone else wants you to reduce
your load on the system, so that something else will run faster.
If you interrupt @code{xargs}, your job will be half-done, and it
may take significant manual work to resume it only for the remaining
images. If you suspend @code{xargs} using your shell's job controls
(e.g. @code{control-Z}), then it will get no work done while suspended.

Instead, find out the process ID of the @code{xargs} process, either from your
shell or with the @code{ps} command. After you send it the signal
@code{SIGUSR2}, @code{xargs} will run one fewer command in parallel.
If you send it the signal @code{SIGUSR1}, it will run one more command
in parallel. For example:

@example
shell$ xargs <allimages -L 1 -P 4 makeallsizes &
[4] 27643
... at some later point ...
shell$ kill -USR2 27643
shell$ kill -USR2 %4
@end example

The first @code{kill} command will cause @code{xargs} to wait for
two commands to terminate before starting the next command (reducing
the parallelism from 4 to 3). The second @code{kill} will reduce it from
3 to 2. (@code{%4} works in some shells as a shorthand for the process
ID of the background job labeled @code{[4]}.)

Similarly, if you started a long @code{xargs} job without parallelism, you
can easily switch it to start running two commands in parallel by sending
it a @code{SIGUSR1}.

@code{xargs} will never terminate any existing commands when you ask it
to run fewer processes. It merely waits for the excess commands to
finish. If you ask it to run more commands, it will start the next
one immediately (if it has more work to do). If the degree of
parallelism is already 1, sending @code{SIGUSR2} will have no further
effect: @code{xargs} will not reduce it to 0, because @code{--max-procs=0}
means that there should be no limit on the number of processes to run.

There is an implementation-defined limit on the number of processes.
This limit is shown with @code{xargs --show-limits}. The limit is at
least 127 on all systems (and on the author's system it is
2147483647).

If you send several identical signals quickly, the operating system
does not guarantee that each of them will be delivered to @code{xargs}.
This means that you can't rapidly increase or decrease the parallelism by
more than one command at a time. You can avoid this problem by sending
a signal, observing the result, then sending the next one; or merely by
delaying for a few seconds between signals (unless your system is very
heavily loaded).

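For example, to raise the parallelism of a running job by two, you could send the signals one at a time with a pause in between. In this sketch the process ID 27643 is taken from the example above, and the five-second delay is an arbitrary choice:

@example
# Send SIGUSR1 twice, pausing so that the first signal is
# handled before the second one is sent.
for i in 1 2; do
  kill -USR1 27643
  sleep 5
done
@end example
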
Whether or not parallel execution will work well for you depends on
the nature of the command you are running in parallel, on the
configuration of the system on which you are running the command, and
on the other work being done on the system at the time.

@node Interspersing File Names
@subsubsection Interspersing File Names

@code{xargs} can insert the name of the file it is processing between
arguments you give for the command. Unless you also give options to
limit the command size (@pxref{Limiting Command Size}), this mode of
operation is equivalent to @samp{find -exec} (@pxref{Single File}).

@table @code
@item --replace@r{[}=@var{replace-str}@r{]}
@itemx -I @var{replace-str}
@itemx -i@r{[}@var{replace-str}@r{]}
Replace occurrences of @var{replace-str} in the initial arguments with
names read from the input. Also, unquoted blanks do not terminate
arguments; instead, the input is split at newlines only. If
@var{replace-str} is omitted for @samp{--replace} or @samp{-i}, it
defaults to @samp{@{@}} (as for @samp{find -exec}).
Implies @samp{-x} and @samp{-l 1}. @samp{-i} is deprecated in favour
of @samp{-I}. As an example, to sort each file in the @file{bills}
directory, leaving the output in that file name with @file{.sorted}
appended, you could do:

@example
find bills -type f | xargs -I XX sort -o XX.sorted XX
@end example

@noindent
The equivalent command using @samp{find -execdir} is:

@example
find bills -type f -execdir sort -o '@{@}.sorted' '@{@}' ';'
@end example
@end table

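A quick way to see what @samp{-I} does is to use @code{echo} as the command. The replacement string @samp{X} is an arbitrary choice; note that the whole input line, blanks included, replaces each occurrence:

@example
printf '%s\n' alpha 'beta gamma' | xargs -I X echo X.sorted
# alpha.sorted
# beta gamma.sorted
@end example
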
When you use the @samp{-I} option, each line read from the input is
buffered internally. This means that there is an upper limit on the
length of input line that @code{xargs} will accept when used with the
@samp{-I} option. To work around this limitation, you can use the
@samp{-s} option to increase the amount of buffer space that @code{xargs}
uses, and you can also use an extra invocation of @code{xargs} to ensure that
very long lines do not occur. For example:

@example
somecommand | xargs -s 50000 echo | xargs -I '@{@}' -s 100000 rm '@{@}'
@end example

Here, the first invocation of @code{xargs} has no input line length
limit because it doesn't use the @samp{-I} option. The second
invocation of @code{xargs} does have such a limit, but we have ensured
that it never encounters a line which is longer than it can
handle.

This is not an ideal solution. Instead, the @samp{-I} option should
not impose a line length limit (apart from any limit imposed by the
operating system) and so one might consider this limitation to be a
bug. A better solution would be to allow @code{xargs -I} to
automatically move to a larger value for the @samp{-s} option when
this is needed.

This sort of problem doesn't occur with the output of @code{find}
because it emits just one filename per line.

@node Querying
@subsection Querying

To ask the user whether to execute a command on a single file, you can
use the @code{find} primary @samp{-okdir} instead of @samp{-execdir},
and the @code{find} primary @samp{-ok} instead of @samp{-exec}:

@deffn Action -okdir command ;
Like @samp{-execdir} (@pxref{Single File}), but ask the user first.
If the user does not agree to run the command, just return false.
Otherwise, run it, with standard input redirected from
@file{/dev/null}.

The response to the prompt is matched against a pair of regular
expressions to determine if it is a yes or no response. These regular
expressions are obtained from the system (@code{nl_langinfo} items
@code{YESEXPR} and @code{NOEXPR} are used) if the @code{POSIXLY_CORRECT} environment
variable is set and the system has such patterns available. Otherwise,
@code{find}'s message translations are used. In either case, the
@code{LC_MESSAGES} environment variable will determine the regular
expressions used to determine if the answer is affirmative or negative.
The interpretation of the regular expressions themselves will be
affected by the environment variables @code{LC_CTYPE} (character
classes) and @code{LC_COLLATE} (character ranges and equivalence
classes).
@end deffn

@deffn Action -ok command ;
This insecure variant of the @samp{-okdir} action is specified by
POSIX. The main difference is that the command is executed in the
directory from which @code{find} was invoked, meaning that @samp{@{@}}
is expanded to a relative path starting with the name of one of the
starting directories, rather than just the basename of the matched
file. If the command is run, its standard input is redirected from
@file{/dev/null}.
@end deffn

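As a sketch of typical use, the following asks before removing each backup file below the current directory; the @file{*.bak} pattern is only an example. Because @samp{-okdir} runs @code{rm} from the directory containing each match, the name is passed relative to that directory:

@example
# find prompts for each match; only an affirmative answer runs rm.
# The quotes protect @{@} and ; from the shell.
find . -name '*.bak' -okdir rm -- '@{@}' ';'
@end example
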
To query the user when processing multiple files with a single
command, give @code{xargs} the following option. When using this
option, you might find it useful to control the number of files
processed per invocation of the command (@pxref{Limiting Command
Size}).

@table @code
@item --interactive
@itemx -p
Prompt the user about whether to run each command line and read a line
from the terminal. Only run the command line if the response starts
with @samp{y} or @samp{Y}. Implies @samp{-t}.
@end table

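For example, the following sketch asks about each backup file in turn; @samp{-n 1} puts one file on each command line, so you can answer separately for each. The @file{*.bak} pattern is only illustrative:

@example
# xargs shows each command line (as -t does) and reads the reply
# from the terminal; only replies starting with y or Y run it.
find . -name '*.bak' -print0 | xargs -0 -p -n 1 rm --
@end example
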
@node Delete Files
@section Delete Files

@deffn Action -delete
Delete files or directories; true if removal succeeded. If the
removal failed, an error message is issued.

The use of the @samp{-delete} action on the command line automatically
turns on the @samp{-depth} option (@pxref{find Expressions}). This
can be surprising if you were previously just testing with
@samp{-print}, so it is usually best to remember to use @samp{-depth}
explicitly.

If @samp{-delete} fails, @code{find}'s exit status will be nonzero
(when it eventually exits).
@end deffn

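A cautious way to follow this advice is to run the expression first with @samp{-print} and an explicit @samp{-depth}, check the list, and only then replace @samp{-print} with @samp{-delete}. The @file{*.o} pattern is only an example:

@example
# Dry run: -depth makes the order match what -delete will use.
find . -depth -name '*.o' -print
# Real run: -delete would turn on -depth anyway.
find . -depth -name '*.o' -delete
@end example
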
@node Adding Tests
@section Adding Tests

You can test for file attributes that none of the @code{find} builtin
tests check. To do this, use @code{xargs} to run a program that
filters a list of files printed by @code{find}. If possible, use
@code{find} builtin tests to pare down the list, so the program run by
@code{xargs} has less work to do. The tests built into @code{find}
will likely run faster than tests that other programs perform.

For reasons of efficiency it is often useful to limit the number of
times an external program has to be run. For this reason, it is often
a good idea to implement ``extended'' tests by using @code{xargs}.

For example, here is a way to print the names of all of the unstripped
binaries in the @file{/usr/local} directory tree. Builtin tests avoid
running @code{file} on files that are not regular files or are not
executable.

@example
find /usr/local -type f -perm /a=x | xargs file |
grep 'not stripped' | cut -d: -f1
@end example

@noindent
The @code{cut} program removes everything after the file name from the
output of @code{file}.

However, using @code{xargs} can present important security problems
(@pxref{Security Considerations}). These can be avoided by using
@samp{-execdir}. The @samp{-execdir} action is also a useful way of
putting your own test in the middle of a set of other tests or actions
for @code{find} (for example, you might want to use @samp{-prune}).

@c Idea from Martin Weitzel.
To place a special test somewhere in the middle of a @code{find}
expression, you can use @samp{-execdir} (or, less securely,
@samp{-exec}) to run a program that performs the test. Because
@samp{-execdir} is true exactly when the executed program exits with a
zero status, you can use a program (which can be a shell script) that
tests for a special attribute and exits with a true (zero) or false
(non-zero) status. It is a good idea to place such a special test
@emph{after} the builtin tests, because it starts a new process which
could be avoided if a builtin test evaluates to false.

Here is a shell script called @code{unstripped} that checks whether
its argument is an unstripped binary file:

@example
#! /bin/sh
# Exit with grep's status: true (0) if "$1" is an unstripped binary.
# -b stops file printing the name, so the name cannot cause a match.
file -b -- "$1" | grep -q "not stripped"
@end example

This script relies on the shell exiting with the status of
the last command in the pipeline, in this case @code{grep}. The
@code{grep} command exits with a true status if it found any matches,
false if not. Here is an example of using the script (assuming it is
in your search path). It lists the stripped executables (and shell
scripts) in the file @file{sbins} and the unstripped ones in
@file{ubins}.

@example
find /usr/local -type f -perm /a=x \
  \( -execdir unstripped '@{@}' \; -fprint ubins -o -fprint sbins \)
@end example

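If starting one process per file is too slow, a compromise is to let @code{find} pass many names to a single shell, which then runs the test on each. This sketch inlines the test logic (so no separate script is needed) and uses @samp{-exec @dots{} @{@} +}, which, much as @code{xargs} does, runs the command on as many names as fit, but without parsing @code{find}'s output. It assumes your @code{file} supports @samp{-b} (omit the file name from the output):

@example
# Print the names of unstripped binaries, starting one sh per
# batch of names rather than one process per file.
find /usr/local -type f -perm /a=x -exec sh -c '
  for f in "$@"; do
    file -b "$f" | grep -q "not stripped" && printf "%s\n" "$f"
  done' sh '@{@}' +
@end example
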
@node Databases
@chapter File Name Databases

The file name databases used by @code{locate} contain lists of files
that were in particular directory trees when the databases were last
updated. The file name of the default database is determined when
@code{locate} and @code{updatedb} are configured and installed. The
frequency with which the databases are updated and the directories for
which they contain entries depend on how often @code{updatedb} is run,
and with which arguments.

You can obtain some statistics about the databases by using
@samp{locate --statistics}.

@menu
* Database Locations::
* Database Formats::
* Newline Handling::
@end menu

@node Database Locations
@section Database Locations

There can be multiple file name databases. Users can select which
databases @code{locate} searches using the @code{LOCATE_PATH}
environment variable or a command line option. The system
administrator can choose the file name of the default database, the
frequency with which the databases are updated, and the directories
for which they contain entries. File name databases are updated by
running the @code{updatedb} program, typically nightly.

In networked environments, it often makes sense to build a database at
the root of each filesystem, containing the entries for that
filesystem. @code{updatedb} is then run for each filesystem on the
fileserver where that filesystem is on a local disk, to prevent
thrashing the network.

@xref{Invoking updatedb}, for the description of the options to
@code{updatedb}. These options can be used to specify which
directories are indexed by each database file.

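For instance, a user could index just their home directory into a private database and then search only that. This sketch uses the @code{updatedb} options @samp{--localpaths} and @samp{--output} and the @code{locate} option @samp{--database}; the database file name @file{$HOME/.locate.db} is only an example:

@example
# Build a database covering only $HOME ...
updatedb --localpaths="$HOME" --output="$HOME/.locate.db"
# ... then search it instead of the default database.
locate --database="$HOME/.locate.db" '*.texi'
@end example
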
The default location for the locate database depends on how findutils
is built, but the findutils installation accompanying this manual uses
the default location @file{@value{LOCATE_DB}}.

If no database exists at @file{@value{LOCATE_DB}} but the user did not
specify where to look (by using @samp{-d} or setting
@code{LOCATE_PATH}), then @code{locate} will also check for a
``secure'' database in @file{/var/lib/slocate/slocate.db}.

@node Database Formats
@section Database Formats

The file name databases contain lists of files that were in particular
directory trees when the databases were last updated. The file name
database format changed starting with GNU @code{locate} version 4.0 to
allow machines with different byte orderings to share the databases.

GNU @code{locate} can read both the old and new database formats.
However, old versions of @code{locate} (on other Unix systems, or GNU
@code{locate} before version 4.0) produce incorrect results if run
against a database in something other than the old format.

Support for the old database format will eventually be discontinued,
first in @code{updatedb} and later in @code{locate}.

If you run @samp{locate --statistics}, the resulting summary indicates
the type of each @code{locate} database. You select which database
format @code{updatedb} will use with the @samp{--dbformat} option.

@menu
* LOCATE02 Database Format::
* Sample LOCATE02 Database::
* slocate Database Format::
* Old Database Format::
@end menu

@node LOCATE02 Database Format
@subsection LOCATE02 Database Format

@code{updatedb} runs a program called @code{frcode} to
@dfn{front-compress} the list of file names, which reduces the
database size by a factor of 4 to 5. Front-compression (also known as
incremental encoding) works as follows.

The database entries are a sorted list (case-insensitively, for users'
convenience). Since the list is sorted, each entry is likely to share
a prefix (initial string) with the previous entry. Each database
entry begins with an offset-differential count byte, which is the
additional number of characters of prefix of the preceding entry to
use beyond the number that the preceding entry is using of its
predecessor. (The counts can be negative.) Following the count is a
null-terminated ASCII remainder -- the part of the name that follows
the shared prefix.

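The scheme can be illustrated with a short @code{awk} program. This is only a sketch of the idea, not @code{frcode} itself: it reads newline-separated names that are already sorted, computes how long a prefix each entry shares with the previous one, and prints the change in that length in decimal, followed by the remainder. It writes no NUL bytes and does not use the long-count escape described below:

@example
printf '%s\n' /usr/src /usr/src/cmd/aardvark.c \
    /usr/src/cmd/armadillo.c /usr/tmp/zoo |
awk 'BEGIN @{ print 0, "LOCATE02"; prev = ""; oldlen = 0 @}
@{
  # n = length of the prefix shared with the previous entry
  n = 0
  while (n < length($0) && n < length(prev) &&
         substr($0, n + 1, 1) == substr(prev, n + 1, 1))
    n++
  # Print the change in shared length, then the unshared remainder.
  print n - oldlen, substr($0, n + 1)
  prev = $0
  oldlen = n
@}'
@end example

With these four names it prints @samp{0 LOCATE02}, @samp{0 /usr/src}, @samp{8 /cmd/aardvark.c}, @samp{6 rmadillo.c} and @samp{-9 tmp/zoo}, one per line.
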
If the offset-differential count is larger than can be stored in a
byte (+/-127), the byte has the value 0x80 and the count follows in a
2-byte word, with the high byte first (network byte order).

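To make the arithmetic concrete, the following shell loop prints, in hexadecimal, the bytes that would encode a few counts. Beyond what is stated above, it assumes that a one-byte count is a two's complement signed byte and that the 2-byte word holds a 16-bit two's complement value; the counts are arbitrary examples:

@example
for c in 6 -9 300 -300; do
  if [ "$c" -ge -127 ] && [ "$c" -le 127 ]; then
    # Fits in one signed byte.
    printf '%d: %02x\n' "$c" $(( c & 255 ))
  else
    # Escape byte 0x80, then the high byte, then the low byte.
    printf '%d: 80 %02x %02x\n' "$c" $(( (c >> 8) & 255 )) $(( c & 255 ))
  fi
done
@end example

It prints @samp{6: 06}, @samp{-9: f7}, @samp{300: 80 01 2c} and @samp{-300: 80 fe d4}.
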
Every database begins with a dummy entry for a file called
@file{LOCATE02}, which @code{locate} checks for to ensure that the
database file has the correct format; it ignores the entry in doing
the search.

Databases cannot be concatenated together, even if the first (dummy)
entry is trimmed from all but the first database. This is because the
offset-differential count in the first entry of the second and
following databases will be wrong.

In the output of @samp{locate --statistics}, the new database format
is referred to as @samp{LOCATE02}.

@node Sample LOCATE02 Database
@subsection Sample LOCATE02 Database

Sample input to @code{frcode}:
@c with nulls changed to newlines:

@example
/usr/src
/usr/src/cmd/aardvark.c
/usr/src/cmd/armadillo.c
/usr/tmp/zoo
@end example

Length of the longest prefix of the preceding entry to share:

@example
0 /usr/src
8 /cmd/aardvark.c
14 rmadillo.c
5 tmp/zoo
@end example

Output from @code{frcode}, with trailing nulls changed to newlines
and count bytes made printable:

@example
0 LOCATE02
0 /usr/src
8 /cmd/aardvark.c
6 rmadillo.c
-9 tmp/zoo
@end example

(6 = 14 - 8, and -9 = 5 - 14)

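Decoding reverses the process: a running total of the counts gives the length of the prefix to reuse from the previous name, and the remainder is appended to it. This @code{awk} sketch reads the printable form shown above (a decimal count, a space, then the remainder) rather than a real binary database, and does not print the dummy @file{LOCATE02} entry:

@example
printf '%s\n' '0 LOCATE02' '0 /usr/src' '8 /cmd/aardvark.c' \
    '6 rmadillo.c' '-9 tmp/zoo' |
awk '@{
  len += $1                   # shared length = running sum of counts
  sub(/^-?[0-9]+ /, "")       # drop the count, keep the remainder
  path = substr(path, 1, len) $0
  if (NR > 1) print path      # the first entry is the dummy
@}'
@end example

It prints the four original names, from @file{/usr/src} to @file{/usr/tmp/zoo}.
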
@node slocate Database Format
@subsection slocate Database Format

The @code{slocate} program uses a database format similar to, but not
quite the same as, GNU @code{locate}. The first byte of the database
specifies its @dfn{security level}. If the security level is 0,
@code{slocate} will read, match and print filenames on the basis of
the information in the database only. However, if the security level
byte is 1, @code{slocate} omits entries from its output if the
invoking user is unable to access them. The second byte of the
database is zero. The second byte is immediately followed by the
first database entry. The first entry in the database is not preceded
by any differential count or dummy entry. Instead the differential
count for the first item is assumed to be zero.

Starting with the second entry (if any) in the database, data is
interpreted as for the GNU LOCATE02 format.

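To check which security level an @code{slocate} database declares, you can dump its first two bytes with @code{od}. The file name below is the one @code{locate} falls back to (@pxref{Database Locations}); on a given system the database may live elsewhere, or be readable only by a privileged user:

@example
# Expect output like "01 00" for security level 1,
# or "00 00" for security level 0.
od -An -t x1 -N 2 /var/lib/slocate/slocate.db
@end example
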
@node Old Database Format
@subsection Old Database Format

The old database format is used by Unix @code{locate} and @code{find}
programs and earlier releases of the GNU ones. @code{updatedb}
produces this format if given the @samp{--old-format} option.

@code{updatedb} runs programs called @code{bigram} and @code{code} to
produce old-format databases. The old format differs from the new one
in the following ways. Instead of each entry starting with an
offset-differential count byte and ending with a null, byte values
from 0 through 28 indicate offset-differential counts from -14 through
14. The byte value indicating that a long offset-differential count
follows is 0x1e (30), not 0x80. The long counts are stored in host
byte order, which is not necessarily network byte order, and host
integer word size, which is usually 4 bytes. They also represent a
count 14 less than their value. The database lines have no
termination byte; the start of the next line is indicated by its first
byte having a value <= 30.

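In other words, a count byte @var{b} in the range 0 to 28 stands for the differential @var{b} @minus{} 14. A quick shell check of the two endpoints and the midpoint:

@example
# Old-format count byte -> offset differential (byte - 14).
for b in 0 14 28; do
  echo "byte $b -> count $(( b - 14 ))"
done
@end example

It prints @samp{byte 0 -> count -14}, @samp{byte 14 -> count 0} and @samp{byte 28 -> count 14}.
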
In addition, instead of starting with a dummy entry, the old database
format starts with a 256 byte table containing the 128 most common
bigrams in the file list. A bigram is a pair of adjacent bytes.
Bytes in the database that have the high bit set are indexes (with the
high bit cleared) into the bigram table. The bigram and
offset-differential count coding makes these databases 20-25% smaller
than the new format, but makes them not 8-bit clean. Any byte in a
file name that is in the ranges used for the special codes is replaced
in the database by a question mark, which not coincidentally is the
shell wildcard to match a single character.

The old format therefore cannot faithfully store entries with
non-ASCII characters, so it should not be used in
internationalised environments. That is, most installations should
not use it.

Because the long counts are stored by the @code{code} program as
native-order machine words, the database format is not easily used in
environments which differ in terms of byte order. If locate databases
are to be shared between machines, the LOCATE02 database format should
be used. This has other benefits as discussed above. However, the
length of the filename currently being processed can normally be used
to place reasonable limits on the long counts and so this information
is used by @code{locate} to help it guess the byte ordering of the old format
database. Unless it finds evidence to the contrary, @code{locate}
will assume that the byte order of the database is the same as the
native byte order of the machine running @code{locate}. The output of
@samp{locate --statistics} also includes information about the byte
order of old-format databases.

The output of @samp{locate --statistics} will give an incorrect count
of the number of file names containing newlines or high-bit characters
for old-format databases.

Old versions of GNU @code{locate} fail to correctly handle very long
file names, possibly leading to security problems relating to a heap
buffer overrun. @xref{Security Considerations for locate}, for a
detailed explanation.

@node Newline Handling
@section Newline Handling

Within the database, file names are terminated with a null character.
This is the case for both the old and the new format.

When the new database format is being used, the compression technique
used to generate the database relies on the ability to sort the
list of files before they are presented to @code{frcode}.

If the system's @code{sort} command allows its input list of files to be
separated with null characters via the @samp{-z} option, this option
is used and therefore @code{updatedb} and @code{locate} will both
correctly handle file names containing newlines. If the @code{sort}
command lacks support for this, the list of files is delimited with
the newline character, meaning that parts of file names containing
newlines will be incorrectly sorted. This can result in both
incorrect matches and incorrect failures to match.

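A rough way to find out which case applies on your system is to check whether @code{sort} accepts @samp{-z}. This sketch only approximates whatever check @code{updatedb} itself performs:

@example
if printf 'b\0a\0' | sort -z >/dev/null 2>&1; then
  echo 'sort supports -z'
else
  echo 'sort lacks -z'
fi
@end example
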
On the other hand, if you are using the old database format, file
names with embedded newlines are not correctly handled. There is no
technical limitation which enforces this; it's just that the
@code{bigram} program has not been updated to support lists of file
names separated by nulls.

So, if you are using the new database format (this is the default) and
your system uses GNU @code{sort}, newlines will be correctly handled
at all times. Otherwise, newlines may not be correctly handled.

@node File Permissions
@chapter File Permissions

@include perm.texi

@include parse-datetime.texi

@node Configuration
@chapter Configuration

The findutils source distribution includes a @code{configure} script
which examines the system and generates files required to build
findutils. See the files @file{README} and @file{INSTALL}.

A number of options can be specified on the @code{configure} command
line, and many of these are straightforward, adequately documented in
the @code{--help} output, or not normally useful. Options which are
useful or which are not obvious are explained here.

@menu
* Leaf Optimisation:: Take advantage of Unix file system semantics.
* d_type Optimisation:: Take advantage of file type information.
* fts:: A non-recursive file system search.
@end menu

@node Leaf Optimisation
@section Leaf Optimisation

Files in Unix file systems have a link count which indicates how many
names point to the same inode. Directories in Unix file systems have a
@file{..} entry which functions as a hard link to the parent directory
and a @file{.} entry which functions as a link to the directory itself.
The @file{..} entry of the root directory also points to the root.
So every directory has two links of its own (its entry in its parent,
or @file{..} for the root, and its @file{.} entry), plus one more for
the @file{..} entry of each subdirectory.
This means that @code{find} can deduce the number of subdirectories a
directory has, simply by subtracting 2 from the directory's link
count. This allows @code{find} to avoid the calls to @code{stat} which would
otherwise be needed to discover which directory entries are
subdirectories.

File systems which don't have these semantics should simply return a
|
|
value less than 2 in the @code{st_nlinks} member of @code{struct stat}
|
|
in response to a successful call to @code{stat}.
|
|
|
|
If you are building @code{find} for a system on which the value of
|
|
@code{st_nlinks} is unreliable, you can specify
|
|
@code{--disable-leaf-optimisation} to @code{configure} to prevent this
|
|
assumption being made.
|
|
|
|
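
As an illustration of the arithmetic above, the following shell sketch
compares the subdirectory count deduced from a directory's link count
with a direct count. It assumes GNU @code{stat} (whose @samp{-c %h}
format prints the link count) and a scratch directory made with
@code{mktemp}; neither is part of findutils, and the directory names
are arbitrary.

@example
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT      # remove the scratch tree on exit
mkdir "$dir/a" "$dir/b" "$dir/c"
touch "$dir/file"
links=$(stat -c %h "$dir")
if [ "$links" -ge 2 ]; then
  # Unix semantics: the parent's entry, ".", and one ".." per subdirectory.
  echo "subdirectories deduced from link count: $((links - 2))"
else
  # Link count below 2: the deduction cannot be made on this file system.
  echo "link count $links is not usable for this deduction"
fi
echo "subdirectories counted: $(find "$dir" -mindepth 1 -maxdepth 1 -type d | wc -l)"
@end example

On a file system with the semantics described above, both lines report
3; a file system that reports a link count below 2 for directories
takes the second branch instead.
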
@node d_type Optimisation
@section d_type Optimisation

When this feature is enabled, @code{find} takes advantage of the fact
that on some systems @code{readdir} returns the type of a file in the
@code{d_type} member of @code{struct dirent}.

@node fts
@section fts

The findutils source distribution contains two different implementations of
@code{find}. The older implementation descends the file system
recursively, while the newer one uses @code{fts}. Both are normally
installed.

If the option @code{--without-fts} was passed to @code{configure}, the
recursive implementation is installed as @code{find} and the fts-based
implementation is installed as @code{ftsfind}. Otherwise, the
fts-based implementation is installed as @code{find} and the recursive
implementation is installed as @code{oldfind}.

@node Reference
@chapter Reference

Below are summaries of the command line syntax for the programs
discussed in this manual.

@menu
* Invoking find::
* Invoking locate::
* Invoking updatedb::
* Invoking xargs::
* Regular Expressions::
* Environment Variables::
@end menu

@node Invoking find
@section Invoking @code{find}

@example
find @r{[-H] [-L] [-P] [-D @var{debugoptions}] [-O@var{level}]} @r{[}@var{file}@dots{}@r{]} @r{[}@var{expression}@r{]}
@end example

@code{find} searches the directory tree rooted at each file name
@var{file} by evaluating the @var{expression} on each file it finds in
the tree.

The command line may begin with the @samp{-H}, @samp{-L}, @samp{-P},
@samp{-D} and @samp{-O} options. These are followed by a list of
files or directories that should be searched. If no files to search
are specified, the current directory (@file{.}) is used.

This list of files to search is followed by a list of expressions
describing the files we wish to search for. The first part of the
expression is recognised by the fact that it begins with @samp{-}
followed by some other letters (for example @samp{-print}), or is
either @samp{(} or @samp{!}. Any arguments after it are the rest of
the expression.

If no expression is given, the expression @samp{-print} is used.

The @code{find} command exits with status zero if all files matched
are processed successfully, and with a status greater than zero if
errors occur.

The @code{find} program also recognises two options for administrative
use:

@table @samp
@item --help
Print a summary of the command line usage and exit.
@item --version
Print the version number of @code{find} and exit.
@end table

The @samp{-version} option is a synonym for @samp{--version}.

@menu
* Filesystem Traversal Options::
* Warning Messages::
* Optimisation Options::
* Debug Options::
* Find Expressions::
@end menu

@node Filesystem Traversal Options
@subsection Filesystem Traversal Options

The options @samp{-H}, @samp{-L} or @samp{-P} may be specified at the
start of the command line (if none of these is specified, @samp{-P} is
assumed). If you specify more than one of these options, the last one
specified takes effect (but note that the @samp{-follow} option is
equivalent to @samp{-L}).

@table @code
@item -P
Never follow symbolic links (this is the default), except in the case
of the @samp{-xtype} predicate.
@item -L
Always follow symbolic links, except in the case of the @samp{-xtype}
predicate.
@item -H
Follow symbolic links specified in the list of files to search, or
which are otherwise specified on the command line.
@end table

If @code{find} would follow a symbolic link, but cannot for any reason
(for example, because it has insufficient permissions or the link is
broken), it falls back on using the properties of the symbolic link
itself. @xref{Symbolic Links}, for a more complete description of how
symbolic links are handled.

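
The difference between the three options can be seen with a small
scratch tree containing a directory @file{real} and a symbolic link
@file{link} pointing to it. This is a sketch only: the tree, created
with @code{mktemp}, and its names are made up for the illustration.

@example
top=$(mktemp -d)
trap 'rm -rf "$top"' EXIT      # remove the scratch tree on exit
mkdir "$top/real"
touch "$top/real/file"
ln -s real "$top/link"
# -P (the default): the link is examined, not followed.
find -P "$top/link" -type l       # prints $top/link
# -L: every link is followed, so both paths to "file" are reported.
find -L "$top" -type f | sort     # prints $top/link/file and $top/real/file
# -H: only links named on the command line are followed.
find -H "$top" -type f            # prints $top/real/file
find -H "$top/link" -type f       # prints $top/link/file
@end example
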
@node Warning Messages
@subsection Warning Messages

If there is an error on the @code{find} command line, an error message
is normally issued. However, there are some usages that are
inadvisable but which @code{find} should still accept. Under these
circumstances, @code{find} may issue a warning message.

By default, warnings are enabled only if @code{find} is being run
interactively (specifically, if the standard input is a terminal) and
the @code{POSIXLY_CORRECT} environment variable is not set. Warning
messages can be controlled explicitly by the use of options on the
command line:

@table @code
@item -warn
Issue warning messages where appropriate.
@item -nowarn
Do not issue warning messages.
@end table

These options take effect at the point on the command line where they
are specified. Therefore it's not useful to specify @samp{-nowarn} at
the end of the command line. The warning messages affected by the
above options are triggered by:

@itemize @minus
@item
Use of the @samp{-d} option, which is deprecated; please use
@samp{-depth} instead, since the latter is POSIX-compliant.
@item
Use of the @samp{-ipath} option, which is deprecated; please use
@samp{-iwholename} instead.
@item
Specifying an option (for example @samp{-mindepth}) after a non-option
(for example @samp{-type} or @samp{-print}) on the command line.
@item
Use of the @samp{-name} or @samp{-iname} option with a slash character
in the pattern. Since the name predicates only compare against the
basename of the visited files, the only file that can match a slash is
the root directory itself.
@end itemize

This default behaviour is designed so that existing shell scripts
don't generate spurious warnings, while people running @code{find}
interactively are still made aware of the problem.

Some warning messages are issued for less common or more serious
problems, and consequently cannot be turned off:

@itemize @minus
@item
Use of an unrecognised backslash escape sequence with @samp{-fprintf}
@item
Use of an unrecognised formatting directive with @samp{-fprintf}
@end itemize

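
For example, the following sketch (using a made-up scratch tree) shows
the @samp{-name} mistake from the first list above and the usual fix,
which is to match the whole path with @samp{-path}. Because
@samp{-warn} appears before @samp{-name}, the warning is issued even
when standard input is not a terminal; the exact wording of the
message varies between releases.

@example
top=$(mktemp -d)
trap 'rm -rf "$top"' EXIT      # remove the scratch tree on exit
mkdir "$top/src"
touch "$top/src/main.c"
cd "$top"
# Matches nothing: -name only sees base names such as "main.c".
find . -warn -name 'src/*.c'
# Matches against the whole path instead.
find . -path './src/*.c'          # prints ./src/main.c
@end example
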
@node Optimisation Options
@subsection Optimisation Options

The @samp{-O@var{level}} option sets @code{find}'s optimisation level
to @var{level}. The default optimisation level is 1.

At certain optimisation levels, @code{find} reorders tests to speed up
execution while preserving the overall effect; that is, predicates
with side effects are not reordered relative to each other. The
optimisations performed at each optimisation level are as follows.

@table @samp
@item 0
Currently equivalent to optimisation level 1.

@item 1
This is the default optimisation level and corresponds to the
traditional behaviour. Expressions are reordered so that tests based
only on the names of files (for example @samp{-name} and
@samp{-regex}) are performed first.

@item 2
Any @samp{-type} or @samp{-xtype} tests are performed after any tests
based only on the names of files, but before any tests that require
information from the inode. On many modern versions of Unix, file
types are returned by @code{readdir()} and so these predicates are
faster to evaluate than predicates which need to stat the file first.

If you use the @samp{-fstype FOO} predicate and specify a file system
type @samp{FOO} which is not known (that is, present in
@file{/etc/mtab}) at the time @code{find} starts, that predicate is
equivalent to @samp{-false}.

@item 3
At this optimisation level, the full cost-based query optimiser is
enabled. The order of tests is modified so that cheap (i.e., fast)
tests are performed first and more expensive ones are performed later,
if necessary. Within each cost band, predicates are evaluated earlier
or later according to whether they are likely to succeed or not. For
@samp{-o}, predicates which are likely to succeed are evaluated
earlier, and for @samp{-a}, predicates which are likely to fail are
evaluated earlier.
@end table

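
Since reordering preserves the overall effect of the expression, the
set of files matched should not depend on the optimisation level. The
following sketch checks this on a small scratch tree (the file names
are arbitrary) by printing a checksum of the sorted results at each
level; all four checksums are expected to be identical.

@example
top=$(mktemp -d)
trap 'rm -rf "$top"' EXIT      # remove the scratch tree on exit
mkdir -p "$top/a/b"
touch "$top/a/x.h" "$top/a/b/y.h" "$top/a/b/z.c"
for level in 0 1 2 3; do
  printf 'level %s: ' "$level"
  find -O"$level" "$top" -type f -name '*.h' | sort | cksum
done
@end example
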
@node Debug Options
@subsection Debug Options

The @samp{-D} option makes @code{find} produce diagnostic output.
Much of the information is useful only for diagnosing problems, and so
most people will not find this option helpful.

The list of debug options should be comma-separated. Compatibility of
the debug options is not guaranteed between releases of findutils.
For a complete list of valid debug options, see the output of
@code{find -D help}. Valid debug options include:

@table @samp
@item help
Explain the debugging options.
@item tree
Show the expression tree in its original and optimised form.
@item stat
Print messages as files are examined with the @code{stat} and
@code{lstat} system calls. The @code{find} program tries to minimise
such calls.
@item opt
Print diagnostic information relating to the optimisation of the
expression tree; see the @samp{-O} option.
@item rates
Print a summary indicating how often each predicate succeeded or
failed.
@end table

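
A minimal sketch of how these options might be used follows. The
diagnostics are written to standard error, so the redirections below
keep them and discard the ordinary results; their format is not
guaranteed to be stable between releases.

@example
# Show the expression tree before and after optimisation.
find -D tree . -maxdepth 1 -name '*.c' -type f 2>&1 >/dev/null
# Several debug options can be combined in a comma-separated list.
find -D tree,opt -O3 . -maxdepth 1 -name '*.c' -type f 2>&1 >/dev/null
# List the debug options understood by the installed version.
find -D help
@end example
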
@node Find Expressions
@subsection Find Expressions

The final part of the @code{find} command line is a list of
expressions. @xref{Primary Index}, for a summary of all of the tests,
actions, and options that the expression can contain. If the
expression is missing, @samp{-print} is assumed.

@node Invoking locate
@section Invoking @code{locate}

@example
locate @r{[}@var{option}@dots{}@r{]} @var{pattern}@dots{}
@end example

For each @var{pattern} given, @code{locate} searches one or more file
name databases, returning each match of @var{pattern}.

@table @code
@item --all
@itemx -A
Print only names which match all non-option arguments, not those
matching one or more non-option arguments.

@item --basename
@itemx -b
The specified pattern is matched against just the last component of
the name of a file in the @code{locate} database. This last
component is also called the ``base name''. For example, the base
name of @file{/tmp/mystuff/foo.old.c} is @file{foo.old.c}. If the
pattern contains metacharacters, it must match the base name exactly.
If not, it must match part of the base name.

@item --count
@itemx -c
Instead of printing the matched file names, just print the total
number of matches found, unless @samp{--print} (@samp{-p}) is also
present.

@item --database=@var{path}
@itemx -d @var{path}
Instead of searching the default @code{locate} database
@file{@value{LOCATE_DB}}, @code{locate} searches the file
name databases in @var{path}, which is a colon-separated list of
database file names. You can also use the environment variable
@code{LOCATE_PATH} to set the list of database files to search. The
option overrides the environment variable if both are used. Empty
elements in @var{path} (that is, a leading or trailing colon, or two
colons in a row) are taken to stand for the default database.
A database can be supplied on stdin, using @samp{-} as an element
of @var{path}. If more than one element of @var{path} is @samp{-},
later instances are ignored (but a warning message is printed).

@item --existing
@itemx -e
Only print out names which currently exist (instead of names
which existed when the database was created). Note that this may slow
down the program a lot, if there are many matches in the database.
The way in which broken symbolic links are treated is affected by the
@samp{-L}, @samp{-P} and @samp{-H} options. Please note that it is
possible for the file to be deleted after @code{locate} has checked
that it exists, but before you use it. This option is automatically
turned on when reading an @code{slocate} database in secure mode
(@pxref{slocate Database Format}).

@item --non-existing
@itemx -E
Only print out names which currently do not exist (instead of
names which existed when the database was created). Note that
this may slow down the program a lot, if there are many matches in the
database. The way in which broken symbolic links are treated is
affected by the @samp{-L}, @samp{-P} and @samp{-H} options. Please
note that @code{locate} checks that the file does not exist, but a
file of the same name might be created after @code{locate}'s check but
before you read @code{locate}'s output.

@item --follow
@itemx -L
If testing for the existence of files (with the @samp{-e} or @samp{-E}
options), consider broken symbolic links to be non-existing. This is
the default behaviour.

@item --nofollow
@itemx -P
@itemx -H
If testing for the existence of files (with the @samp{-e} or @samp{-E}
options), treat broken symbolic links as if they were existing files.
The @samp{-H} form of this option is provided purely for similarity
with @code{find}; the use of @samp{-P} is recommended over @samp{-H}.

@item --ignore-case
@itemx -i
Ignore case distinctions in both the pattern and the file names.

@item --limit=N
@itemx -l N
Limit the number of results printed to N. When used with the
@samp{--count} option, the value printed will never be larger than
this limit.

@item --max-database-age=D
Normally, @code{locate} will issue a warning message when it searches
a database which is more than 8 days old. This option changes that
threshold from 8 days to D days. The effect of specifying a negative
value is undefined.

@item --mmap
@itemx -m
Accepted but does nothing. The option is supported only to provide
compatibility with BSD's @code{locate}.

@item --null
@itemx -0
Results are separated with the ASCII NUL character rather than the
newline character. To get the full benefit of this option,
use the new @code{locate} database format (that is the default
anyway).

@item --print
@itemx -p
Print search results even when they would normally be suppressed
because of the use of @samp{--statistics} (@samp{-S}) or
@samp{--count} (@samp{-c}).

@item --wholename
@itemx -w
The specified pattern is matched against the whole name of the file in
the @code{locate} database. If the pattern contains metacharacters,
it must match exactly. If not, it must match part of the whole file
name. This is the default behaviour.

@item --regex
@itemx -r
Instead of using substring or shell glob matching, the pattern
specified on the command line is understood to be a regular
expression. GNU Emacs-style regular expressions are assumed unless
the @samp{--regextype} option is also given. File names from the
@code{locate} database are matched using the specified regular
expression. If the @samp{-i} flag is also given, matching is
case-insensitive. Matches are performed against the whole path name,
and so by default a pathname will be matched if any part of it matches
the specified regular expression. The regular expression may use
@samp{^} or @samp{$} to anchor a match at the beginning or end of a
pathname.

@item --regextype
This option changes the regular expression syntax and behaviour used
by the @samp{--regex} option. @xref{Regular Expressions}, for more
information on the regular expression dialects understood by GNU
findutils.

@item --stdio
@itemx -s
Accepted but does nothing. The option is supported only to provide
compatibility with BSD's @code{locate}.

@item --statistics
@itemx -S
Print some summary information for each @code{locate} database. No
search is performed unless non-option arguments are given.
Although the BSD version of locate also has this option, the format of the
output is different.

@item --help
Print a summary of the command line usage for @code{locate} and exit.

@item --version
Print the version number of @code{locate} and exit.
@end table

@node Invoking updatedb
@section Invoking @code{updatedb}

@example
updatedb @r{[}@var{option}@dots{}@r{]}
@end example

@code{updatedb} creates and updates the database of file names used by
@code{locate}. @code{updatedb} generates a list of files similar to
the output of @code{find} and then uses utilities for optimizing the
database for performance. @code{updatedb} is often run periodically
as a @code{cron} job and configured with environment variables or
command options. Typically, operating systems have a shell script
that ``exports'' configurations for variable definitions and uses
another shell script that ``sources'' the configuration file into the
environment and then executes @code{updatedb} in the environment.

@table @code
@item --findoptions='@var{OPTION}@dots{}'
Global options to pass on to @code{find}.
The environment variable @code{FINDOPTIONS} also sets this value.
Default is none.

@item --localpaths='@var{path}@dots{}'
Non-network directories to put in the database.
Default is @file{/}.

@item --netpaths='@var{path}@dots{}'
Network (NFS, AFS, RFS, etc.) directories to put in the database.
The environment variable @code{NETPATHS} also sets this value.
Default is none.

@item --prunepaths='@var{path}@dots{}'
Directories to omit from the database, which would otherwise be
included. The environment variable @code{PRUNEPATHS} also sets this
value. Default is @file{/tmp /usr/tmp /var/tmp /afs}. The paths are
used as regular expressions (with @code{find ... -regex}), so you need
to specify these paths in the same way that @code{find} will encounter
them. This means for example that the paths must not include trailing
slashes.

@item --prunefs='@var{path}@dots{}'
Filesystems to omit from the database, which would otherwise be
included. Note that files are pruned when a filesystem is reached;
any filesystem mounted under an undesired filesystem will be ignored.
The environment variable @code{PRUNEFS} also sets this value. Default
is @file{nfs NFS proc}.

@item --output=@var{dbfile}
The database file to build. The default is system-dependent, but
when this document was formatted it was @file{@value{LOCATE_DB}}.

@item --localuser=@var{user}
The user to search the non-network directories as, using @code{su}.
Default is to search the non-network directories as the current user.
You can also use the environment variable @code{LOCALUSER} to set this user.

@item --netuser=@var{user}
The user to search network directories as, using @code{su}. The
default user is @code{daemon}. You can also use the environment variable
@code{NETUSER} to set this user.

@item --old-format
Generate a @code{locate} database in the old format, for compatibility
with versions of @code{locate} other than GNU @code{locate}. Using
this option means that @code{locate} will not be able to properly
handle non-ASCII characters in file names (that is, file names
containing characters which have the eighth bit set, such as many of
the characters from the ISO-8859-1 character set). @xref{Database
Formats}, for a detailed description of the supported database
formats.

@item --dbformat=@var{FORMAT}
Generate the locate database in format @code{FORMAT}. Supported
database formats include @code{LOCATE02} (which is the default),
@code{old} and @code{slocate}. The @code{old} format exists for
compatibility with implementations of @code{locate} on other Unix
systems. The @code{slocate} format exists for compatibility with
@code{slocate}. @xref{Database Formats}, for a detailed description
of each format.

@item --help
Print a summary of the command line usage and exit.
@item --version
Print the version number of @code{updatedb} and exit.
@end table

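
The following sketch builds a private database for a small scratch
tree and then searches it with @code{locate}. It assumes that the GNU
findutils versions of @code{updatedb} and @code{locate} are the ones
found on the @code{PATH}; other implementations (such as
@code{mlocate}) accept different options. The tree and database names
are made up for the example.

@example
top=$(mktemp -d)
trap 'rm -rf "$top"' EXIT      # remove the scratch files on exit
mkdir -p "$top/tree/src" "$top/tree/tmp"
touch "$top/tree/src/main.c" "$top/tree/src/util.c" "$top/tree/tmp/junk.c"
# Index only $top/tree and prune its tmp directory; the prune path is
# a regular expression matched against the whole path, so it has no
# trailing slash.
updatedb --localpaths="$top/tree" --prunepaths="$top/tree/tmp" \
         --output="$top/files.db"
# Count the names matching a glob in the private database.
locate -d "$top/files.db" -c '*.c'       # prints 2
# Print the names whose base name contains main.c.
locate -d "$top/files.db" -b 'main.c'
@end example
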
@node Invoking xargs
@section Invoking @code{xargs}

@example
xargs @r{[}@var{option}@dots{}@r{]} @r{[}@var{command} @r{[}@var{initial-arguments}@r{]}@r{]}
@end example

@code{xargs} exits with the following status:

@table @asis
@item 0
if it succeeds
@item 123
if any invocation of the command exited with status 1-125
@item 124
if the command exited with status 255
@item 125
if the command is killed by a signal
@item 126
if the command cannot be run
@item 127
if the command is not found
@item 1
if some other error occurred.
@end table

Exit codes greater than 128 are used by the shell to indicate that
a program died due to a fatal signal.

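
A quick way to see some of these exit statuses is sketched below.
@code{false} always fails with status 1, and
@samp{hypothetical-missing-command} stands for any command name that
does not exist on your system. The @samp{||} form reports the status
of the whole pipeline, which is that of @code{xargs}.

@example
# The command exits with status 1, so xargs exits with 123.
printf '%s\n' a b | xargs false || echo "status: $?"     # prints status: 123
# The command exits with status 255, so xargs stops and exits with 124.
echo a | xargs sh -c 'exit 255' sh || echo "status: $?"  # prints status: 124
# The command cannot be found, so xargs exits with 127.
echo a | xargs hypothetical-missing-command || echo "status: $?"
@end example
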
@menu
* xargs options::
* Invoking the shell from xargs::
@end menu

@node xargs options
@subsection xargs options

@table @code
@item --arg-file@r{=@var{inputfile}}
@itemx -a @r{@var{inputfile}}
Read names from the file @var{inputfile} instead of standard input.
If you use this option, the standard input stream remains unchanged
when commands are run. Otherwise, stdin is redirected from
@file{/dev/null}.

@item --null
@itemx -0
Input file names are terminated by a null character instead of by
whitespace, and any quotes and backslash characters are not considered
special (every character is taken literally). Disables the end of
file string, which is treated like any other argument.

@item --delimiter @var{delim}
@itemx -d @var{delim}
Input file names are terminated by the specified character @var{delim}
instead of by whitespace, and any quotes and backslash characters are
not considered special (every character is taken literally). Disables
the logical end of file marker string, which is treated like any other
argument.

The specified delimiter may be a single character, a C-style character
escape such as @samp{\n}, or an octal or hexadecimal escape code.
Octal and hexadecimal escape codes are understood as for the
@code{printf} command. Multibyte characters are not supported.

@item -E @var{eof-str}
@itemx --eof@r{[}=@var{eof-str}@r{]}
@itemx -e@r{[}@var{eof-str}@r{]}
Set the logical end of file marker string to @var{eof-str}. If the
logical end of file marker string occurs as a line of input, the rest of
the input is ignored. If @var{eof-str} is omitted (@samp{-e}) or blank
(either @samp{-e} or @samp{-E}), there is no logical end of file marker
string. The @samp{-e} form of this option is deprecated in favour of
the POSIX-compliant @samp{-E} option, which you should use instead. As
of GNU @code{xargs} version 4.2.9, the default behaviour of @code{xargs}
is not to have a logical end of file marker string. The POSIX standard
(IEEE Std 1003.1, 2004 Edition) allows this.

The logical end of file marker string is not treated specially if the
@samp{-d} or the @samp{-0} options are in effect. That is, when either
of these options is in effect, the whole input file will be read even
if @samp{-E} was used.

@item --help
Print a summary of the options to @code{xargs} and exit.

@item -I @var{replace-str}
@itemx --replace@r{[}=@var{replace-str}@r{]}
@itemx -i@r{[}@var{replace-str}@r{]}
Replace occurrences of @var{replace-str} in the initial arguments with
names read from standard input. Also, unquoted blanks do not
terminate arguments; instead, the input is split at newlines only. If
@var{replace-str} is omitted (omitting it is allowed only for
@samp{-i}), it defaults to @samp{@{@}} (as for @samp{find -exec}).
Implies @samp{-x} and @samp{-l 1}. The @samp{-i} option is deprecated
in favour of the @samp{-I} option.

@item -L @var{max-lines}
@itemx --max-lines@r{[}=@var{max-lines}@r{]}
@itemx -l@r{[}@var{max-lines}@r{]}
Use at most @var{max-lines} non-blank input lines per command line.
For @samp{-l}, @var{max-lines} defaults to 1 if omitted. For
@samp{-L}, the argument is mandatory. Trailing blanks cause an input
line to be logically continued on the next input line, for the purpose
of counting the lines. Implies @samp{-x}. The @samp{-l} form of this
option is deprecated in favour of the POSIX-compliant @samp{-L}
option.

@item --max-args=@var{max-args}
@itemx -n @var{max-args}
Use at most @var{max-args} arguments per command line. Fewer than
@var{max-args} arguments will be used if the size (see the @samp{-s}
option) is exceeded, unless the @samp{-x} option is given, in which
case @code{xargs} will exit.

@item --interactive
@itemx -p
Prompt the user about whether to run each command line and read a line
from the terminal. Only run the command line if the response starts
with @samp{y} or @samp{Y}. Implies @samp{-t}.

@item --no-run-if-empty
@itemx -r
If the standard input is completely empty, do not run the
command. By default, the command is run once even if there is no
input.

@item --max-chars=@var{max-chars}
@itemx -s @var{max-chars}
Use at most @var{max-chars} characters per command line, including the
command, initial arguments and any terminating nulls at the ends of
the argument strings.

@item --show-limits
Display the limits on the command-line length which are imposed by the
operating system, @code{xargs}' choice of buffer size and the
@samp{-s} option. Pipe the input from @file{/dev/null} (and perhaps
specify @samp{--no-run-if-empty}) if you don't want @code{xargs} to do
anything.

@item --verbose
@itemx -t
Print the command line on the standard error output before executing
it.

@item --version
Print the version number of @code{xargs} and exit.

@item --exit
@itemx -x
Exit if the size (see the @samp{-s} option) is exceeded.

@item --max-procs=@var{max-procs}
@itemx -P @var{max-procs}
Run up to @var{max-procs} processes at once; the default is 1. If
@var{max-procs} is 0, @code{xargs} will run as many processes as
possible simultaneously. @xref{Controlling Parallelism}, for
information on dynamically controlling parallelism.

@item --process-slot-var=@var{environment-variable-name}
Set the environment variable @var{environment-variable-name} to a
unique value in each running child process. Each value is a decimal
integer. Values are reused once child processes exit. This can be
used in a rudimentary load distribution scheme, for example.
@end table

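
The sketch below exercises a few of these options on a scratch
directory; the file names, the replacement string @samp{NAME} and the
variable name @samp{SLOT} are arbitrary choices for the illustration.

@example
top=$(mktemp -d)
trap 'rm -rf "$top"' EXIT      # remove the scratch files on exit
touch "$top/a b.txt" "$top/c.txt"
# -0: names are NUL-terminated, so the space in "a b.txt" is harmless.
find "$top" -name '*.txt' -print0 | xargs -0 ls -l
# -d: one item per line; quotes and blanks inside a line are literal.
printf '%s\n' "it's here" 'two words' | xargs -d '\n' -n 1 echo
# -I: each input line replaces NAME in the initial arguments.
printf '%s\n' alpha beta | xargs -I NAME echo "item: NAME"
# -P with --process-slot-var: each child sees its slot number (0, 1 or 2).
seq 1 6 | xargs -P 3 -n 1 --process-slot-var=SLOT \
  sh -c 'echo "job $1 ran in slot $SLOT"' sh
@end example

Without @samp{-d}, the apostrophe in @samp{it's here} would be taken as
the start of a quoted string, and @code{xargs} would report an
unmatched quote.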
@node Invoking the shell from xargs
@subsection Invoking the shell from xargs

Normally, @code{xargs} will exec the command you specified directly,
without invoking a shell. This is normally the behaviour one would
want. It's somewhat more efficient and avoids problems with shell
metacharacters, for example. However, sometimes it is necessary to
manipulate the environment of a command before it is run, in a way
that @code{xargs} does not directly support.

Invoking a shell from @code{xargs} is a good way of performing such
manipulations. However, some care must be taken to prevent problems,
for example unwanted interpretation of shell metacharacters.

This command moves a set of files into an archive directory:

@example
find /foo -maxdepth 1 -atime +366 -exec mv @{@} /archive \;
@end example

However, this will only move one file at a time. We cannot in this
case use @code{-exec ... +} because the matched file names are added
at the end of the command line, while the destination directory would
need to be specified last. We also can't use @code{xargs} in the
obvious way for the same reason. One way of working around this
problem is to make use of the special properties of GNU @code{mv}; it
has a @code{-t} option that allows the target directory to be
specified before the list of files to be moved. However, while this
technique works for GNU @code{mv}, it doesn't solve the more general
problem.

Here is a more general technique for solving this problem:

@example
find /foo -maxdepth 1 -atime +366 -print0 |
xargs -r0 sh -c 'mv "$@@" /archive' move
@end example

Here, a shell is being invoked. There are two shell instances to think
|
|
about. The first is the shell which launches the @code{xargs} command
|
|
(this might be the shell into which you are typing, for example). The
|
|
second is the shell launched by @code{xargs} (in fact it will probably
|
|
launch several, one after the other, depending on how many files need to
|
|
be archived). We'll refer to this second shell as a subshell.
|
|
|
|
Our example uses the @code{-c} option of @code{sh}. Its argument is a
|
|
shell command to be executed by the subshell. Along with the rest of
|
|
that command, the $@@ is enclosed by single quotes to make sure it is
|
|
passed to the subshell without being expanded by the parent shell. It
|
|
is also enclosed with double quotes so that the subshell will expand
|
|
@code{$@@} correctly even if one of the file names contains a space or
|
|
newline.
|
|
|
|
The subshell will use any non-option arguments as positional
|
|
parameters (that is, in the expansion of @code{$@@}). Because
|
|
@code{xargs} launches the @code{sh -c} subshell with a list of files,
|
|
those files will end up as the expansion of @code{$@@}.
|
|
|
|
You may also notice the @samp{move} at the end of the command line.
|
|
This is used as the value of @code{$0} by the subshell. We include it
|
|
because otherwise the name of the first file to be moved would be used
|
|
instead. If that happened it would not be included in the subshell's
|
|
expansion of @code{$@@}, and so it wouldn't actually get moved.
|
|
|
|
|
|
Another reason to use the @code{sh -c} construct could be to
perform redirection:

@example
find /usr/include -name '*.h' | xargs grep -wl mode_t |
xargs -r sh -c 'exec emacs "$@@" < /dev/tty' Emacs
@end example

Notice that we use the shell builtin @code{exec} here. That's simply
because the subshell needs to do nothing once Emacs has been invoked.
Therefore, instead of keeping a @code{sh} process around for no reason,
we just arrange for the subshell to @code{exec} Emacs, which replaces
the shell process rather than creating an extra one.

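
Because @code{exec} replaces the running shell with the new program
instead of starting a child process, the process ID does not change.
This minimal sketch, assuming a POSIX @code{sh}, prints the shell's
PID and then the PID seen by the program it @code{exec}s (here just
another @code{sh}, standing in for Emacs):

@example
# $$ is the shell's PID; the inner sh keeps it because of exec.
pids=$(sh -c 'echo $$; exec sh -c "echo \$\$"')
first=$(echo "$pids" | sed -n 1p)
second=$(echo "$pids" | sed -n 2p)
echo "shell: $first, after exec: $second"
@end example
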
Sometimes, though, it can be helpful to keep the shell process around:

@example
find /foo -maxdepth 1 -atime +366 -print0 |
xargs -r0 sh -c 'mv "$@@" /archive || exit 255' move
@end example

Here, the shell will exit with status 255 if @code{mv} fails.
This causes @code{xargs} to stop immediately rather than run any
further commands.

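
The following minimal sketch shows this with GNU @code{xargs}. The
@samp{-n 1} option gives each argument its own subshell, the trailing
@samp{sh} fills @code{$0}, and the command deliberately exits with
status 255 on the made-up argument @samp{b}. As a result @samp{c} is
never handled, and @code{xargs} itself exits with status 124, which is
what GNU @code{xargs} reports in this case:

@example
out=$(printf '%s\n' a b c |
      xargs -n 1 sh -c 'echo "handled $1"; [ "$1" != b ] || exit 255' sh \
      2>/dev/null)
status=$?
echo "$out"                         # handled a, handled b
echo "xargs exit status: $status"   # 124
@end example
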
@node Regular Expressions
@section Regular Expressions

The @samp{-regex} and @samp{-iregex} tests of @code{find} allow
matching by regular expression, as does the @samp{--regex} option of
@code{locate}.

Your locale configuration affects how regular expressions are
interpreted. @xref{Environment Variables}, for a description of how
your locale setup affects the interpretation of regular expressions.

There are also several different types of regular expression, and
these are interpreted differently. Normally, the type of regular
expression used by @code{find} and @code{locate} is the same as is
used in GNU Emacs. Both programs provide an option which allows you
to select an alternative regular expression syntax; for @code{find}
this is the @samp{-regextype} option, and for @code{locate} this is
the @samp{--regextype} option.

These options take a single argument, which indicates the specific
regular expression syntax and behaviour that should be used. This
should be one of the following:

@include regexprops.texi

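
Two points are worth seeing in action: @samp{-regex} must match the
whole path that @code{find} would print, not just the file name, and
@samp{-regextype} affects only the @samp{-regex} and @samp{-iregex}
tests that follow it on the command line. This sketch assumes GNU
@code{find}; the file names are throwaway examples:

@example
dir=$(mktemp -d)
cd "$dir" || exit 1
touch notes.txt notes.md
# Default (Emacs) syntax: the pattern must match all of "./notes.txt".
whole=$(find . -regex '.*\.txt')
# A pattern describing only the file name matches nothing.
partial=$(find . -regex 'notes.*')
# POSIX extended syntax allows unescaped grouping and alternation.
ere=$(find . -regextype posix-extended -regex '\./notes\.(txt|md)' | sort)
cd / && rm -rf "$dir"
@end example
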
@node Environment Variables
@section Environment Variables
@c TODO: check the variable index still contains references to these
@table @code
@item LANG
Provides a default value for the internationalisation variables that
are unset or null.

@item LC_ALL
If set to a non-empty string value, overrides the values of all the
other internationalisation variables.

@item LC_COLLATE
The POSIX standard specifies that this variable affects the pattern
matching to be used for the @samp{-name} test. GNU @code{find} uses the
GNU version of the @code{fnmatch} library function.

This variable also affects the interpretation of the response to
@code{-ok}; while the @code{LC_MESSAGES} variable selects the actual
pattern used to interpret the response to @code{-ok}, the interpretation
of any bracket expressions in the pattern will be affected by the
@code{LC_COLLATE} variable.

@item LC_CTYPE
This variable affects the treatment of character classes used in
regular expressions and with the @samp{-name} test, if the
@code{fnmatch} function supports this.

This variable also affects the interpretation of any character classes
in the regular expressions used to interpret the response to the
prompt issued by @code{-ok}. The @code{LC_CTYPE} environment variable will
also affect which characters are considered to be unprintable when
filenames are printed (@pxref{Unusual Characters in File Names}).

@item LC_MESSAGES
Determines the locale to be used for internationalised messages,
including the interpretation of the response to the prompt made by the
@code{-ok} action.

@item NLSPATH
Determines the location of the internationalisation message catalogues.

@item PATH
Affects the directories which are searched to find the executables
invoked by @samp{-exec}, @samp{-execdir}, @samp{-ok} and @samp{-okdir}.
If the @code{PATH} environment variable includes the current directory
(by explicitly including @samp{.} or by having an empty element), and
the @code{find} command line includes @samp{-execdir} or @samp{-okdir},
@code{find} will refuse to run. @xref{Security Considerations}, for a
more detailed discussion of security matters.

@item POSIXLY_CORRECT
Determines the block size used by @samp{-ls} and @samp{-fls}. If
@code{POSIXLY_CORRECT} is set, blocks are units of 512 bytes. Otherwise
they are units of 1024 bytes.

Setting this variable also turns off warning messages (that is, implies
@samp{-nowarn}) by default, because POSIX requires that apart from
the output for @samp{-ok}, all messages printed on stderr are
diagnostics and must result in a non-zero exit status.

When @code{POSIXLY_CORRECT} is set, the response to the prompt made by the
@code{-ok} action is interpreted according to the system's message
catalogue, rather than according to @code{find}'s own message
translations.

@item TZ
Affects the time zone used for some of the time-related format
directives of @samp{-printf} and @samp{-fprintf}.
@end table

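
The @code{PATH} check described above can be observed directly. In
this sketch, which assumes GNU @code{find} and uses a throwaway
directory, the same @samp{-execdir} command runs normally with the
inherited @code{PATH}, but @code{find} refuses to start once @samp{.}
is prepended to it:

@example
dir=$(mktemp -d)
touch "$dir/file"
# Normal PATH: find runs the command and exits with status 0.
find "$dir" -type f -execdir true ';'
safe_status=$?
# PATH containing ".": find rejects -execdir and exits non-zero.
PATH=".:$PATH" find "$dir" -type f -execdir true ';' 2>/dev/null
unsafe_status=$?
rm -rf "$dir"
echo "normal PATH: $safe_status, PATH with '.': $unsafe_status"
@end example
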
@node Common Tasks
@chapter Common Tasks

The sections that follow contain some extended examples that both give
a good idea of the power of these programs, and show you how to solve
common real-world problems.

@menu
* Viewing And Editing::
* Archiving::
* Cleaning Up::
* Strange File Names::
* Fixing Permissions::
* Classifying Files::
@end menu

@node Viewing And Editing
@section Viewing And Editing

To view a list of files that meet certain criteria, simply run your
file viewing program with the file names as arguments. Shells
substitute a command enclosed in backquotes with its output, so the
whole command looks like this:

@example
less `find /usr/include -name '*.h' | xargs grep -l mode_t`
@end example

@noindent
You can edit those files by giving an editor name instead of a file
viewing program:

@example
emacs `find /usr/include -name '*.h' | xargs grep -l mode_t`
@end example

Because there is a limit to the length of any individual command line,
there is a limit to the number of files that can be handled in this way.
We can get around this difficulty by using @code{xargs} like this:

@example
find /usr/include -name '*.h' | xargs grep -l mode_t > todo
xargs --arg-file=todo emacs
@end example

Here, @code{xargs} will run @code{emacs} as many times as necessary to
visit all of the files listed in the file @file{todo}. Generating a
temporary file is not always convenient, though. This command does
much the same thing without needing one:

@example
find /usr/include -name '*.h' | xargs grep -l mode_t |
xargs sh -c 'emacs "$@@" < /dev/tty' Emacs
@end example

The example above illustrates a useful trick: using @code{sh -c}, you
can invoke a shell command from @code{xargs}. The @code{$@@} in the
command line is expanded by the shell to a list of arguments as
provided by @code{xargs}. The single quotes in the command line
protect the @code{$@@} against expansion by your interactive shell
(which will normally have no arguments and thus expand @code{$@@} to
nothing). The capitalised @samp{Emacs} on the command line is used as
@code{$0} by the shell that @code{xargs} launches.

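
The redirection from @file{/dev/tty} in the last example is needed
because, when GNU @code{xargs} reads its arguments from its standard
input, it runs the command with standard input redirected from
@file{/dev/null}; with @samp{--arg-file}, the command instead inherits
the standard input of @code{xargs} itself. This sketch (the list file
and its contents are hypothetical) makes the difference visible by
having the command read one line from its standard input:

@example
list=$(mktemp)
printf '%s\n' one two > "$list"
# Arguments from a file: the command sees the stdin of xargs ("typed").
with_file=$(echo typed |
            xargs --arg-file="$list" sh -c 'read -r line; echo "$line $#"' sh)
# Arguments from a pipe: the command's stdin is /dev/null, so read gets nothing.
with_pipe=$(printf '%s\n' one two |
            xargs sh -c 'read -r line; echo "[$line] $#"' sh)
rm -f "$list"
echo "$with_file"    # typed 2
echo "$with_pipe"    # [] 2
@end example
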
@node Archiving
@section Archiving

You can pass a list of files produced by @code{find} to a file
archiving program. GNU @code{tar} and @code{cpio} can both read lists
of file names from the standard input, delimited either by nulls (the
safe way) or by newlines (the lazy, risky default way). To use
null-delimited names, give them the @samp{--null} option. You can
store a file archive in a file, write it on a tape, or send it over a
network to extract on another machine.

One common use of @code{find} to archive files is to send a list of
the files in a directory tree to @code{cpio}. Use @samp{-depth} so that
if a directory does not have write permission for its owner, its
contents can still be restored from the archive, since the directory's
permissions are restored after its contents. Here is an example of
doing this using @code{cpio}; you could use a more complex @code{find}
expression to archive only certain files.

@example
find . -depth -print0 |
cpio --create --null --format=crc --file=/dev/nrst0
@end example

You could restore that archive using this command:

@example
cpio --extract --null --make-dir --unconditional \
--preserve --file=/dev/nrst0
@end example

Here are the commands to do the same things using @code{tar}:

@example
find . -depth -print0 |
tar --create --null --no-recursion --files-from=- --file=/dev/nrst0

tar --extract --null --preserve-perm --same-owner \
--file=/dev/nrst0
@end example

The @samp{--no-recursion} option stops @code{tar} from adding the
contents of each directory a second time, since @code{find} already
lists every file individually.

@c Idea from Rick Sladkey.
Here is an example of copying a directory from one machine to another:

@example
find . -depth -print0 | cpio -0o -Hnewc |
rsh @var{other-machine} "cd `pwd` && cpio -i0dum"
@end example

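
The @code{tar} variant can be tried out on a small scratch tree before
pointing it at a tape drive. This sketch assumes GNU @code{tar} and
GNU @code{find}, writes to an ordinary archive file instead of
@file{/dev/nrst0}, and uses a made-up directory and file name that
both contain spaces:

@example
src=$(mktemp -d)
out=$(mktemp -d)
mkdir "$src/sub dir"
echo hello > "$src/sub dir/a file.txt"
cd "$src" || exit 1
# NUL-separated names keep the spaces intact; --no-recursion stops tar
# from adding each directory's contents a second time.
find . -depth -print0 |
tar --create --null --no-recursion --files-from=- --file="$out/backup.tar"
entries=$(tar --list --file="$out/backup.tar" | wc -l)
cd "$out" || exit 1
tar --extract --file=backup.tar
restored=$(cat "$out/sub dir/a file.txt")
cd / && rm -rf "$src" "$out"
echo "$entries entries; restored: $restored"
@end example
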
@node Cleaning Up
@section Cleaning Up

@c Idea from Jim Meyering.
This section gives examples of removing unwanted files in various
situations. Here is a command to remove the CVS backup files created
when an update requires a merge:

@example
find . -name '.#*' -print0 | xargs -0r rm -f
@end example

If your @code{find} command removes directories, you may find that
you get a spurious error message when @code{find} tries to recurse
into a directory that has now been removed. Using the @samp{-depth}
option will normally resolve this problem.

@c What does the following sentence mean? Why is -delete safer? --kasal
@c The command above works, but the following is safer:

It is also possible to use the @samp{-delete} action:

@example
find . -depth -name '.#*' -delete
@end example

@c Idea from Franc,ois Pinard.
You can run this command to clean out your clutter in @file{/tmp}.
You might place it in the file your shell runs when you log out
(@file{.bash_logout}, @file{.logout}, or @file{.zlogout}, depending on
which shell you use).

@example
find /tmp -depth -user "$LOGNAME" -type f -delete
@end example

@c Idea from Noah Friedman.
To remove old Emacs backup and auto-save files, you can use a command
like the following. It is especially important in this case to use
null-terminated file names because Emacs packages like the VM mailer
often create temporary file names with spaces in them, like
@file{#reply to David J. MacKenzie<1>#}.

@example
find ~ \( -name '*~' -o -name '#*#' \) -print0 |
xargs --no-run-if-empty --null rm -vf
@end example

Removing old files from @file{/tmp} is commonly done from @code{cron}:

@c Idea from Kaveh Ghazi.
@example
find /tmp /var/tmp -depth -not -type d -mtime +3 -delete
find /tmp /var/tmp -depth -mindepth 1 -type d -empty -delete
@end example

The second @code{find} command above cleans out empty directories
depth-first (@samp{-delete} implies @samp{-depth} anyway), hoping that
the parents become empty and can be removed too. It uses
@samp{-mindepth} to avoid removing @file{/tmp} itself if it becomes
totally empty.

Here is an example of a command that almost certainly does not do what
the user intended:

@c inspired by Savannah bug #20865 (Bruno De Fraine)
@example
find dirname -delete -name quux
@end example

If the user hoped to delete only files named @file{quux}, they will get
an unpleasant surprise; this command will attempt to delete everything
at or below the starting point @file{dirname}. This is because
@code{find} evaluates the items on the command line as an expression.
The @code{find} program will normally execute an action only if the
preceding test or action succeeds. Here, there is no action or test
before the @samp{-delete}, so it will always be executed. The @samp{-name
quux} test will be performed for files we successfully deleted, but
that test has no effect since @samp{-delete} also disables the default
@samp{-print} operation. So the above example will probably delete a
lot of files the user didn't want to delete.

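
The difference that the order makes can be checked safely in a scratch
directory. In this sketch (assuming GNU @code{find}; the directory
names are made up), putting the test first removes only @file{quux},
while putting @samp{-delete} first removes the whole tree, including
the starting point itself:

@example
top=$(mktemp -d)
mkdir "$top/right" "$top/wrong"
touch "$top/right/quux" "$top/right/keep" "$top/wrong/quux" "$top/wrong/keep"
# Test before action: -delete runs only when -name quux succeeds.
find "$top/right" -name quux -delete
# Action before test: -delete runs for everything, -name comes too late.
find "$top/wrong" -delete -name quux
right_left=$(ls "$top/right")
if [ -e "$top/wrong" ]; then wrong_exists=yes; else wrong_exists=no; fi
rm -rf "$top"
echo "right/ still holds: $right_left; wrong/ exists: $wrong_exists"
@end example
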
This command is also likely to do something you did not intend:
@example
find dirname -path dirname/foo -prune -o -delete
@end example

Because @samp{-delete} turns on @samp{-depth}, the @samp{-prune}
action has no effect and files in @file{dirname/foo} will be deleted
too.

@node Strange File Names
@section Strange File Names

@c Idea from:
@c From: tmatimar@isgtec.com (Ted Timar)
@c Newsgroups: comp.unix.questions,comp.unix.shell,comp.answers,news.answers
@c Subject: Unix - Frequently Asked Questions (2/7) [Frequent posting]
@c Subject: How do I remove a file with funny characters in the filename ?
@c Date: Thu Mar 18 17:16:55 EST 1993
@code{find} can help you remove or rename a file with strange
characters in its name. People are sometimes stymied by files whose
names contain characters such as spaces, tabs, control characters, or
characters with the high bit set. The simplest way to remove such
files is:

@example
rm -i @var{some*pattern*that*matches*the*problem*file}
@end example

@code{rm} asks you whether to remove each file matching the given
pattern. If you are using an old shell, this approach might not work
if the file name contains a character with the high bit set; the shell
may strip it off. A more reliable way is:

@example
find . -maxdepth 1 @var{tests} -okdir rm '@{@}' \;
@end example

@noindent
where @var{tests} uniquely identify the file. The @samp{-maxdepth 1}
option prevents @code{find} from wasting time searching for the file
in any subdirectories; if there are no subdirectories, you may omit
it. A good way to uniquely identify the problem file is to figure out
its inode number; use

@example
ls -i
@end example

Suppose you have a file whose name contains control characters, and
you have found that its inode number is 12345. This command asks
you whether to remove it:

@example
find . -maxdepth 1 -inum 12345 -okdir rm -f '@{@}' \;
@end example

If you don't want to be asked, perhaps because the file name may
contain a strange character sequence that will mess up your screen
when printed, then use @samp{-execdir} instead of @samp{-okdir}.

If you want to rename the file rather than remove it, use @code{mv}
in place of @code{rm}:

@example
find . -maxdepth 1 -inum 12345 -okdir mv '@{@}' @var{new-file-name} \;
@end example

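
Here is a non-interactive sketch of the inode technique, assuming GNU
@code{find}. It creates a hypothetical problem file whose name contains
a tab and a newline, reads its inode number with the @samp{%i}
directive of @samp{-printf} (instead of @code{ls -i}, whose output is
awkward to parse for such a name), and then removes that inode with
@samp{-delete} in place of @samp{-okdir rm}:

@example
dir=$(mktemp -d)
cd "$dir" || exit 1
# Hypothetical problem file: a tab and a newline embedded in its name.
touch "$(printf 'bad\tname\nhere')"
touch ordinary.txt
# Look up the inode number without having to type the awkward name.
ino=$(find . -maxdepth 1 -name 'bad*' -printf '%i\n')
# Remove exactly that inode, without prompting.
find . -maxdepth 1 -inum "$ino" -delete
remaining=$(ls)
cd / && rm -rf "$dir"
echo "inode $ino removed; remaining: $remaining"
@end example
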
@node Fixing Permissions
@section Fixing Permissions

Suppose you want to make sure that both the owning user and the group
can write to the directories in a certain directory tree. Here is a
way to find directories lacking either user or group write permission
(or both), and fix their permissions:

@example
find . -type d -not -perm -ug=w -print0 | xargs -0 -r chmod ug+w
@end example

@noindent
You could also reverse the operations, if you want to make sure that
directories do @emph{not} have world write permission.

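
Both directions can be tried on a scratch tree. The sketch below
assumes GNU @code{find}, @code{xargs} and @code{stat} (for
@samp{stat -c %a}); the directory names and starting modes are made
up. The second command is one way of doing the reverse operation,
removing world write permission wherever it is set:

@example
top=$(mktemp -d)
mkdir "$top/locked" "$top/open dir"
chmod 500 "$top/locked"
chmod 777 "$top/open dir"
# Add user and group write permission where either is missing.
find "$top" -type d -not -perm -ug=w -print0 | xargs -0 -r chmod ug+w
# The reverse direction: remove world write permission where it is set.
find "$top" -type d -perm -o=w -print0 | xargs -0 -r chmod o-w
locked_mode=$(stat -c %a "$top/locked")
open_mode=$(stat -c %a "$top/open dir")
rm -rf "$top"
echo "locked: $locked_mode, open dir: $open_mode"   # 720, 775
@end example
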
@node Classifying Files
@section Classifying Files

@c Idea from:
@c From: martin@mwtech.UUCP (Martin Weitzel)
@c Newsgroups: comp.unix.wizards,comp.unix.questions
@c Subject: Advanced usage of 'find' (Re: Unix security automating script)
@c Date: 22 Mar 90 15:05:19 GMT
If you want to classify a set of files into several groups based on
different criteria, you can use the comma operator to perform multiple
independent tests on the files. Here is an example:

@example
find / -type d \( -perm -o=w -fprint allwrite , \
-perm -o=x -fprint allexec \)

echo "Directories that can be written to by everyone:"
cat allwrite
echo ""
echo "Directories with search permissions for everyone:"
cat allexec
@end example

This way, @code{find} only has to make one scan through the directory
tree (which is one of the most time-consuming parts of its work),
rather than one scan per group.

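
A scaled-down version of the same idea can be run on a scratch tree.
In this sketch (assuming GNU @code{find}; the directory names, modes
and list-file locations are hypothetical), a single traversal fills
both lists, and a directory can appear in either list, both or
neither:

@example
top=$(mktemp -d)
lists=$(mktemp -d)
mkdir "$top/shared" "$top/searchable" "$top/private"
chmod 777 "$top/shared"
chmod 751 "$top/searchable"
chmod 700 "$top/private"
# One traversal; the comma operator evaluates both halves for every directory.
find "$top" -type d \( -perm -o=w -fprint "$lists/allwrite" , \
-perm -o=x -fprint "$lists/allexec" \)
writable=$(wc -l < "$lists/allwrite")
searchable=$(wc -l < "$lists/allexec")
rm -rf "$top" "$lists"
echo "world-writable: $writable, world-searchable: $searchable"   # 1, 2
@end example
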
@node Worked Examples
@chapter Worked Examples

The tools in the findutils package, and in particular @code{find},
have a large number of options. This means that quite often,
there is more than one way to do things. Some of the options
and facilities only exist for compatibility with other tools, and
findutils provides improved ways of doing things.

This chapter describes a number of useful tasks that are commonly
performed, and compares the different ways of achieving them.

@menu
* Deleting Files::
* Copying A Subset of Files::
* Updating A Timestamp File::
* Finding the Shallowest Instance::
@end menu

@node Deleting Files
@section Deleting Files

One of the most common tasks that @code{find} is used for is locating
files that can be deleted. This might include:

@itemize
@item
Files last modified more than 3 years ago which haven't been accessed
for at least 2 years
@item
Files belonging to a certain user
@item
Temporary files which are no longer required
@end itemize

This example concentrates on the actual deletion task rather than on
sophisticated ways of locating the files that need to be deleted.
We'll assume that the files we want to delete are old files underneath
@file{/var/tmp/stuff}.

@subsection The Traditional Way

The traditional way to delete files in @file{/var/tmp/stuff} that have
not been modified in over 90 days would have been:

@smallexample
find /var/tmp/stuff -mtime +90 -exec /bin/rm @{@} \;
@end smallexample

The above command uses @samp{-exec} to run the @code{/bin/rm} command
to remove each file. This approach works and in fact would have
worked in Version 7 Unix in 1979. However, there are a number of
problems with this approach.

The most obvious problem with the approach above is that it causes
@code{find} to fork every time it finds a file that it needs to delete,
and the child process then has to use the @code{exec} system call to
launch @code{/bin/rm}. All this is quite inefficient. If we are
going to use @code{/bin/rm} to do this job, it is better to make it
delete more than one file at a time.

The most obvious way of doing this is to use the shell's command
substitution feature:

@smallexample
/bin/rm `find /var/tmp/stuff -mtime +90 -print`
@end smallexample

@noindent
or you could use the more modern form:

@smallexample
/bin/rm $(find /var/tmp/stuff -mtime +90 -print)
@end smallexample

The commands above are much more efficient than the first attempt.
However, there is a problem with them. The operating system imposes a
maximum length on the argument list passed to a new program (the
actual limit varies between systems). This means that while the
command substitution technique will usually work, it will suddenly
fail when there are lots of files to delete. Since the task is to
delete unwanted files, this is precisely the time we don't want things
to go wrong.

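
On POSIX systems, the limit mentioned above can be queried with
@code{getconf}. It covers the combined size of the arguments and the
environment passed to a new program, so how many file names fit
depends on how long they are:

@smallexample
# ARG_MAX: bytes available to exec for arguments plus environment.
limit=$(getconf ARG_MAX)
echo "ARG_MAX on this system: $limit bytes"
@end smallexample
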
@subsection Making Use of @code{xargs}

So, is there a way to be more efficient in the use of @code{fork()}
and @code{exec()} without running up against this limit?
Yes, we can be almost optimally efficient by making use
of the @code{xargs} command. The @code{xargs} command reads arguments
from its standard input and builds them into command lines. We can
use it like this:

@smallexample
find /var/tmp/stuff -mtime +90 -print | xargs /bin/rm
@end smallexample

For example, if the files found by @code{find} are
@file{/var/tmp/stuff/A},
@file{/var/tmp/stuff/B} and
@file{/var/tmp/stuff/C}, then @code{xargs} might issue the commands

@smallexample
/bin/rm /var/tmp/stuff/A /var/tmp/stuff/B
/bin/rm /var/tmp/stuff/C
@end smallexample

The above assumes that @code{xargs} has a very small maximum command
line length. The real limit is much larger but the idea is that
@code{xargs} will run @code{/bin/rm} as many times as necessary to get
the job done, given the limits on command line length.

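
The splitting can be reproduced without deleting anything by putting
@code{echo} in front of the command, and by forcing a small batch size
with @samp{-n 2} (at most two arguments per command line) to stand in
for the very small length limit assumed above. The file names are the
hypothetical ones from the example:

@smallexample
# Dry run: echo prints each command line instead of running /bin/rm.
batches=$(printf '%s\n' /var/tmp/stuff/A /var/tmp/stuff/B /var/tmp/stuff/C |
          xargs -n 2 echo /bin/rm)
echo "$batches"
# /bin/rm /var/tmp/stuff/A /var/tmp/stuff/B
# /bin/rm /var/tmp/stuff/C
@end smallexample
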
This usage of @code{xargs} is pretty efficient, and the @code{xargs}
command is widely implemented (all modern versions of Unix offer it).
So far then, the news is all good. However, there is bad news too.

@subsection Unusual characters in filenames

Unix-like systems allow any characters to appear in file names with
the exception of the ASCII NUL character and the slash.
Slashes can occur in path names (as the directory separator) but
not in the names of actual directory entries. This means that the
list of files that @code{xargs} reads could in fact contain white space
characters: spaces, tabs and newline characters. Since by default
@code{xargs} assumes that the list of files it is reading uses white
space as an argument separator, it cannot correctly handle the case
where a filename actually includes white space. This makes the
default behaviour of @code{xargs} almost useless for handling
arbitrary data.

To solve this problem, GNU findutils introduced the @samp{-print0}
action for @code{find}. This uses the ASCII NUL character to separate
the entries in the file list that it produces. This is the ideal
choice of separator since it is the only character that cannot appear
within a path name. The @samp{-0} option to @code{xargs} makes it
assume that arguments are separated with ASCII NUL instead of white
space. It also turns off another misfeature in the default behaviour
of @code{xargs}, which is that it pays attention to quote characters
in its input. Some versions of @code{xargs} also terminate when they
see a lone @samp{_} in the input, but GNU @code{xargs} no longer does
that (since it has become an optional behaviour in the Unix standard).

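
The difference is easy to see by counting how many arguments each
invocation receives. In this sketch (assuming GNU @code{find} and
@code{xargs}, and a temporary directory whose own path contains no
white space; the two file names are made up), the default mode splits
@file{two words} in two, while @samp{-print0} with @samp{-0} keeps it
whole:

@smallexample
dir=$(mktemp -d)
touch "$dir/two words" "$dir/plain"
# Default: white space separates arguments, so "two words" becomes two.
split_count=$(find "$dir" -type f -print | xargs sh -c 'echo $#' sh)
# NUL-separated: each file name arrives as exactly one argument.
nul_count=$(find "$dir" -type f -print0 | xargs -0 sh -c 'echo $#' sh)
rm -rf "$dir"
echo "default: $split_count arguments, -print0/-0: $nul_count arguments"
@end smallexample
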
So, putting @code{find -print0} together with @code{xargs -0}, we get
this command:

@smallexample
find /var/tmp/stuff -mtime +90 -print0 | xargs -0 /bin/rm
@end smallexample

The result is an efficient way of proceeding that
correctly handles all the possible characters that could appear in the
list of files to delete. This is good news. However, there is, as
I'm sure you're expecting, also more bad news. The problem is that
this is not a portable construct; although other versions of Unix
(notably BSD-derived ones) support @samp{-print0}, it's not
universal. So, is there a more universal mechanism?

@subsection Going back to @code{-exec}

There is indeed a more universal mechanism, which is a slight
modification to the @samp{-exec} action. The normal @samp{-exec}
action assumes that the command to run is terminated with a semicolon
(the semicolon normally has to be quoted in order to protect it from
interpretation as the shell command separator). The SVR4 edition of
Unix introduced a slight variation, which involves terminating the
command with @samp{+} instead:

@smallexample
find /var/tmp/stuff -mtime +90 -exec /bin/rm @{@} \+
@end smallexample

The above use of @samp{-exec} causes @code{find} to build up long
command lines and issue each one when it is full. This can be less
efficient than some uses of @code{xargs}; for example, @code{xargs}
allows new command lines to be built up while the previous command is
still executing, and allows you to specify a number of commands to run
in parallel. However, the @code{find @dots{} -exec @dots{} +} construct
has the advantage of wide portability. GNU findutils did not support
@samp{-exec @dots{} +} until version 4.2.12; one of the reasons for this
is that it already had the @samp{-print0} action in any case.

@subsection A more secure version of @code{-exec}

The command above seems to be efficient and portable. However,
within it lurks a security problem. The problem is shared with
all the commands we've tried in this worked example so far, too. The
security problem is a race condition; that is, if it is possible for
somebody to manipulate the filesystem that you are searching while you
are searching it, it is possible for them to persuade your @code{find}
command to cause the deletion of a file that you can delete but they
normally cannot.

The problem occurs because the @samp{-exec} action is defined by the
POSIX standard to invoke its command with the same working directory
as @code{find} had when it was started. This means that the arguments
which replace the @{@} include the whole path from @code{find}'s
starting point down to the file that needs to be deleted. For example,

@smallexample
find /var/tmp/stuff -mtime +90 -exec /bin/rm @{@} \+
@end smallexample

might actually issue the command:

@smallexample
/bin/rm /var/tmp/stuff/A /var/tmp/stuff/B /var/tmp/stuff/passwd
@end smallexample

Notice the file @file{/var/tmp/stuff/passwd}. Likewise, the command:

@smallexample
cd /var/tmp && find stuff -mtime +90 -exec /bin/rm @{@} \+
@end smallexample

might actually issue the command:

@smallexample
/bin/rm stuff/A stuff/B stuff/passwd
@end smallexample

If an attacker can rename @file{stuff} to something else (making use
of their write permissions in @file{/var/tmp}), they can replace it
with a symbolic link to @file{/etc}. That means that the
@code{/bin/rm} command will be invoked on @file{/etc/passwd}. If you
are running your @code{find} command as root, the attacker has just managed
to delete a vital file. All they needed to do to achieve this was
replace a subdirectory with a symbolic link at the vital moment.

There is, however, a simple solution to the problem. This is an action
which works a lot like @code{-exec} but doesn't need to traverse a
chain of directories to reach the file that it needs to work on. This
is the @samp{-execdir} action, which was introduced by the BSD family
of operating systems. The command,

@smallexample
find /var/tmp/stuff -mtime +90 -execdir /bin/rm @{@} \+
@end smallexample

might delete a set of files by performing these actions:

@enumerate
@item
Change directory to @file{/var/tmp/stuff/foo}
@item
Invoke @code{/bin/rm ./file1 ./file2 ./file3}
@item
Change directory to @file{/var/tmp/stuff/bar}
@item
Invoke @code{/bin/rm ./file99 ./file100 ./file101}
@end enumerate

This is a much more secure method. We are no longer exposed to a race
condition. For many typical uses of @code{find}, this is the best
strategy. It's reasonably efficient, but the length of the command
line is limited not just by the operating system limits, but also by
how many files we actually need to delete from each directory.

Is it possible to do any better? In the case of general file
processing, no. However, in the specific case of deleting files it is
indeed possible to do better.

@subsection Using the @code{-delete} action

The most efficient and secure method of solving this problem is to use
the @samp{-delete} action:

@smallexample
find /var/tmp/stuff -mtime +90 -delete
@end smallexample

This alternative is more efficient than any of the @samp{-exec} or
@samp{-execdir} actions, since it entirely avoids the overhead of
forking a new process and using @code{exec} to run @code{/bin/rm}. It
is also normally more efficient than @code{xargs} for the same
reason. The file deletion is performed from the directory containing
the entry to be deleted, so the @samp{-delete} action has the same
security advantages as the @samp{-execdir} action has.

The @samp{-delete} action was introduced by the BSD family of
operating systems.

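
The @samp{-delete} command can be rehearsed on scratch files before it
is pointed at real data. This sketch assumes GNU @code{touch} (for the
@samp{-d '100 days ago'} date syntax) and GNU @code{find}; the file
names are invented:

@smallexample
dir=$(mktemp -d)
touch "$dir/fresh"
# Backdate two files so that they look 100 days old.
touch -d '100 days ago' "$dir/stale one" "$dir/stale-two"
# Only entries modified more than 90 days ago match; no rm is started.
find "$dir" -mtime +90 -delete
left=$(ls "$dir")
rm -rf "$dir"
echo "left behind: $left"   # fresh
@end smallexample
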
@subsection Improving things still further

Is it possible to improve things still further? Not without either
modifying the operating system or its system library, or having more
specific knowledge of the layout of the filesystem and disk I/O
subsystem, or both.

The @code{find} command traverses the filesystem, reading
directories. It then issues a separate system call for each file to
be deleted. If we could modify the operating system, there are
potential gains that could be made:

@itemize
@item
We could have a system call to which we pass more than one filename
for deletion
@item
Alternatively, we could pass in a list of inode numbers (on GNU/Linux
systems, @code{readdir()} also returns the inode number of each
directory entry) to be deleted.
@end itemize

The above possibilities sound interesting, but from the kernel's point
of view it is difficult to enforce standard Unix access controls for
such processing by inode number. Such a facility would probably
need to be restricted to the superuser.

Another way of improving performance would be to increase the
parallelism of the process. For example, if the directory hierarchy we
are searching is actually spread across a number of disks, we might
somehow be able to arrange for @code{find} to process each disk in
parallel. In practice GNU @code{find} doesn't have such an intimate
understanding of the system's filesystem layout and disk I/O
subsystem.

However, since the system administrator can have such an understanding,
they can take advantage of it like so:

@smallexample
find /var/tmp/stuff1 -mtime +90 -delete &
find /var/tmp/stuff2 -mtime +90 -delete &
find /var/tmp/stuff3 -mtime +90 -delete &
find /var/tmp/stuff4 -mtime +90 -delete &
wait
@end smallexample

In the example above, four separate instances of @code{find} are used
to search four subdirectories in parallel. The @code{wait} command
simply waits for all of these to complete. Whether this approach is
more or less efficient than a single instance of @code{find} depends
on a number of things:

@itemize
@item
Are the directories being searched in parallel actually on separate
disks? If not, this parallel search might just result in a lot of
disk head movement and so the speed might even be slower.
@item
Other activity: are other programs also doing things on those disks?
@end itemize

@subsection Conclusion

The fastest and most secure way to delete files with the help of
@code{find} is to use @samp{-delete}. Using @code{xargs -0 -P N} can
also make effective use of the disk, but it is not as secure.

In the case where we're doing things other than deleting files, the
most secure alternative is @samp{-execdir @dots{} +}, but this is not as
portable as the insecure action @samp{-exec @dots{} +}.

The @samp{-delete} action is not completely portable, but the only
other possibility which is as secure (@samp{-execdir}) is no more
portable. The most efficient portable alternative is @samp{-exec
@dots{} +}, but this is insecure and isn't supported by versions of GNU
findutils prior to 4.2.12.

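
For comparison, here is a sketch of the @code{xargs -0 -P N} approach
mentioned above, assuming GNU @code{find}, @code{xargs} and
@code{touch}; the scratch files are invented. Up to four @code{rm}
processes run at once, each given at most two names, and @code{xargs}
waits for all of them before it exits. It does not share the security
properties of @samp{-delete}:

@smallexample
dir=$(mktemp -d)
for n in 1 2 3 4 5 6 7 8; do
  touch -d '100 days ago' "$dir/old file $n"
done
touch "$dir/new file"
# -P 4 runs up to four rm commands at once; -n 2 caps each at two names.
find "$dir" -type f -mtime +90 -print0 |
xargs -0 -r -P 4 -n 2 rm -f --
left=$(ls "$dir")
rm -rf "$dir"
echo "left behind: $left"   # new file
@end smallexample
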
@node Copying A Subset of Files
@section Copying A Subset of Files

Suppose you want to copy some files from @file{/source-dir} to
@file{/dest-dir}, but there are a small number of files in
@file{/source-dir} you don't want to copy.

One option of course is @code{cp -R /source-dir /dest-dir} followed by
deletion of the unwanted material under @file{/dest-dir}. But often
that can be inconvenient, because for example we would have copied a
large amount of extraneous material, or because @file{/dest-dir} is
too small. Naturally there are many other possible reasons why this
strategy may be unsuitable.

So we need to have some way of identifying which files we want to
copy, and we need to have a way of copying that file list. The second
part of this condition is met by @code{cpio -p}. Of course, we can
identify the files we wish to copy by using @code{find}. Here is a
command that solves our problem:

@example
cd /source-dir &&
find . -name '.snapshot' -prune -o \( \! -name '*~' -print0 \) |
cpio -pmd0 /dest-dir
@end example

The first part of the @code{find} command here identifies files or
directories named @file{.snapshot} and tells @code{find} not to
recurse into them (since they do not need to be copied). The
combination @code{-name '.snapshot' -prune} yields false for anything
that didn't get pruned, but it is exactly those files we want to
copy. Therefore we need to use an OR (@samp{-o}) condition to
introduce the rest of our expression. The remainder of the expression
simply arranges for the name of any file not ending in @samp{~} to be
printed. The @samp{&&} after @code{cd} ensures that nothing is copied
from the wrong place if @file{/source-dir} cannot be entered.

Using @code{-print0} ensures that white space characters in file names
do not pose a problem. The @code{cpio} command does the actual work
of copying files. The program as a whole fails if the @code{cpio}
program returns nonzero. If the @code{find} command returns non-zero
on the other hand, the shell will not report a problem, since the exit
status of a pipeline is normally that of its last command, and
@code{find} is not the last command in the pipeline.

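Where the exit status of @code{find} matters, some shells can report it. The sketch below assumes @code{bash}: its @code{PIPESTATUS} array and @samp{set -o pipefail} option are not available in every @code{sh}. A missing directory stands in for a failing @code{find}, @code{cat} stands in for @code{cpio}, and the function names are hypothetical.

@smallexample
@verbatim
#!/bin/bash
# Sketch: notice a failure of find even though it is not the last
# command in the pipeline.  PIPESTATUS and pipefail are bash features.

show_status () {
    # $1: directory to search; a missing one makes find fail
    find "$1" -print0 2>/dev/null | cat > /dev/null
    echo "find=${PIPESTATUS[0]} last=${PIPESTATUS[1]}"
}

strict_status () {
    # With pipefail, the pipeline's status is non-zero if any stage fails.
    ( set -o pipefail
      find "$1" -print0 2>/dev/null | cat > /dev/null )
    echo "pipeline=$?"
}
@end verbatim
@end smallexample

Running the copying pipeline above under @samp{set -o pipefail} in the same way makes a failure of @code{find} visible in the exit status of the whole pipeline.
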
@node Updating A Timestamp File
@section Updating A Timestamp File

Suppose we have a directory full of files which is maintained with a
set of automated tools; perhaps one set of tools updates them and
another set of tools uses the result. In this situation, it might be
useful for the second set of tools to know if the files have recently
been changed. It might be useful, for example, to have a ``timestamp''
file which gives the timestamp on the newest file in the collection.

We can use @code{find} to achieve this, but there are several
different ways to do it.

@subsection Updating the Timestamp The Wrong Way

The obvious but wrong answer is just to use @samp{-newer}:

@smallexample
find subdir -newer timestamp -exec touch -r @{@} timestamp \;
@end smallexample

This does the right sort of thing but has a bug. Suppose that two
files in the subdirectory have been updated, and that these are called
@file{file1} and @file{file2}. The command above will update
@file{timestamp} with the modification time of whichever of
@file{file1} and @file{file2} @code{find} happens to process last, and
we don't know which one that is. Since the timestamps on @file{file1}
and @file{file2} will in general be different, this could well be the
wrong value.

One solution to this problem is to modify @code{find} to recheck the
modification time of @file{timestamp} every time a file is to be
compared against it, but that will reduce the performance of
@code{find}.

@subsection Using the test utility to compare timestamps

The @code{test} command can be used to compare timestamps:

@smallexample
find subdir -exec test @{@} -nt timestamp \; -exec touch -r @{@} timestamp \;
@end smallexample

This will ensure that any changes made to the modification time of
@file{timestamp} that take place during the execution of @code{find}
are taken into account. This resolves our earlier problem, but
unfortunately this runs much more slowly, because a separate
@code{test} process is started for every file under @file{subdir}.

@subsection A combined approach

We can of course still use @samp{-newer} to cut down on the number of
calls to @code{test}:

@smallexample
find subdir -newer timestamp -and \
     -exec test @{@} -nt timestamp \; -and \
     -exec touch -r @{@} timestamp \;
@end smallexample

Here, the @samp{-newer} test excludes all the files which are
definitely older than the timestamp, but all the files which are newer
than the old value of the timestamp are compared against the current
updated timestamp.

This is indeed faster in general, but the speed difference will depend
on how many updated files there are.

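The combined approach can be packaged as a small shell function. This is a sketch rather than a drop-in tool: the name @code{update_stamp} is hypothetical, and it relies on the @code{test} program supporting @samp{-nt}, as the GNU coreutils @code{test} does.

@smallexample
@verbatim
#!/bin/sh
# Sketch: give the timestamp file "$2" the modification time of the
# newest file below "$1".  -newer cheaply skips files older than the
# timestamp as it was when find started; "test -nt" then compares each
# remaining candidate against the timestamp as it is now, so the order
# in which find visits the files does not matter.
update_stamp () {
    find "$1" -newer "$2" \
        -exec test {} -nt "$2" \; \
        -exec touch -r {} "$2" \;
}
@end verbatim
@end smallexample

For example, @code{update_stamp subdir timestamp} corresponds to the command above.
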
@subsection Using @code{-printf} and @code{sort} to compare timestamps

It is possible to use the @samp{-printf} action to abandon the use of
@code{test} entirely:

@smallexample
newest=$(find subdir -newer timestamp -printf "%T@@:%p\n" |
         sort -n |
         tail -n 1 |
         cut -d: -f2- )
touch -r "$@{newest:-timestamp@}" timestamp
@end smallexample

The command above works by generating a list of the modification
times (@samp{%T@@}) and names of all the files which are newer than
the timestamp. The @code{sort}, @code{tail} and @code{cut} commands
simply pull out the name of the file with the largest timestamp value
(that is, the latest file). The @code{touch} command is then used to
update the timestamp.

The @code{"$@{newest:-timestamp@}"} expression simply expands to the
value of @code{$newest} if that variable is set and non-empty, but to
@file{timestamp} otherwise. This ensures that an argument is always
given to the @samp{-r} option of the @code{touch} command, even when
no file is newer than the timestamp.

This approach seems quite efficient, but unfortunately it has a
problem. Many operating systems now keep file modification time
information at a granularity which is finer than one second.
Findutils version 4.3.3 and later will print a fractional part with
%T@@, but older versions will not. With an older @code{find}, files
modified within the same second sort as equal, so the file chosen may
not be the one modified last.

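If file names might contain newlines, the same idea can be made NUL-delimited. The sketch below assumes @code{bash} (for @samp{read -d ''} and process substitution), GNU @code{find} (for @samp{-printf}) and GNU @code{sort} (for @samp{-z}); it uses @samp{%T@@}, the last modification time, and the function name is hypothetical. It still depends on @code{find} printing fractional seconds.

@smallexample
@verbatim
#!/bin/bash
# Sketch: NUL-delimited variant of the -printf/sort approach.
# "$1" is the tree to search and "$2" the timestamp file.
# Requires bash, GNU find (-printf) and GNU sort (-z).
touch_from_newest () {
    local rec newest=
    # Each record is "SECONDS.FRACTION PATH", terminated by a NUL byte.
    # LC_ALL=C keeps "." as the decimal point for both find and sort -n.
    while IFS= read -r -d '' rec; do
        newest=${rec#* }       # drop the timestamp and the space after it
    done < <(LC_ALL=C find "$1" -newer "$2" -printf '%T@ %p\0' |
             LC_ALL=C sort -z -n)
    # The records arrive oldest first, so the last one read is the newest.
    if [ -n "$newest" ]; then
        touch -r "$newest" "$2"
    fi
}
@end verbatim
@end smallexample
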
@subsection Solving the problem with @code{make}

Another tool which often works with timestamps is @code{make}. We can
use @code{find} to generate a makefile on the fly and then
use @code{make} to update the timestamps:

@smallexample
makefile=$(mktemp)
find subdir \
    \( \! -xtype l \) \
    -newer timestamp \
    -printf "timestamp:: %p\n\ttouch -r %p timestamp\n\n" > "$makefile"
make -f "$makefile"
rm -f "$makefile"
@end smallexample

Unfortunately although the solution above is quite elegant, it fails
to cope with white space within file names, and adjusting it to do so
would require a rather complex shell script.

@subsection Coping with odd filenames too

We can fix both of these problems (looping and problems with white
space), and do things more efficiently too. The following command
works with newlines and doesn't need to sort the list of filenames.

@smallexample
find subdir -newer timestamp -printf "%T@@:%p\0" |
   perl -0 newest.pl |
   xargs --no-run-if-empty --null -I @{@} \
   find @{@} -maxdepth 0 -newer timestamp -exec touch -r @{@} timestamp \;
@end smallexample

The first @code{find} command generates a list of files which are
newer than the original timestamp file, and prints a list of them with
their modification times. The @file{newest.pl} script simply filters
out all the filenames which have timestamps which are older than
whatever the newest file is:

@smallexample
@verbatim
#! /usr/bin/perl -0
my @newest = ();
my $latest_stamp = undef;
while (<>) {
    chomp;                       # remove the trailing NUL terminator
    # Split only at the first colon; file names may contain colons.
    my ($stamp, $name) = split(/:/, $_, 2);
    if (!defined($latest_stamp) || ($stamp > $latest_stamp)) {
        $latest_stamp = $stamp;
        @newest = ();
    }
    if ($stamp >= $latest_stamp) {
        push @newest, $name;
    }
}
print join("\0", @newest);
@end verbatim
@end smallexample

This prints a list of zero or more files, all of which are newer than
the original timestamp file, and which have the same timestamp as each
other, to the resolution of the timestamps printed by @code{find}. The
second @code{find} command takes each resulting file one at a time,
and if that is newer than the timestamp file, the timestamp is
updated.

@node Finding the Shallowest Instance
@section Finding the Shallowest Instance

Suppose you maintain local copies of sources from various projects,
each with their own choice of directory organisation and source code
management (SCM) tool. You need to periodically synchronize each
project with its upstream tree. As the number of local repositories
grows, so does the work involved in maintaining synchronization. SCM
utilities typically create some sort of administrative directory:
@file{.svn} for Subversion, @file{CVS} for CVS, and so on. These
directories can be used as a key to search for the bases of the
project source trees. Suppose we have the following directory
structure:

@smallexample
repo/project1/CVS
repo/gnu/project2/.svn
repo/gnu/project3/.svn
repo/gnu/project3/src/.svn
repo/gnu/project3/doc/.svn
repo/project4/.git
@end smallexample

One would expect to update each of the @file{projectX} directories,
but not their subdirectories (@file{src}, @file{doc}, etc.). To locate
the project roots, we would need to find the least deeply nested
directories containing an SCM-related subdirectory. The following
command discovers those roots efficiently. It is efficient because it
avoids searching subdirectories inside projects whose SCM directory
we already found.

@smallexample
find repo/ \
    \( -exec test -d @{@}/.svn \; \
    -or -exec test -d @{@}/.git \; \
    -or -exec test -d @{@}/CVS \; \) \
    -print -prune
@end smallexample

In this example, @command{test} is used to tell if we are currently
examining a directory which appears to be a project's root directory
(because it has an SCM subdirectory). When we find a project root,
there is no need to search inside it, and @code{-prune} makes sure
that we descend no further. The parentheses are needed because the
implicit @samp{-and} between the last @samp{-exec} and @samp{-print}
binds more tightly than @samp{-or}; without them, only directories
containing @file{CVS} would be printed and pruned.

For large, complex trees like the Linux kernel, this will prevent
searching a large portion of the structure, saving a good deal of
time.

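To check the behaviour, the sketch below recreates the example tree in a temporary directory and runs the same kind of search. It assumes GNU @code{find}, which substitutes @samp{@{@}} even when it is only part of an argument (as in @samp{@{@}/.svn}); @samp{-type d} is an addition that avoids running @code{test} on ordinary files, and the function name is hypothetical.

@smallexample
@verbatim
#!/bin/sh
# Sketch: print the shallowest directories below "$1" that contain an
# SCM administrative directory, without descending into them.
# The \( ... \) grouping matters: -print -prune must apply to all three
# tests, and the implicit -and binds more tightly than -o.
# Assumes GNU find, which replaces {} even inside "{}/.svn".
find_project_roots () {
    find "$1" -type d \
        \( -exec test -d {}/.svn \; \
           -o -exec test -d {}/.git \; \
           -o -exec test -d {}/CVS \; \) \
        -print -prune
}
@end verbatim
@end smallexample

Run against the tree above, it reports @file{project1}, @file{project2}, @file{project3} and @file{project4}, but not @file{project3/src} or @file{project3/doc}.
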
@node Security Considerations
@chapter Security Considerations

Security considerations are important if you are using @code{find} or
@code{xargs} to search for or process files that don't belong to you
or over which other people have control. Security considerations
relating to @code{locate} may also apply if you have files which you
do not want others to see.

The most severe security problems affecting @code{find} and related
programs arise when third parties bring about a situation allowing
them to do something they would normally not be able to accomplish.
This is called @emph{privilege elevation}. This might include
deleting files they would not normally be able to delete. It is
common for the operating system to periodically invoke @code{find}
for self-maintenance purposes. These invocations of @code{find} are
particularly problematic from a security point of view because they
are often run by the superuser and search the entire filesystem
hierarchy. Generally, the severity of any associated problem depends
on what the system is going to do with the files found by @code{find}.

@menu
* Levels of Risk:: What is your level of exposure to security problems?
* Security Considerations for find:: Security problems with find
* Security Considerations for xargs:: Security problems with xargs
* Security Considerations for locate:: Security problems with locate
* Security Summary:: That was all very complex, what does it boil down to?
* Further Reading on Security::
@end menu

@node Levels of Risk
@section Levels of Risk

There are some security risks inherent in the use of @code{find},
@code{xargs} and (to a lesser extent) @code{locate}. The severity of
these risks depends on what sort of system you are using:

@table @strong
@item High Risk
Multi-user systems where you do not control (or trust) the other
users, and on which you run @code{find} over areas where those
other users can manipulate the filesystem (for example beneath
@file{/home} or @file{/tmp}).

@item Medium Risk
Systems where the actions of other users can create file names chosen
by them, but to which they don't have access while @code{find} is
being run. This access might include leaving programs running (shell
background jobs, @code{at} or @code{cron} tasks, for example). On
these sorts of systems, carefully written commands (avoiding use of
@samp{-print} for example) should not expose you to a high degree of
risk. Most systems fall into this category.

@item Low Risk
Systems to which untrusted parties do not have access, on which they
cannot create file names of their own choice (even remotely), and which
contain no security flaws which might enable an untrusted third party
to gain access. Most systems do not fall into this category because
there are many ways in which external parties can affect the names of
files that are created on your system. The system on which I am
writing this, for example, automatically downloads software updates
from the Internet; the names of the files in which these updates exist
are chosen by third parties@footnote{Of course, I trust these parties
to a large extent anyway, because I install software provided by them;
I choose to trust them in this way, and that's a deliberate choice.}.
@end table

In the discussion above, ``risk'' denotes the likelihood that someone
can cause @code{find}, @code{xargs}, @code{locate} or some other
program which is controlled by them to do something you did not
intend. The levels of risk suggested do not take any account of the
consequences of this sort of event. That is, if you operate a ``low
risk'' type system, but the consequences of a security problem are
disastrous, then you should still give serious thought to all the
possible security problems, many of which of course will not be
discussed here -- this section of the manual is intended to be
informative but not comprehensive or exhaustive.

If you are responsible for the operation of a system where the
consequences of a security problem could be very important, you should
do two things:

@enumerate
@item Define a security policy which specifies who is allowed to do what
on your system.
@item Seek competent advice on how to enforce your policy, detect
breaches of that policy, and take account of any potential problems
that might fall outside the scope of your policy.
@end enumerate

@node Security Considerations for find
@section Security Considerations for @code{find}

Some of the actions @code{find} might take have a direct effect;
these include @code{-exec} and @code{-delete}. However, it is also
common to use @code{-print} explicitly or implicitly, and so if
@code{find} produces the wrong list of file names, that can also be a
security problem; consider the case for example where @code{find} is
producing a list of files to be deleted.

We normally assume that the @code{find} command line expresses the
file selection criteria and actions that the user had in mind -- that
is, the command line is ``trusted'' data.

From a security analysis point of view, the output of @code{find}
should be correct; that is, the output should contain only the names
of those files which meet the user's criteria specified on the command
line. This also applies to the @code{-exec} and @code{-delete} actions;
one can consider these to be part of the output.

On the other hand, the contents of the filesystem can be manipulated
by other people, and hence we regard this as ``untrusted'' data. This
implies that the @code{find} command line is a filter which converts
the untrusted contents of the filesystem into a correct list of output
files.

The filesystem will in general change while @code{find} is searching
it; in fact, most of the potential security problems with @code{find}
relate to this issue in some way.

@dfn{Race conditions} are a general class of security problem where the
relative ordering of actions taken by @code{find} (for example) and
something else is critically important in getting the correct and
expected result@footnote{This is more or less the definition of the
term ``race condition''.}.

For @code{find}, an attacker might move or rename files or directories in
the hope that an action might be taken against a file which was not
normally intended to be affected. Alternatively, this sort of attack
might be intended to persuade @code{find} to search part of the
filesystem which would not normally be included in the search
(defeating the @code{-prune} action for example).

@menu
* Problems with -exec and filenames::
* Changing the Current Working Directory::
* Race Conditions with -exec::
* Race Conditions with -print and -print0::
@end menu

@node Problems with -exec and filenames
@subsection Problems with @code{-exec} and filenames

It is safe in many cases to use the @samp{-execdir} action with any
file name. Because @samp{-execdir} prefixes the arguments it passes
to programs with @samp{./}, you will not accidentally pass an argument
which is interpreted as an option. For example the file @file{-f}
would be passed to @code{rm} as @file{./-f}, which is harmless.

However, your degree of safety does depend on the nature of the
program you are running. For example, constructs such as these two
commands

@example
# risky
find -exec sh -c "something @{@}" \;
find -execdir sh -c "something @{@}" \;
@end example

are very dangerous. The reason for this is that the @samp{@{@}} is
expanded to a filename which might contain a semicolon or other
characters special to the shell. If for example someone creates the
file @file{/tmp/foo; rm -rf $HOME} then the two commands above could
delete someone's home directory.

For this reason, do not run any command which passes untrusted data
(such as the names of files) to a program which interprets its
arguments as commands to be further interpreted (for example
@samp{sh}).

In the case of the shell, there is a clever workaround for this
problem:

@example
# safer
find -exec sh -c 'something "$@@"' sh @{@} \;
find -execdir sh -c 'something "$@@"' sh @{@} \;
@end example

This approach is not guaranteed to avoid every problem, but it is much
safer than substituting data of an attacker's choice into the text of
a shell command.

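The same workaround also works with the @samp{+} form of @samp{-exec}, so that a single shell handles many file names. In this sketch @code{printf} stands in for @samp{something}, and the awkward file name in the usage note is created only to show that it is treated as data rather than as shell syntax; the function name is hypothetical.

@smallexample
@verbatim
#!/bin/sh
# Sketch: hand file names to "sh -c" as positional parameters instead of
# pasting them into the command text.  The word "sh" becomes $0 inside
# the script, and the file names found become "$@".
list_names () {
    find "$1" -type f -exec sh -c '
        for f do
            printf "name: %s\n" "$f"
        done
    ' sh {} +
}
@end verbatim
@end smallexample

In a directory containing a file called @file{foo; touch injected}, running @code{list_names .} prints that name and does not create a file called @file{injected}.
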
@node Changing the Current Working Directory
@subsection Changing the Current Working Directory

As @code{find} searches the filesystem, it finds subdirectories and
then searches within them by changing its working directory. First,
@code{find} reaches and recognises a subdirectory. It then decides if that
subdirectory meets the criteria for being searched; that is, any
@samp{-xdev} or @samp{-prune} expressions are taken into account. The
@code{find} program will then change working directory and proceed to
search the directory.

In a race condition attack, once the checks relevant to @samp{-xdev}
and @samp{-prune} have been done, an attacker might rename the
directory that was being considered, and put in its place a symbolic
link that actually points somewhere else.

The idea behind this attack is to fool @code{find} into going into the
wrong directory. This would leave @code{find} with a working
directory chosen by an attacker, bypassing any protection apparently
provided by @samp{-xdev} and @samp{-prune}, and any protection
provided by being able to @emph{not} list particular directories on
the @code{find} command line. This form of attack is particularly
problematic if the attacker can predict when the @code{find} command
will be run, as is the case with @code{cron} tasks for example.

GNU @code{find} has specific safeguards to prevent this general class
of problem. The exact form of these safeguards depends on the
properties of your system.

@menu
* O_NOFOLLOW:: Safely changing directory using @code{fchdir}.
* Systems without O_NOFOLLOW:: Checking for symbolic links after @code{chdir}.
@end menu

@node O_NOFOLLOW
@subsubsection @code{O_NOFOLLOW}

If your system supports the @code{O_NOFOLLOW} flag@footnote{GNU/Linux
(kernel version 2.1.126 and later) and FreeBSD (3.0-CURRENT and later)
support this.} to the @code{open(2)} system call, @code{find} uses it
to safely change directories. The target directory is first opened
and then @code{find} changes working directory with the
@code{fchdir()} system call. This ensures that symbolic links are not
followed, preventing the sort of race condition attack in which use
is made of symbolic links.

If for any reason this approach does not work, @code{find} will fall
back on the method which is normally used if @code{O_NOFOLLOW} is not
supported.

You can tell if your system supports @code{O_NOFOLLOW} by running

@example
find --version
@end example

This will tell you the version number and which features are enabled.
For example, if I run this on my system now, this gives:
@example
find (GNU findutils) 4.5.11-git
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Eric B. Decker, James Youngman, and Kevin Dalley.
Features enabled: D_TYPE O_NOFOLLOW(enabled) LEAF_OPTIMISATION FTS(FTS_CWDFD) CBO(level=2)
@end example

Here, you can see that I am running a version of @code{find} which was
built from the development (git) code prior to the release of
findutils-4.5.12, and that several features including @code{O_NOFOLLOW} are
present. @code{O_NOFOLLOW} is qualified with ``enabled''. This simply means
that the current system seems to support @code{O_NOFOLLOW}. This check is
needed because it is possible to build @code{find} on a system that
defines @code{O_NOFOLLOW} and then run it on a system that ignores the
@code{O_NOFOLLOW} flag. We try to detect such cases at startup by checking
the operating system and version number; when this happens you will
see @samp{O_NOFOLLOW(disabled)} instead.

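A script can make the same check by searching the @samp{Features enabled} line. This is a sketch only: it assumes the output format shown above, which is meant for people rather than programs and may change between releases; the function name is hypothetical.

@smallexample
@verbatim
#!/bin/sh
# Sketch: report whether this find claims O_NOFOLLOW support.
# Relies on the human-readable "Features enabled:" line printed by
# GNU find --version; prints "unknown" if that line is not found.
nofollow_status () {
    feature=$(find --version 2>/dev/null |
              grep '^Features enabled:' |
              grep -o 'O_NOFOLLOW([a-z]*)')
    case $feature in
        'O_NOFOLLOW(enabled)')  echo enabled ;;
        'O_NOFOLLOW(disabled)') echo disabled ;;
        *)                      echo unknown ;;
    esac
}
@end verbatim
@end smallexample
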
@node Systems without O_NOFOLLOW
@subsubsection Systems without @code{O_NOFOLLOW}

The strategy for preventing this type of problem on systems that lack
support for the @code{O_NOFOLLOW} flag is more complex. Each time
@code{find} changes directory, it examines the directory it is about
to move to, issues the @code{chdir()} system call, and then checks
that it has ended up in the subdirectory it expected. If all is as
expected, processing continues as normal. However, there are two main
reasons why the directory might change: the use of an automounter and
someone removing the old directory and replacing it with something
else while @code{find} is trying to descend into it.

Where a filesystem ``automounter'' is in use, the @code{chdir()}
system call can itself cause a new filesystem to be mounted at that
point. On systems that do not support @code{O_NOFOLLOW}, this will
cause @code{find}'s security check to fail.

However, this does not normally represent a security problem, since
the automounter configuration is normally set up by the system
administrator. Therefore, if the @code{chdir()} sanity check fails,
@code{find} will make one more attempt@footnote{This may not be the
case for the fts-based executable.}. If that succeeds, execution
carries on as normal. This is the usual case for automounters.

Where an attacker is trying to exploit a race condition, the problem
may not have gone away on the second attempt. If this is the case,
@code{find} will issue a warning message and then ignore that
subdirectory. When this happens, actions such as @samp{-exec} or
@samp{-print} may already have taken place for the problematic
subdirectory. This is because @code{find} applies tests and actions
to directories before searching within them (unless @samp{-depth} was
specified).

Because of the nature of the directory-change operation and security
check, in the worst case the only things that @code{find} would have
done with the directory are to move into it and back out to the
original parent. No operations would have been performed within that
directory.

@node Race Conditions with -exec
@subsection Race Conditions with @code{-exec}

The @samp{-exec} action causes another program to be run. It passes
to the program the name of the file which is being considered at the
time. The invoked program will typically then perform some action
on that file. Once again, there is a race condition which can be
exploited here. We shall take as a specific example the command

@example
find /tmp -path /tmp/umsp/passwd -exec /bin/rm @{@} \;
@end example

In this simple example, we are identifying just one file to be deleted
and invoking @code{/bin/rm} to delete it. A problem exists because
there is a time gap between the point where @code{find} decides that
it needs to process the @samp{-exec} action and the point where the
@code{/bin/rm} command actually issues the @code{unlink()} system
call to delete the file from the filesystem. Within this time period,
an attacker can rename the @file{/tmp/umsp} directory, replacing it
with a symbolic link to @file{/etc}. There is no way for
@code{/bin/rm} to determine that it is working on the same file that
@code{find} had in mind. Once the symbolic link is in place, the
attacker has persuaded @code{find} to cause the deletion of the
@file{/etc/passwd} file, which is not the effect intended by the
command which was actually invoked.

One possible defence against this type of attack is to modify the
behaviour of @samp{-exec} so that the @code{/bin/rm} command is run
with the argument @file{./passwd} and a suitable choice of working
directory. This would allow the normal sanity check that @code{find}
performs to protect against this form of attack too. Unfortunately,
this strategy cannot be used as the POSIX standard specifies that the
current working directory for commands invoked with @samp{-exec} must
be the same as the current working directory from which @code{find}
was invoked. This means that the @samp{-exec} action is inherently
insecure and can't be fixed.

GNU @code{find} implements a more secure variant of the @samp{-exec}
action, @samp{-execdir}. The @samp{-execdir} action
ensures that it is not necessary to dereference subdirectories to
process target files. The current directory used to invoke programs
is the same as the directory in which the file to be processed exists
(@file{/tmp/umsp} in our example), and only the basename of the file to
be processed is passed to the invoked command, with a @samp{./}
prepended (giving @file{./passwd} in our example).

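The difference between the two actions is easy to see with @code{echo} standing in for @code{/bin/rm}. This sketch assumes GNU @code{find} and expects a harmless copy of the example's layout (@file{umsp/passwd}) in whatever directory it is given; the function name is hypothetical.

@smallexample
@verbatim
#!/bin/sh
# Sketch: show which argument -exec and -execdir hand to the command.
# echo stands in for /bin/rm, so nothing is deleted.
show_args () {
    # $1: a directory containing umsp/passwd (a harmless test copy)
    find "$1" -path "$1/umsp/passwd" -exec echo exec: {} \;
    find "$1" -path "$1/umsp/passwd" -execdir echo execdir: {} \;
}
@end verbatim
@end smallexample

The first line of output shows the full path, which an attacker can redirect by replacing a directory component; the second shows @file{./passwd}, which is resolved from within the directory @code{find} has already checked.
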
The @samp{-execdir} action refuses to do anything if the current
directory is included in the @env{PATH} environment variable. This
is necessary because @samp{-execdir} runs programs in the same
directory in which it finds files -- in general, such a directory
might be writable by untrusted users. For similar reasons,
@samp{-execdir} does not allow @samp{@{@}} to appear in the name of
the command to be run.

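The @env{PATH} check can be seen directly. This sketch assumes GNU @code{find}; since the wording of the error message may vary between releases, it looks only at the exit status. The function name is hypothetical.

@smallexample
@verbatim
#!/bin/sh
# Sketch: report whether GNU find accepts -execdir with a given PATH.
# A "." entry (or an empty entry, which also means the current
# directory) should make find refuse to run at all.
execdir_with_path () {
    # $1: value to use for PATH while find runs
    if PATH=$1 find / -maxdepth 0 -execdir true \; 2>/dev/null; then
        echo accepted
    else
        echo refused
    fi
}
@end verbatim
@end smallexample

For example, @code{execdir_with_path ".:$PATH"} is expected to print @samp{refused}.
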
@node Race Conditions with -print and -print0
@subsection Race Conditions with @code{-print} and @code{-print0}

The @samp{-print} and @samp{-print0} actions can be used to produce a
list of files matching some criteria, which can then be used with some
other command, perhaps with @code{xargs}. Unfortunately, this means
that there is an unavoidable time gap between @code{find} deciding
that one or more files meet its criteria and the relevant command
being executed. For this reason, the @samp{-print} and @samp{-print0}
actions are just as insecure as @samp{-exec}.

In fact, since the construction

@example
find @dots{} -print | xargs @enddots{}
@end example

does not cope correctly with newlines or other ``white space'' in
file names, and copes poorly with file names containing quotes, the
@samp{-print} action is less secure even than @samp{-print0}.

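The effect of a newline in a file name is easy to demonstrate. In this sketch the command run by @code{xargs} just reports how many arguments it received; the file name used is contrived, and the function name is hypothetical.

@smallexample
@verbatim
#!/bin/sh
# Sketch: count the arguments xargs builds from find's output.
count_args () {
    # $1: directory to search; $2: "print" or "print0"
    if [ "$2" = print0 ]; then
        # NUL-separated: each file name is exactly one argument.
        find "$1" -type f -print0 | xargs -0 sh -c 'echo $#' sh
    else
        # Newline/blank-separated: odd names split into several arguments.
        find "$1" -type f -print | xargs sh -c 'echo $#' sh
    fi
}
@end verbatim
@end smallexample

For a directory holding a single file whose name contains a newline, @samp{print} reports two arguments while @samp{print0} reports one.
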
@comment node-name, next, previous, up
|
|
@comment @node Security Considerations for xargs
|
|
@node Security Considerations for xargs
|
|
@section Security Considerations for @code{xargs}
|
|
|
|
The description of the race conditions affecting the @samp{-print}
|
|
action of @code{find} shows that @code{xargs} cannot be secure if it
|
|
is possible for an attacker to modify a filesystem after @code{find}
|
|
has started but before @code{xargs} has completed all its actions.
|
|
|
|
However, there are other security issues that exist even if it is not
|
|
possible for an attacker to have access to the filesystem in real
|
|
time. Firstly, if it is possible for an attacker to create files with
|
|
names of their choice on the filesystem, then @code{xargs} is
|
|
insecure unless the @samp{-0} option is used. If a file with the name
|
|
@file{/home/someuser/foo/bar\n/etc/passwd} exists (assume that
|
|
@samp{\n} stands for a newline character), then @code{find @dots{} -print}
|
|
can be persuaded to print three separate lines:
|
|
|
|
@example
|
|
/home/someuser/foo/bar
|
|
|
|
/etc/passwd
|
|
@end example
|
|
|
|
If it finds a blank line in the input, @code{xargs} will ignore it.
|
|
Therefore, if some action is to be taken on the basis of this list of
|
|
files, the @file{/etc/passwd} file would be included even if this was
|
|
not the intent of the person running find. There are circumstances in
|
|
which an attacker can use this to their advantage. The same
|
|
consideration applies to file names containing ordinary spaces rather
|
|
than newlines, except that of course the list of file names will no
|
|
longer contain an ``extra'' newline.
|
|
|
|
This problem is an unavoidable consequence of the default behaviour of
|
|
the @code{xargs} command, which is specified by the POSIX standard.
|
|
The only ways to avoid this problem are either to avoid all use of
|
|
@code{xargs} in favour for example of @samp{find -exec} or (where
|
|
available) @samp{find -execdir}, or to use the @samp{-0} option, which
|
|
ensures that @code{xargs} considers file names to be separated by
|
|
ASCII NUL characters rather than whitespace. However, useful as this
|
|
option is, the POSIX standard does not make it mandatory.
|
|
|
|
POSIX also specifies that @code{xargs} interprets quoting and trailing
whitespace specially in filenames. This means that using
@code{find @dots{} -print | xargs @dots{}} can cause the commands run by
@code{xargs} to receive a list of file names which is not the same as
the list printed by @code{find}. The interpretation of quotes and
trailing whitespace is turned off by the @samp{-0} argument to
@code{xargs}, which is another reason to use that option.

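Quoting can be demonstrated the same way. The sketch below again
assumes GNU @code{find} and @code{xargs} and a scratch directory; the
file name @file{it's} is a hypothetical example. In the default mode,
@code{xargs} treats the apostrophe as the start of a quoted string,
reports an unmatched quote and exits with a non-zero status; with
@samp{-print0} and @samp{-0}, @code{ls} receives the real name.

@example
# Scratch directory containing a file whose name includes a quote.
dir=$(mktemp -d)
touch "$dir/it's"
# Default mode: xargs rejects the unmatched single quote.
find "$dir" -type f -print | xargs ls -l ||
  echo "xargs failed with status $?"
# NUL-separated mode: quotes are not special, so ls sees the name.
find "$dir" -type f -print0 | xargs -0 ls -l
@end example
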
@comment node-name, next, previous, up
@node Security Considerations for locate
@section Security Considerations for @code{locate}

@subsection Race Conditions
It is fairly unusual for the output of @code{locate} to be fed into
another command. However, if this were to be done, it would raise
the same set of security issues as the use of @samp{find @dots{} -print}.
Although the problems relating to whitespace in file names can be
resolved by using @code{locate}'s @samp{-0} option, this still leaves
the race condition problems associated with @samp{find @dots{} -print0}.
There is no way to avoid these problems in the case of @code{locate}.

@subsection Long File Name Bugs with Old-Format Databases
Old versions of @code{locate} have a bug in the way that old-format
databases are read. This bug affects the following versions of
@code{locate}:

@enumerate
@item All releases prior to 4.2.31
@item All 4.3.x releases prior to 4.3.7
@end enumerate

The affected versions of @code{locate} read file names into a
fixed-length 1026-byte buffer, allocated on the heap. This buffer is
not extended if file names are too long to fit into it, and no range
checking on the length of the file name is performed. This could in
theory lead to a privilege escalation attack. Findutils versions
4.3.0 to 4.3.6 are also affected.

On systems using the old database format and affected versions of
@code{locate}, carefully-chosen long file names could in theory allow
malicious users to run code of their choice as any user invoking
@code{locate}.

If remote users can choose the names of files stored on your system,
and these files are indexed by @code{updatedb}, this may be a remote
security vulnerability. Findutils version 4.2.31 and findutils
version 4.3.7 include fixes for this problem. The @code{updatedb},
@code{bigram} and @code{code} programs do not appear to be affected.

If you are also using GNU coreutils, you can use the following command
to determine the length of the longest file name on a given system
(that is, the length in bytes of the longest full path name printed
by @code{find}):

@example
find / -print0 | tr -c '\0' 'x' | tr '\0' '\n' | wc -L
@end example

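To check directly whether any path name on a system approaches the
size of the buffer described above, the same pipeline can be extended
with @code{awk} to count the names that exceed a chosen length. This
is a minimal sketch: the shell function @code{count_long_paths} is a
hypothetical helper, not part of findutils, and lengths are measured
in bytes.

@example
# count_long_paths DIR LIMIT (hypothetical helper): print how many
# path names under DIR are longer than LIMIT bytes.  Every non-NUL
# byte becomes "x", so multibyte names are measured in bytes, and
# each NUL terminator becomes a newline for awk.  Permission errors
# from find are discarded.  The body runs in a subshell.
count_long_paths () (
  find "$1" -print0 2>/dev/null |
    tr -c '\0' 'x' |
    tr '\0' '\n' |
    awk -v limit="$2" 'length > limit' |
    wc -l
)
@end example

For example, @code{count_long_paths / 1000} prints the number of path
names on the system that are longer than 1000 bytes.
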
Although this problem is significant, the old database format is not
the default, and use of the old database format is not common. Most
installations and most users will not be affected by this problem.

@node Security Summary
@section Summary

Where untrusted parties can create files on the system, or affect the
names of files that are created, all uses of @code{find},
@code{locate} and @code{xargs} have known security problems except the
following:

@table @asis
@item Informational use only
Uses where the programs are used to prepare lists of file names upon
which no further action will ever be taken.

@item @samp{-delete}
Use of the @samp{-delete} action with @code{find} to delete files
which meet specified criteria.

@item @samp{-execdir}
Use of the @samp{-execdir} action with @code{find} where the
@env{PATH} environment variable contains directories which contain
only trusted programs.
@end table

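As an illustration of the @samp{-delete} case, a periodic clean-up job
can let @code{find} remove old files itself instead of passing their
names to @code{rm} through @code{xargs}. The sketch below assumes GNU
@code{find} and GNU @code{touch} (for @samp{touch -d}); the scratch
directory, the @samp{*.tmp} pattern and the seven-day threshold are
hypothetical choices.

@example
# Scratch directory standing in for a real cache or spool directory.
dir=$(mktemp -d)
touch "$dir/old.tmp" "$dir/keep.txt"
# Backdate old.tmp so that it is more than seven days old.
touch -d '10 days ago' "$dir/old.tmp"
# find deletes matching files itself; no names pass through a pipe.
find "$dir" -type f -name '*.tmp' -mtime +7 -delete
ls "$dir"
@end example

After this runs, only @file{keep.txt} is left in the directory.
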
@node Further Reading on Security
@section Further Reading on Security

While there are a number of books on computer security, there are also
useful articles on the web that touch on the issues described above:

@table @url
@item http://goo.gl/DAvh
@c https://www.securecoding.cert.org/confluence/display/seccode/MSC09-C.+Character+Encoding+-+Use+Subset+of+ASCII+for+Safety
This article describes some of the unfortunate effects of allowing
free choice of file names.
@item http://cwe.mitre.org/data/definitions/78.html
Describes OS Command Injection.
@item https://cwe.mitre.org/data/definitions/73.html
Describes problems arising from allowing remote computers to send
requests which specify file names of their choice.
@item http://cwe.mitre.org/data/definitions/116.html
Describes problems relating to encoding file names and escaping
characters. This article is relevant to findutils because for command
lines processed via the shell, the encoding and escaping rules are
already set by the shell. For example, command lines like @code{find
@dots{} -print | some-shell-script} require specific care.
@item http://xkcd.com/327/
A humorous and pithy summary of the broader problem.
@end table

@comment node-name, next, previous, up
@node Error Messages
@chapter Error Messages

This chapter describes some of the error messages sometimes produced
by @code{find}, @code{xargs}, or @code{locate}, explains them and in
some cases provides advice as to what you should do about them.

This manual is written in English. The GNU findutils software
features translations of error messages for many languages. For this
reason, the error messages produced by the programs are made to be as
self-explanatory as possible. This approach avoids leaving people to
figure out which test an English-language error message corresponds
to. Error messages which are self-explanatory will not normally be
mentioned in this document. For those messages mentioned in this
document, only the English-language version of the message will be
listed.

@menu
* Error Messages From find::
* Error Messages From xargs::
* Error Messages From locate::
* Error Messages From updatedb::
@end menu

@node Error Messages From find
@section Error Messages From @code{find}

Most error messages produced by @code{find} are self-explanatory. Error
messages sometimes include a filename. When this happens, the
filename is quoted in order to prevent any unusual characters in the
filename from making unwanted changes in the state of the terminal.

@table @samp
@item invalid predicate `-foo'
This means that the @code{find} command line included something that
started with a dash or other special character. The @code{find}
program tried to interpret this as a test, action or option, but
didn't recognise it. If it was intended to be a test, check what was
specified against the documentation. If, on the other hand, the
string is the name of a file which has been expanded from a wildcard
(for example because you have a @samp{*} on the command line),
consider using @samp{./*} or just @samp{.} instead.

@item unexpected extra predicate
This usually happens if you have an extra bracket on the command line
(for example @samp{find . -print \)}).

@item Warning: filesystem /path/foo has recently been mounted
@itemx Warning: filesystem /path/foo has recently been unmounted
These messages might appear when @code{find} moves into a directory
and finds that the device number and inode are different from what it
expected them to be. If the directory @code{find} has moved into is
on a network filesystem (NFS), it will not issue this message, because
@code{automount} frequently mounts new filesystems on directories as
you move into them (that is how it knows you want to use the
filesystem). So, if you do see this message, be wary:
@code{automount} may not have been responsible. Consider the
possibility that someone else is manipulating the filesystem while
@code{find} is running. Some people might do this in order to mislead
@code{find} or persuade it to look at one set of files when it thought
it was looking at another set.

@item /path/foo changed during execution of find (old device number 12345, new device number 6789, filesystem type is <whatever>) [ref XXX]
This message is issued when @code{find} moves into a directory and ends up
somewhere it didn't expect to be. This happens in one of two
circumstances. Firstly, this happens when @code{automount} intervenes
on a system where @code{find} doesn't know how to determine what
the current set of mounted filesystems is.

Secondly, this can happen when the device number of a directory
appears to change during a change of current directory, but
@code{find} is moving up the filesystem hierarchy rather than down into it.
In order to prevent @code{find} from wandering off into some unexpected
part of the filesystem, we stop it at this point.

@item Don't know how to use getmntent() to read `/etc/mtab'. This is a bug.
This message is issued when a problem similar to the above occurs on a
system where @code{find} doesn't know how to figure out the current
list of mount points. Ask for help on @email{bug-findutils@@gnu.org}.

@item /path/foo/bar changed during execution of find (old inode number 12345, new inode number 67893, filesystem type is <whatever>) [ref XXX]
This message is issued when @code{find} moves into a directory and
discovers that the inode number of that directory
is different from the inode number that it obtained when it examined the
directory previously. This usually means that while
@code{find} was deep in a directory hierarchy doing a
time-consuming operation, somebody has moved one of the parent directories to
another location in the same filesystem. This may or may not have been done
maliciously. In any case, @code{find} stops at this point
to avoid traversing parts of the filesystem that it wasn't
intended to. You can use @code{ls -li} or @code{find /path -inum
12345 -o -inum 67893} to find out more about what has happened.

@item sanity check of the fnmatch() library function failed.
Please submit a bug report. You may well be asked questions about
your system, and if you compiled the @code{findutils} code yourself,
you should keep your copy of the build tree around. The likely
explanation is that your system has a buggy implementation of
@code{fnmatch} that looks enough like the GNU version to fool
@code{configure}, but which doesn't work properly.

@item cannot fork
This normally happens if you use the @code{-exec} action or
something similar (@code{-ok} and so forth) but the system has run out
of free process slots. This is either because the system is very busy
and the system has reached its maximum process limit, or because you
have a resource limit in place and you've reached it. Check the
system for runaway processes (with @code{ps}, if possible). Some process
slots are normally reserved for use by @samp{root}.

@item some-program terminated by signal 99
Some program which was launched with @code{-exec} or similar was killed
with a fatal signal. This is just an advisory message.
@end table

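The wildcard case of the @samp{invalid predicate} message described
above is easy to reproduce. This sketch assumes a scratch directory
holding two hypothetical files, @file{-foo} and @file{bar}. When the
shell expands @samp{*}, @code{find} receives @samp{-foo} as if it were
part of the expression and reports an error; with @samp{./*} every
expanded word is a path name, so both files are listed. The exact
wording of the error may differ between versions of @code{find}.

@example
# Scratch directory with a file whose name starts with a dash.
dir=$(mktemp -d)
touch "$dir/-foo" "$dir/bar"
# "find *" sees -foo as an unrecognised test and fails.
(cd "$dir" && find *) || echo "find failed with status $?"
# "find ./*" treats both expanded words as starting points.
(cd "$dir" && find ./*)
@end example
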
@node Error Messages From xargs
@section Error Messages From @code{xargs}

@table @samp
@item environment is too large for exec
This message means that you have so many environment variables set (or
such large values for them) that there is no room within the
system-imposed limits on program command line argument length to
invoke any program. This is an unlikely situation and is more likely
the result of an attempt to test the limits of @code{xargs}, or break it.
Please try unsetting some environment variables, or exiting the
current shell. You can also use @samp{xargs --show-limits} to
understand the relevant sizes.

@item argument list too long
You are using the @samp{-I} option and @code{xargs} doesn't have
enough space to build a command line because it has read a really
large item and it doesn't fit. You may be able to work around this
problem with the @samp{-s} option, but the default size is pretty
large. This is a rare situation and is more likely an attempt to test
the limits of @code{xargs}, or break it. Otherwise, you will need to
try to shorten the problematic argument or not use @code{xargs}.

@item argument line too long
You are using the @samp{-L} or @samp{-l} option and one of the input
lines is too long. You may be able to work around this problem with
the @samp{-s} option, but the default size is pretty large. If you
can modify your @code{xargs} command not to use @samp{-L} or
@samp{-l}, that will be more likely to result in success.

@item cannot fork
See the description of the similar message for @code{find}.

@item <program>: exited with status 255; aborting
When a command run by @code{xargs} exits with status 255, @code{xargs}
stops immediately without reading any further input. If this is not
what you intended, wrap the program you are trying to invoke in a
shell script which doesn't return status 255.

@item <program>: terminated by signal 99
See the description of the similar message for @code{find}.

@item cannot set SIGUSR1 signal handler
@code{xargs} is having trouble preparing for you to be able to send it
signals to increase or decrease the parallelism of its processing.
If you don't plan to send it those signals, this warning can be ignored
(though if you're a programmer, you may want to help us figure out
why @code{xargs} is confused by your operating system).
@end table

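The wrapper script suggested for the @samp{exited with status 255}
message can be quite small. The following is a minimal sketch and not
part of findutils: it writes a hypothetical wrapper to a temporary
file. The wrapper takes the name of the real program followed by the
file names, runs the program once for each file name, and reports a
status of 255 as an ordinary failure (status 1), so that @code{xargs}
keeps going while still noticing that something failed.

@example
# Hypothetical location for the wrapper script.
wrapper=$(mktemp)
cat > "$wrapper" <<'EOF'
#!/bin/sh
# Usage: wrapper PROGRAM FILE...
prog=$1
shift
status=0
for f
do
  "$prog" "$f"
  s=$?
  # xargs aborts on status 255, so report it as status 1 instead.
  if [ "$s" -eq 255 ]; then s=1; fi
  if [ "$s" -ne 0 ]; then status=$s; fi
done
exit "$status"
EOF
chmod +x "$wrapper"
@end example

You could then run, for example, @code{find @dots{} -print0 | xargs -0
"$wrapper" some-program}, where @code{some-program} stands for the
program that exits with status 255. Running the program once per file
is a simplification; it trades some efficiency for a very short
wrapper.
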
@node Error Messages From locate
@section Error Messages From @code{locate}

@table @samp
@item warning: database @file{@value{LOCATE_DB}} is more than 8 days old
The @code{locate} program relies on a database which is periodically
built by the @code{updatedb} program. That hasn't happened in a long
time. This often happens on systems that are generally not left on,
so the periodic ``cron'' task which normally rebuilds the database
doesn't get a chance to run. To fix this problem, run @code{updatedb}
manually.

@item locate database @file{@value{LOCATE_DB}} is corrupt or invalid
This should not happen. Re-run @code{updatedb}. If that works, but
@code{locate} still produces this error, run @code{locate --version}
and @code{updatedb --version}. These should produce the same output.
If not, you are using a mixed toolset; check your @samp{$PATH}
environment variable and your shell aliases (if you have any). If
both programs claim to be GNU versions, this is a bug; all versions of
these programs should interoperate without problem. Ask for help on
@email{bug-findutils@@gnu.org}.
@end table

@node Error Messages From updatedb
@section Error Messages From @code{updatedb}

The @code{updatedb} program and the programs it invokes do issue
error messages, but none of them seem to need further explanation
here. If you are having a problem understanding one of these, ask for
help on @email{bug-findutils@@gnu.org}.

@node GNU Free Documentation License
@appendix GNU Free Documentation License
@include fdl.texi

@node Primary Index
@unnumbered @code{find} Primary Index

This is a list of all of the primaries (tests, actions, and options)
that make up @code{find} expressions for selecting files. @xref{find
Expressions}, for more information on expressions.

@printindex fn

@bye

@comment texi related words used by Emacs' spell checker ispell.el

@comment LocalWords: texinfo setfilename settitle setchapternewpage
@comment LocalWords: iftex finalout ifinfo DIR titlepage vskip pt
@comment LocalWords: filll dir samp dfn noindent xref pxref
@comment LocalWords: var deffn texi deffnx itemx emph asis
@comment LocalWords: findex smallexample subsubsection cindex
@comment LocalWords: dircategory direntry itemize

@comment other words used by Emacs' spell checker ispell.el
@comment LocalWords: README fred updatedb xargs Plett Rendell akefile
@comment LocalWords: args grep Filesystems fo foo fOo wildcards iname
@comment LocalWords: ipath regex iregex expr fubar regexps
@comment LocalWords: metacharacters macs sr sc inode lname ilname
@comment LocalWords: sysdep noleaf ls inum xdev filesystems usr atime
@comment LocalWords: ctime mtime amin cmin mmin al daystart Sladkey rm
@comment LocalWords: anewer cnewer bckw rf xtype uname gname uid gid
@comment LocalWords: nouser nogroup chown chgrp perm ch maxdepth
@comment LocalWords: mindepth cpio src CD AFS statted stat fstype ufs
@comment LocalWords: nfs tmp mfs printf fprint dils rw djm Nov lwall
@comment LocalWords: POSIXLY fls fprintf strftime locale's EDT GMT AP
@comment LocalWords: EST diff perl backquotes sprintf Falstad Oct cron
@comment LocalWords: eg vmunix mkdir afs allexec allwrite ARG bigram
@comment LocalWords: bigrams cd chmod comp crc CVS dbfile eof
@comment LocalWords: fileserver filesystem fn frcode Ghazi Hnewc iXX
@comment LocalWords: joeuser Kaveh localpaths localuser LOGNAME
@comment LocalWords: Meyering mv netpaths netuser nonblank nonblanks
@comment LocalWords: ois ok Pinard printindex proc procs prunefs
@comment LocalWords: prunepaths pwd RFS rmadillo rmdir rsh sbins str
@comment LocalWords: su Timar ubins ug unstripped vf VM Weitzel
@comment LocalWords: wildcard zlogout basename execdir wholename iwholename
@comment LocalWords: timestamp timestamps Solaris FreeBSD OpenBSD POSIX