1. Manual page has been updated to reflect previous change to tar.
2. Error messages are more detailed and guard against unterminated
header strings. Previously, sanitize() assumed numeric fields were
terminated by a space or NUL, but did not check that the terminator
was actually present.
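The termination check described in item 2 can be sketched as follows.
This is a minimal illustration, not the sbase code: field_is_terminated()
is a hypothetical helper that only scans within the field's fixed width.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of sbase): return 1 if a space or NUL
 * terminator occurs within the first len bytes of a fixed-width header
 * field, 0 otherwise. The scan never reads past the field.
 */
int
field_is_terminated(const char *field, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		if (field[i] == ' ' || field[i] == '\0')
			return 1;
	return 0;
}
```

A caller would run this on each numeric field before handing the field
to any string function, and report an error when it returns 0.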
3. Default mode of the blank header in archive() is now 0600. For
all file headers, this is immediately overwritten with the file's
mode. Only L-headers, which are not user visible, keep the 0600
value. GNU tar uses 0644, but 0600 was chosen for symmetry with
sbase unarchive(), which also uses 0600 as the initial mode.
1. A fix for a bug in the unarchive() function (causing metadata
loss when used on large tar files). This bug is due to existing
code continuing to check for h->type == HARDLINK and SOFTLINK
(near end of function), when the entire header block has already
been overwritten by a call to eread() prior to the h->type checks.
2. Long (>=256 byte) file name compatibility: 'L' style long file
names were extremely simple to add, requiring a (net) addition of
only a handful of lines of code, and this patch supports both
extracting and creating L-style tar archives. Pax 'x' style long
names are more complex to parse, so this patch only supports
extracting pax 'x' tars, not creating them.
3. Command line argument compatibility improvements: 'c', 'x', 't'
args are accepted without needing a hyphen in front. The '-p'
flag is also accepted but is a no-op: the behaviour it requests is
already the default in sbase tar, and accepting the flag allows
sbase tar to be used in scripts that specify it. Directory tree
member extraction is also supported by this patch.
4. Handle tar archives with "." (current directory) entries. Some
archives contain "." or "./" entries, causing error reports when
the current code tries to remove() the current dir. I have added a
check in unarchive() to not perform remove() on encountering these
entries.
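The check from item 4 can be sketched as follows. The helper name
is_curdir() is hypothetical, and treating "." followed by any number of
slashes as the current directory is an assumption made for this sketch,
not necessarily what the patch does.

```c
#include <string.h>

/*
 * Hypothetical helper (not part of sbase): return 1 if name refers to
 * the current directory, i.e. "." optionally followed by slashes
 * (".", "./", ".//"), and 0 for anything else.
 */
int
is_curdir(const char *name)
{
	if (name[0] != '.')
		return 0;
	/* skip the run of slashes after the dot; nothing may follow it */
	return name[1 + strspn(name + 1, "/")] == '\0';
}
```

With such a helper, unarchive() would skip the remove() call whenever
is_curdir() returns 1 for the entry name.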
As requested, I am resending my old patch that fixes the crash when
archiving names longer than 100 characters.
The last patch dealing with the issue was [1], and the old patch was [2].
The code before this commit did not handle multiple slashes correctly,
but using basename(3) and dirname(3) required a temporary buffer, because
otherwise they would destroy the path, which is used later in several
places. This solution does not modify the path and uses pointer
arithmetic to solve the problem.
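A minimal sketch of splitting a path with pointer arithmetic only,
leaving the path untouched. The function name split_path(), the choice
of the leftmost usable slash, and the limits of 100 bytes for the name
and 155 bytes for the prefix (the ustar field widths) are assumptions
of this sketch, not a copy of the patch.

```c
#include <stddef.h>
#include <string.h>

enum { NAME_LEN = 100, PREFIX_LEN = 155 };	/* ustar field widths */

/*
 * Hypothetical helper: find a split point in path without modifying
 * it. On success return 0, point *name at the name part inside path
 * and store the prefix length in *prefixlen. Return -1 if no slash
 * yields a prefix and name that both fit.
 */
int
split_path(const char *path, const char **name, size_t *prefixlen)
{
	size_t len = strlen(path), plen, nlen;
	const char *p, *q;

	if (len <= NAME_LEN) {		/* fits in the name field alone */
		*name = path;
		*prefixlen = 0;
		return 0;
	}
	for (p = strchr(path, '/'); p; p = strchr(q, '/')) {
		q = p + 1 + strspn(p + 1, "/");	/* skip a run of slashes */
		plen = p - path;
		nlen = len - (q - path);
		if (plen > PREFIX_LEN)
			break;
		if (plen > 0 && nlen > 0 && nlen <= NAME_LEN) {
			*name = q;
			*prefixlen = plen;
			return 0;
		}
	}
	return -1;
}
```

The caller can then copy prefixlen bytes from path into the prefix
field and the string at *name into the name field.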
[1] https://lists.suckless.org/hackers/2412/19213.html
[2] https://lists.suckless.org/hackers/2402/19071.html
Co-authored-by: Roberto E. Vargas Caballer <k0ga@shike2.net>
Some tar archives (eg. ftp://ftp.gnu.org/gnu/shtool/shtool-2.0.8.tar.gz)
use leading spaces instead of leading zeroes for numeric fields.
Although it is not allowed by the ustar specification, most tar
implementations recognize it as correct. But since 3ef6d4e4, we
replace all spaces by NULs here, not just trailing ones, which leads to
recognizing such archives as malformed. This fixes it: we now skip
over leading spaces, allowing strtol(3) to read those numeric fields.
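A minimal sketch of the idea, assuming a hypothetical helper
parse_octal_field() rather than the actual sanitize() code: leading
spaces are left alone, later spaces become NULs, and strtol(3) skips
the leading padding itself.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of sbase): parse a fixed-width octal
 * header field that may be padded with leading spaces. Spaces after
 * the leading run are replaced by NULs. Returns -1 if the field does
 * not end in a terminator.
 */
long
parse_octal_field(char *field, size_t len)
{
	size_t i = 0;

	if (len == 0)
		return -1;
	while (i < len - 1 && field[i] == ' ')	/* keep leading spaces */
		i++;
	for (; i < len; i++)
		if (field[i] == ' ')
			field[i] = '\0';
	if (field[len - 1] != '\0')		/* no terminator present */
		return -1;
	return strtol(field, NULL, 8);		/* skips leading whitespace */
}
```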
Since version 1.1.23, musl no longer provides `major` and `minor`
through sys/types.h either, so include sys/sysmacros.h based on the
absence of `major` rather than only on glibc.
Thanks to Rich Felker for the suggestion.
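The conditional include can be sketched as follows. This assumes that
`major` is a macro wherever sys/types.h provides it; the wrappers
dev_major() and dev_minor() are hypothetical and exist only to make
the sketch testable.

```c
#include <sys/types.h>
#ifndef major
/* sys/types.h did not provide the macros; get them from sysmacros.h */
#include <sys/sysmacros.h>
#endif

/* Hypothetical wrappers around the device number macros. */
unsigned int
dev_major(dev_t dev)
{
	return major(dev);
}

unsigned int
dev_minor(dev_t dev)
{
	return minor(dev);
}
```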
On glibc, major, minor, and makedev are all defined in
sys/sysmacros.h with types.h only including this for historical
reasons. A future release of glibc will remove this behaviour,
meaning that major, minor, and makedev will no longer be defined
for us without including sysmacros.h.
Previously, with -p, the specified directory and all of its parents
would be created with mode 0777&~filemask (regardless of the -m flag).
POSIX says parent directories must be created as (0300|~filemask)&0777,
and of course if -m is set, the specified directory should be created
with those permissions.
Additionally, POSIX says that for symbolic_mode strings, + and - should
be interpreted relative to a default mode of 0777 (not 0).
Without -p, previously the directory would be created first with
0777&~filemask (before a chmod), but POSIX says that the directory shall
at no point in time have permissions less restrictive than the -m mode
argument.
Rather than dealing with mkdir removing the filemask bits by calling
chmod afterward, just clear the umask and remove the bits manually.
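The mode arithmetic above can be sketched as two small functions. The
names parent_mode() and target_mode() are hypothetical; the formulas
are the ones quoted in the text.

```c
#include <sys/types.h>

/* Mode for intermediate directories created by -p. */
mode_t
parent_mode(mode_t filemask)
{
	return (0300 | ~filemask) & 0777;
}

/* Mode for the specified directory when -m is not given. */
mode_t
target_mode(mode_t filemask)
{
	return 0777 & ~filemask;
}
```

For example, with a umask of 0277 the target directory gets 0500, but
its parents get 0700 so that the owner can still create entries in
them.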
In the description of 3111908b034c73673a2f079b2b13a88c18379baa, it says
that the functions must be able to handle st being NULL, but recurse
always passes a valid pointer. The only function that was ever passed
NULL was rm(), but this was changed to go through recurse in
2f4ab527391135e651b256f8654b050ea4a48f3d, so now the checks are
pointless.
by re-ordering when chmod/chown is done, only a list of directories (not
all files) need be kept for fixing mtime.
this also fixes an issue where set-user-id files in a tar may not work.
previously chmod was done before chown and before the file was written, so
if ownership changed, or the file was being written as a normal user, the
setuid bit would be cleared.
also fixes ownership of symbolic links. previously a chown() was called,
which would change the ownership of the link target. lchown() is now
used for symbolic links.
renamed all ent, ent* functions to dir* as it better describes what they
do.
use timespec/utimensat instead of timeval/utimes to get AT_SYMLINK_NOFOLLOW
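a minimal sketch of the two symlink points above, assuming a
hypothetical helper restore_link_meta() (not the sbase code): lchown()
changes the link itself rather than its target, and utimensat() with
AT_SYMLINK_NOFOLLOW does the same for timestamps.

```c
#define _POSIX_C_SOURCE 200809L	/* must precede all includes */
#include <fcntl.h>	/* AT_FDCWD, AT_SYMLINK_NOFOLLOW */
#include <sys/stat.h>	/* utimensat */
#include <time.h>
#include <unistd.h>	/* lchown */

/*
 * Hypothetical helper: set owner and times on path itself, without
 * following it if it is a symbolic link. Returns 0 on success, -1 on
 * error with errno set by the failing call.
 */
int
restore_link_meta(const char *path, uid_t uid, gid_t gid, time_t mtime)
{
	struct timespec times[2];

	if (lchown(path, uid, gid) < 0)
		return -1;
	times[0].tv_sec = mtime;	/* access time */
	times[0].tv_nsec = 0;
	times[1] = times[0];		/* modification time */
	return utimensat(AT_FDCWD, path, times, AT_SYMLINK_NOFOLLOW);
}
```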
Rely on what the system provides. These are not standardized macros
but any relevant UNIX system will provide them.
We can revisit this in the future if something breaks.
When we selectively process entries from the archive, ensure that
we jump over the data section of each uninteresting entry before going
on to process the next entry. Not doing so leaves the file stream
pointer in the wrong place.
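A minimal sketch of skipping an entry's data, assuming 512-byte tar
blocks and hypothetical helpers padded_size() and skip_data(). Reading
instead of seeking is a choice made for this sketch so that it also
works when the archive comes from a pipe.

```c
#include <stdio.h>

enum { BLKSIZ = 512 };	/* tar block size */

/* Round an entry's size up to a whole number of blocks. */
long
padded_size(long size)
{
	return (size + BLKSIZ - 1) / BLKSIZ * BLKSIZ;
}

/*
 * Hypothetical helper: consume the data section of an entry of the
 * given size so the stream is positioned at the next header.
 * Returns 0 on success, -1 if the archive ends early.
 */
int
skip_data(FILE *fp, long size)
{
	char buf[BLKSIZ];
	long left = padded_size(size);

	while (left > 0) {
		if (fread(buf, 1, BLKSIZ, fp) != BLKSIZ)
			return -1;
		left -= BLKSIZ;
	}
	return 0;
}
```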