mirror of
https://https.git.savannah.gnu.org/git/sed.git
synced 2026-01-26 08:07:53 +00:00
First, run this command: git grep -l ' $'|xargs perl -pi -e 's/[ \t]+$//' Then some minor fix-up to make the two newly-failing tests pass once again: * testsuite/mac-mf.sed: Append this, s/ $//, to eliminate trailing spaces in the actual output. * testsuite/y-newline.good: Manually remove a trailing space between two concatenated prompts.
134 lines
5.8 KiB
Plaintext
134 lines
5.8 KiB
Plaintext
* ABOUT BUGS
|
|
|
|
Before reporting a bug, please check the list of known bugs
|
|
and the list of oft-reported non-bugs (below).
|
|
|
|
Bugs and comments may be sent to bonzini@gnu.org; please
|
|
include in the Subject: header the first line of the output of
|
|
``sed --version''.
|
|
|
|
Please do not send a bug report like this:
|
|
|
|
[while building frobme-1.3.4]
|
|
$ configure
|
|
sed: file sedscr line 1: Unknown option to 's'
|
|
|
|
If sed doesn't configure your favorite package, take a few extra
|
|
minutes to identify the specific problem and make a stand-alone test
|
|
case.
|
|
|
|
A stand-alone test case includes all the data necessary to perform the
|
|
test, and the specific invocation of sed that causes the problem. The
|
|
smaller a stand-alone test case is, the better. A test case should
|
|
not involve something as far removed from sed as ``try to configure
|
|
frobme-1.3.4''. Yes, that is in principle enough information to look
|
|
for the bug, but that is not a very practical prospect.
|
|
|
|
|
|
|
|
* NON-BUGS
|
|
|
|
`N' command on the last line
|
|
|
|
Most versions of sed exit without printing anything when the `N'
|
|
command is issued on the last line of a file. GNU sed instead
|
|
prints pattern space before exiting unless of course the `-n'
|
|
command switch has been specified. More information on the reason
|
|
behind this choice can be found in the Info manual.
|
|
|
|
|
|
regex syntax clashes (problems with backslashes)
|
|
|
|
sed uses the Posix basic regular expression syntax. According to
|
|
the standard, the meaning of some escape sequences is undefined in
|
|
this syntax; notable in the case of GNU sed are `\|', `\+', `\?',
|
|
`\`', `\'', `\<', `\>', `\b', `\B', `\w', and `\W'.
|
|
|
|
As in all GNU programs that use Posix basic regular expressions, sed
|
|
interprets these escape sequences as meta-characters. So, `x\+'
|
|
matches one or more occurrences of `x'. `abc\|def' matches either
|
|
`abc' or `def'.
|
|
|
|
This syntax may cause problems when running scripts written for other
|
|
seds. Some sed programs have been written with the assumption that
|
|
`\|' and `\+' match the literal characters `|' and `+'. Such scripts
|
|
must be modified by removing the spurious backslashes if they are to
|
|
be used with recent versions of sed (not only GNU sed).
|
|
|
|
On the other hand, some scripts use `s|abc\|def||g' to remove occurrences
|
|
of _either_ `abc' or `def'. While this worked until sed 4.0.x, newer
|
|
versions interpret this as removing the string `abc|def'. This is
|
|
again undefined behavior according to POSIX, but this interpretation
|
|
is arguably more robust: the older one, for example, required that
|
|
the regex matcher parsed `\/' as `/' in the common case of escaping
|
|
a slash, which is again undefined behavior; the new behavior avoids
|
|
this, and this is good because the regex matcher is only partially
|
|
under our control.
|
|
|
|
In addition, GNU sed supports several escape characters (some of
|
|
which are multi-character) to insert non-printable characters
|
|
in scripts (`\a', `\c', `\d', `\o', `\r', `\t', `\v', `\x'). These
|
|
can cause similar problems with scripts written for other seds.
|
|
|
|
|
|
-i clobbers read-only files
|
|
|
|
In short, `sed d -i' will let one delete the contents of
|
|
a read-only file, and in general the `-i' option will let
|
|
one clobber protected files. This is not a bug, but rather a
|
|
consequence of how the Unix filesystem works.
|
|
|
|
The permissions on a file say what can happen to the data
|
|
in that file, while the permissions on a directory say what can
|
|
happen to the list of files in that directory. `sed -i'
|
|
will not ever open for writing a file that is already on disk,
|
|
rather, it will work on a temporary file that is finally renamed
|
|
to the original name: if you rename or delete files, you're actually
|
|
modifying the contents of the directory, so the operation depends on
|
|
the permissions of the directory, not of the file). For this same
|
|
reason, sed will not let one use `-i' on a writeable file in a
|
|
read-only directory, and will break hard or symbolic links when
|
|
`-i' is used on such a file.
|
|
|
|
|
|
`0a' does not work (gives an error)
|
|
|
|
There is no line 0. 0 is a special address that is only used to treat
|
|
addresses like `0,/RE/' as active when the script starts: if you
|
|
write `1,/abc/d' and the first line includes the word `abc', then
|
|
that match would be ignored because address ranges must span at least
|
|
two lines (barring the end of the file); but what you probably wanted is
|
|
to delete every line up to the first one including `abc', and this
|
|
is obtained with `0,/abc/d'.
|
|
|
|
|
|
`[a-z]' is case insensitive
|
|
`s/.*//' does not clear pattern space
|
|
|
|
You are encountering problems with locales. POSIX mandates that `[a-z]'
|
|
uses the current locale's collation order -- in C parlance, that means
|
|
strcoll(3) instead of strcmp(3). Some locales have a case insensitive
|
|
strcoll, others don't.
|
|
|
|
Another problem is that [a-z] tries to use collation symbols. This
|
|
only happens if you are on the GNU system, using GNU libc's regular
|
|
expression matcher instead of compiling the one supplied with GNU sed.
|
|
In a Danish locale, for example, the regular expression `^[a-z]$'
|
|
matches the string `aa', because `aa' is a single collating symbol that
|
|
comes after `a' and before `b'; `ll' behaves similarly in Spanish
|
|
locales, or `ij' in Dutch locales.
|
|
|
|
Another common localization-related problem happens if your input stream
|
|
includes invalid multibyte sequences. POSIX mandates that such
|
|
sequences are _not_ matched by `.', so that `s/.*//' will not clear
|
|
pattern space as you would expect. In fact, there is no way to clear
|
|
sed's buffers in the middle of the script in most multibyte locales
|
|
(including UTF-8 locales). For this reason, GNU sed provides a `z'
|
|
command (for `zap') as an extension.
|
|
|
|
However, to work around both of these problems, which may cause bugs
|
|
in shell scripts, you can set the LC_ALL environment variable to `C',
|
|
or set the locale on a more fine-grained basis with the other LC_*
|
|
environment variables.
|
|
|