Commit Graph

60 Commits

Author SHA1 Message Date
Lasse Collin 3c7860cf49 xzdiff: Add support for .lz files.
The other scripts don't need changes for .lz support because
in those scripts it is enough that xz supports .lz.
2022-11-11 13:16:21 +02:00
Lasse Collin d76c752a6d Scripts: Ignore warnings from xz.
In practice this means making the scripts work when
the input files have an unsupported check type which
isn't a problem in practice unless support for
some check types has been disabled at build time.
2022-11-11 12:23:58 +02:00
Lasse Collin 097c7b67ce xzgrep: Fix compatibility with old shells.
Running the current xzgrep on Slackware 10.1 with GNU bash 3.00.15:

    xzgrep: line 231: syntax error near unexpected token `;;'

On SCO OpenServer 5.0.7 with Korn Shell 93r:

    syntax error at line 231 : `;;' unexpected

Turns out that some old shells don't like apostrophes (') inside
command substitutions. For example, the following fails:

    x=$(echo foo
    # asdf'zxcv
    echo bar)
    printf '%s\n' "$x"

The problem was introduced by commits
69d1b3fc29 (2022-03-29),
bd7b290f3f (2022-07-18), and
a648978b20 (2022-07-19).
5.2.6 is the only stable release that included
this problem.

Thanks to Kevin R. Bulgrien for reporting the problem
on SCO OpenServer 5.0.7 and for providing the fix.
2022-09-16 14:07:03 +03:00
Lasse Collin d796b6d7fd xzgrep man page: Document exit statuses. 2022-07-19 23:19:49 +03:00
Lasse Collin 923bf96b55 xzgrep: Improve error handling, especially signals.
xzgrep wouldn't exit on SIGPIPE or SIGQUIT when it clearly
should have. It's quite possible that it's not perfect still
but at least it's much better.

If multiple exit statuses compete, now it tries to pick
the largest of value.

Some comments were added.

The exit status handling of signals is still broken if the shell
uses values larger than 255 in $? to indicate that a process
died due to a signal ***and*** their "exit" command doesn't take
this into account. This seems to work well with the ksh and yash
versions I tried. However, there is a report in gzip/zgrep that
OpenSolaris 5.11 (not 5.10) has a problem with "exit" truncating
the argument to 8 bits:

    https://debbugs.gnu.org/cgi/bugreport.cgi?bug=22900#25

Such a bug would break xzgrep but I didn't add a workaround
at least for now. 5.11 is old and I don't know if the problem
exists in modern descendants, or if the problem exists in other
ksh implementations in use.
2022-07-19 23:13:24 +03:00
Lasse Collin a648978b20 xzgrep: Make the fix for ZDI-CAN-16587 more robust.
I don't know if this can make a difference in the real world
but it looked kind of suspicious (what happens with sed
implementations that cannot process very long lines?).
At least this commit shouldn't make it worse.
2022-07-19 00:10:55 +03:00
Lasse Collin bd7b290f3f xzgrep: Use grep -H --label when available (GNU, *BSDs).
It avoids the use of sed for prefixing filenames to output lines.
Using sed for that is slower and prone to security bugs so now
the sed method is only used as a fallback.

This also fixes an actual bug: When grepping a binary file,
GNU grep nowadays prints its diagnostics to stderr instead of
stdout and thus the sed-method for prefixing the filename doesn't
work. So with this commit grepping binary files gives reasonable
output with GNU grep now.

This was inspired by zgrep but the implementation is different.
2022-07-18 22:06:10 +03:00
Lasse Collin b56729af9f xzgrep: Use -e to specify the pattern to grep.
Now we don't need the separate test for adding the -q option
as it can be added directly in the two places where it's needed.
2022-07-18 21:10:25 +03:00
Lasse Collin bad61b5997 Scripts: Use printf instead of echo in a few places.
It's a good habbit as echo has some portability corner cases
when the string contents can be anything.
2022-07-18 19:18:48 +03:00
Lasse Collin 6a4a4a7d26 xzgrep: Add more LC_ALL=C to avoid bugs with multibyte characters.
Also replace one use of expr with printf.

The rationale for LC_ALL=C was already mentioned in
69d1b3fc29 that fixed a security
issue. However, unrelated uses weren't changed in that commit yet.

POSIX says that with sed and such tools one should use LC_ALL=C
to ensure predictable behavior when strings contain byte sequences
that aren't valid multibyte characters in the current locale. See
under "Application usage" in here:

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/sed.html

With GNU sed invalid multibyte strings would work without this;
it's documented in its Texinfo manual. Some other implementations
aren't so forgiving.
2022-07-17 21:36:25 +03:00
Lasse Collin b48f9d615f xzgrep: Fix parsing of certain options.
Fix handling of "xzgrep -25 foo" (in GNU grep "grep -25 foo" is
an alias for "grep -C25 foo"). xzgrep would treat "foo" as filename
instead of as a pattern. This bug was fixed in zgrep in gzip in 2012.

Add -E, -F, -G, and -P to the "no argument required" list.

Add -X to "argument required" list. It is an
intentionally-undocumented GNU grep option so this isn't
an important option for xzgrep but it seems that other grep
implementations (well, those that I checked) don't support -X
so I hope this change is an improvement still.

grep -d (grep --directories=ACTION) requires an argument. In
contrast to zgrep, I kept -d in the "no argument required" list
because it's not supported in xzgrep (or zgrep). This way
"xzgrep -d" gives an error about option being unsupported instead
of telling that it requires an argument. Both zgrep and xzgrep
tell that it's unsupported if an argument is specified.

Add comments.
2022-07-17 20:57:06 +03:00
Lasse Collin 69d1b3fc29 xzgrep: Fix escaping of malicious filenames (ZDI-CAN-16587).
Malicious filenames can make xzgrep to write to arbitrary files
or (with a GNU sed extension) lead to arbitrary code execution.

xzgrep from XZ Utils versions up to and including 5.2.5 are
affected. 5.3.1alpha and 5.3.2alpha are affected as well.
This patch works for all of them.

This bug was inherited from gzip's zgrep. gzip 1.12 includes
a fix for zgrep.

The issue with the old sed script is that with multiple newlines,
the N-command will read the second line of input, then the
s-commands will be skipped because it's not the end of the
file yet, then a new sed cycle starts and the pattern space
is printed and emptied. So only the last line or two get escaped.

One way to fix this would be to read all lines into the pattern
space first. However, the included fix is even simpler: All lines
except the last line get a backslash appended at the end. To ensure
that shell command substitution doesn't eat a possible trailing
newline, a colon is appended to the filename before escaping.
The colon is later used to separate the filename from the grep
output so it is fine to add it here instead of a few lines later.

The old code also wasn't POSIX compliant as it used \n in the
replacement section of the s-command. Using \<newline> is the
POSIX compatible method.

LC_ALL=C was added to the two critical sed commands. POSIX sed
manual recommends it when using sed to manipulate pathnames
because in other locales invalid multibyte sequences might
cause issues with some sed implementations. In case of GNU sed,
these particular sed scripts wouldn't have such problems but some
other scripts could have, see:

    info '(sed)Locale Considerations'

This vulnerability was discovered by:
cleemy desu wayo working with Trend Micro Zero Day Initiative

Thanks to Jim Meyering and Paul Eggert discussing the different
ways to fix this and for coordinating the patch release schedule
with gzip.
2022-03-29 20:10:50 +03:00
Lasse Collin 2024fbf279 xzgrep: Update man page timestamp. 2021-11-13 21:04:05 +02:00
Ville Skyttä 3a512c7787 xzgrep: use `grep -E/-F` instead of `egrep` and `fgrep`
`egrep` and `fgrep` have been deprecated in GNU grep since 2007, and in
current post 3.7 Git they have been made to emit obsolescence warnings:
https://git.savannah.gnu.org/cgit/grep.git/commit/?id=a9515624709865d480e3142fd959bccd1c9372d1
2021-11-13 18:17:33 +02:00
Lasse Collin 3247e95115 xzdiff: Update the man page about the exit status.
This was forgotten from 194029ffaf.
2021-06-04 19:02:38 +03:00
Lasse Collin 96f5a28a46 xzless: Fix less(1) version detection when it contains a dot.
Sometimes the version number from "less -V" contains a dot,
sometimes not. xzless failed detect the version number when
it does contain a dot. This fixes it.

Thanks to nick87720z for reporting this. Apparently it had been
reported here <https://bugs.gentoo.org/489362> in 2013.
2021-06-04 18:52:48 +03:00
Lasse Collin 9cdabbeea8 Scripts: Add zstd support to xzdiff. 2021-01-11 23:57:11 +02:00
Lasse Collin 73c555b307 Scripts: Fix exit status of xzgrep.
Omit the -q option from xz, gzip, and bzip2. With xz this shouldn't
matter. With gzip it's important because -q makes gzip replace SIGPIPE
with exit status 2. With bzip2 it's important because with -q bzip2
is completely silent if input is corrupt while other decompressors
still give an error message.

Avoiding exit status 2 from gzip is important because bzip2 uses
exit status 2 to indicate corrupt input. Before this commit xzgrep
didn't recognize corrupt .bz2 files because xzgrep was treating
exit status 2 as SIGPIPE for gzip compatibility.

zstd still needs -q because otherwise it is noisy in normal
operation.

The code to detect real SIGPIPE didn't check if the exit status
was due to a signal (>= 128) and so could ignore some other exit
status too.
2021-01-11 23:28:52 +02:00
Lasse Collin 194029ffaf Scripts: Fix exit status of xzdiff/xzcmp.
This is a minor fix since this affects only the situation when
the files differ and the exit status is something else than 0.
In such case there could be SIGPIPE from a decompression tool
and that would result in exit status of 2 from xzdiff/xzcmp
while the correct behavior would be to return 1 or whatever
else diff or cmp may have returned.

This commit omits the -q option from xz/gzip/bzip2/lzop arguments.
I'm not sure why the -q was used in the first place, perhaps it
hides warnings in some situation that I cannot see at the moment.
Hopefully the removal won't introduce a new bug.

With gzip the -q option was harmful because it made gzip return 2
instead of >= 128 with SIGPIPE. Ignoring exit status 2 (warning
from gzip) isn't practical because bzip2 uses exit status 2 to
indicate corrupt input file. It's better if SIGPIPE results in
exit status >= 128.

With bzip2 the removal of -q seems to be good because with -q
it prints nothing if input is corrupt. The other tools aren't
silent in this situation even with -q. On the other hand, if
zstd support is added, it will need -q since otherwise it's
noisy in normal situations.

Thanks to Étienne Mollier and Sebastian Andrzej Siewior.
2021-01-11 22:58:58 +02:00
Adam Borowski 1890351f34 Scripts: Add zstd support to xzgrep.
Thanks to Adam Borowski.
2020-12-05 22:39:03 +02:00
Lasse Collin a9e2a87f1d src/scripts/xzgrep.1: Filenames to xzgrep are optional.
xzgrep --help was correct already.
2020-04-06 19:34:48 +03:00
Bjarni Ingi Gislason a7ba275d9b src/script/xzgrep.1: Remove superfluous '.RB'
Output is from: test-groff -b -e -mandoc -T utf8 -rF0 -t -w w -z

  [ "test-groff" is a developmental version of "groff" ]

Input file is ./src/scripts/xzgrep.1

<src/scripts/xzgrep.1>:20 (macro RB): only 1 argument, but more are expected
<src/scripts/xzgrep.1>:23 (macro RB): only 1 argument, but more are expected
<src/scripts/xzgrep.1>:26 (macro RB): only 1 argument, but more are expected
<src/scripts/xzgrep.1>:29 (macro RB): only 1 argument, but more are expected
<src/scripts/xzgrep.1>:32 (macro RB): only 1 argument, but more are expected

 "abc..." does not mean the same as "abc ...".

  The output from nroff and troff is unchanged except for the space
between "file" and "...".

Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
2020-04-06 19:29:15 +03:00
Bjarni Ingi Gislason 133d498db0 xzgrep.1: Delete superfluous '.PP'
Summary:

mandoc -T lint xzgrep.1 :
mandoc: xzgrep.1:79:2: WARNING: skipping paragraph macro: PP empty

  There is no change in the output of "nroff" and "troff".

Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
2020-04-06 19:08:14 +03:00
Lasse Collin 6f7211b6bb Build: Add support for translated man pages using po4a.
The dependency on po4a is optional. It's never required to install
the translated man pages when xz is built from a release tarball.
If po4a is missing when building from xz.git, the translated man
pages won't be generated but otherwise the build will work normally.

The translations are only updated automatically by autogen.sh and
by "make mydist". This makes it easy to keep po4a as an optional
dependency and ensures that I won't forget to put updated
translations to a release tarball.

The translated man pages aren't installed if --disable-nls is used.

The installation of translated man pages abuses Automake internals
by calling "install-man" with redefined dist_man_MANS and man_MANS.
This makes the hairy script code slightly less hairy. If it breaks
some day, this code needs to be fixed; don't blame Automake developers.

Also, this adds more quotes to the existing shell script code in
the Makefile.am "-hook"s.
2020-02-07 15:32:21 +02:00
Lasse Collin 43ce4ea7c7 Scripts: Put /usr/xpg4/bin to the beginning of PATH on Solaris.
This adds a configure option --enable-path-for-scripts=PREFIX
which defaults to empty except on Solaris it is /usr/xpg4/bin
to make POSIX grep and others available. The Solaris case had
been documented in INSTALL with a manual fix but it's better
to do this automatically since it is needed on most Solaris
systems anyway.

Thanks to Daniel Richard G.
2019-09-24 23:02:40 +03:00
Antoine Cœur 2fb0ddaa55 spelling 2019-05-11 20:52:37 +03:00
Lasse Collin f76f7516d6 xzless: Rename unused variables to silence static analysers.
In this particular case I don't see this affecting readability
of the code.

Thanks to Pavel Raiskup.
2018-07-27 18:10:44 +03:00
Lasse Collin eb61bc58c2 xzdiff: Make the mktemp usage compatible with FreeBSD's mktemp.
Thanks to Rui Paulo for the fix.
2015-02-09 22:08:37 +02:00
Lasse Collin 7b03a15cea xzdiff: Use mkdir if mktemp isn't available. 2014-11-10 18:54:40 +02:00
Lasse Collin f8c13e5e36 xzdiff: Create a temporary directory to hold a temporary file.
This avoids the possibility of "File name too long" when
creating a temp file when the input file name is very long.

This also means that other users on the system can no longer
see the input file names in /tmp (or whatever $TMPDIR is)
since the temporary directory will have a generic name. This
usually doesn't matter since on many systems one can see
the arguments given to all processes anyway.

The number X chars to mktemp where increased from 6 to 10.

Note that with some shells temp files or dirs won't be used at all.
2014-11-10 18:45:01 +02:00
Lasse Collin efa7b0a210 xzgrep: Avoid passing both -q and -l to grep.
The behavior of grep -ql varies:
  - GNU grep behaves like grep -q.
  - OpenBSD grep behaves like grep -l.

POSIX doesn't make it 100 % clear what behavior is expected.
Anyway, using both -q and -l at the same time makes no sense
so both options simply should never be used at the same time.

Thanks to Christian Weisgerber.
2014-10-09 18:42:14 +03:00
Lasse Collin ceca379017 xzgrep: exit 0 when at least one file matches.
Mimic the original grep behavior and return exit_success when
at least one xz compressed file matches given pattern.

Original bugreport:
https://bugzilla.redhat.com/show_bug.cgi?id=1108085

Thanks to Pavel Raiskup for the patch.
2014-06-11 20:43:28 +03:00
Lasse Collin a37ae8b5eb Man pages: Use similar syntax for synopsis as in xz.
The man pages of lzmainfo, xzmore, and xzdec had similar
constructs as the man page of xz had before the commit
eb6ca9854b. Eric S. Raymond
didn't mention these man pages in his bug report, but
it's nice to be consistent.
2013-06-30 18:02:27 +03:00
Jeff Bastian 5019413a05 xzgrep: make the '-h' option to be --no-filename equivalent
* src/scripts/xzgrep.in: Accept the '-h' option in argument parsing.
2013-04-05 19:14:50 +03:00
Lasse Collin 9e6dabcf22 Avoid unneeded use of awk in xzless.
Use "read" instead of "awk" in xzless to get the version
number of "less". The need for awk was introduced in
the commit db5c1817fa.

Thanks to Ariel P for the patch.
2013-03-05 19:14:50 +02:00
Jonathan Nieder db5c1817fa xzless: Make "less -V" parsing more robust
In v4.999.9beta~30 (xzless: Support compressed standard input,
2009-08-09), xzless learned to parse ‘less -V’ output to figure out
whether less is new enough to handle $LESSOPEN settings starting
with “|-”.  That worked well for a while, but the version string from
‘less’ versions 448 (June, 2012) is misparsed, producing a warning:

	$ xzless /tmp/test.xz; echo $?
	/usr/bin/xzless: line 49: test: 456 (GNU regular expressions): \
	integer expression expected
	0

More precisely, modern ‘less’ lists the regexp implementation along
with its version number, and xzless passes the entire version number
with attached parenthetical phrase as a number to "test $a -gt $b",
producing the above confusing message.

	$ less-444 -V | head -1
	less 444
	$ less -V | head -1
	less 456 (no regular expressions)

So relax the pattern matched --- instead of expecting "less <number>",
look for a line of the form "less <number>[ (extra parenthetical)]".
While at it, improve the behavior when no matching line is found ---
instead of producing a cryptic message, we can fall back on a LESSPIPE
setting that is supported by all versions of ‘less’.

The implementation uses "awk" for simplicity.  Hopefully that’s
portable enough.

Reported-by: Jörg-Volker Peetz <jvpeetz@web.de>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2012-11-21 19:19:44 +02:00
Lasse Collin cff070aba6 Fix exit status of xzgrep when grepping binary files.
When grepping binary files, grep may exit before it has
read all the input. In this case, gzip -q returns 2 (eating
SIGPIPE), but xz and bzip2 show SIGPIPE as the exit status
(e.g. 141). This causes wrong exit status when grepping
xz- or bzip2-compressed binary files.

The fix checks for the special exit status that indicates SIGPIPE.
It uses kill -l which should be supported everywhere since it
is in both SUSv2 (1997) and POSIX.1-2008.

Thanks to James Buren for the bug report.
2012-02-22 14:02:34 +02:00
Lasse Collin 1c673e5681 Fix exit status of "xzdiff foo.xz bar.xz".
xzdiff was clobbering the exit status from diff in a case
statement used to analyze the exit statuses from "xz" when
its operands were two compressed files. Save and restore
diff's exit status to fix this.

The bug is inherited from zdiff in GNU gzip and was fixed
there on 2009-10-09.

Thanks to Jonathan Nieder for the patch and
to Peter Pallinger for reporting the bug.
2011-07-31 11:01:47 +03:00
Martin Väth bd5002f582 xzgrep: fix typo in $0 parsing
Reported-by: Diego Elio Pettenò <flameeyes@gentoo.org>
Signed-off-by: Martin Väth <vaeth@mathematik.uni-wuerzburg.de>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
2011-04-18 19:33:27 +03:00
Lasse Collin 40277998cb Scripts: Better fix for xzgrep.
Now it uses "grep -q".

Thanks to Gregory Margo.
2011-03-24 01:42:49 +02:00
Lasse Collin c7210d9a3f Scripts: Fix xzgrep -l.
It didn't work at all. It tried to use the -q option
for grep, but it appended it after "--". This works
around it by redirecting to /dev/null. The downside
is that this can be slower with big files compared
to proper use of "grep -q".

Thanks to Gregory Margo.
2011-03-24 01:21:32 +02:00
Lasse Collin 4eb83e3204 Scripts: Add lzop (.lzo) support to xzdiff and xzgrep. 2011-03-19 13:08:22 +02:00
Lasse Collin 316cbe2446 Scripts: Fix gzip and bzip2 support in xzdiff. 2010-12-13 16:36:33 +02:00
Lasse Collin b1c7368f95 Build: Add options to disable individual command line tools. 2010-10-08 15:25:45 +03:00
Lasse Collin cec0ddc8ec Major man page updates.
Lots of content was updated on the xz man page.

Technical improvements:
  - Start a new sentence on a new line.
  - Use fairly short lines.
  - Use constant-width font for examples (where supported).
  - Some minor cleanups.

Thanks to Jonathan Nieder for some language fixes.
2010-09-27 23:29:34 +03:00
Lasse Collin f4b2b52624 Fix xzgrep to not break if filenames have spaces or quotes.
Thanks to someone who reported the bug on IRC.
2010-03-07 19:52:25 +02:00
Lasse Collin eb7d51a3fa Collection of language fixes to comments and docs.
Thanks to Jonathan Nieder.
2010-02-12 13:16:15 +02:00
Jonathan Nieder 78e92c1847 Escape dashes in xzmore.1
A minus sign is larger, easier to see in a printout, and more
likely to use the same glyph as ASCII hyphen-minus in a terminal
than a hyphen.  Since broken manual pagers do not find hyphens
when the user searches for a hyphen-minus, minus signs are also
easier to search for.  So use minus signs instead of hyphens to
render sample terminal output.
2009-10-16 20:39:24 +03:00
Jonathan Nieder e71903fc61 “xzdiff a.xz b.xz” always fails
Attempts to compare two compressed files result in no output and
exit status 2.

Instead of going to standard output, ‘diff’ output is being
captured in the xz_status variable along with the exit status from
the decompression commands.  Later, when this variable is examined
for nonzero status codes, numerals from dates in the ‘diff’ output
make it appear as though decompression failed.

So let the ‘diff’ output leak to standard output with another file
descriptor.  (This trick is used in all similar contexts elsewhere
in xzdiff and in the analogous context in gzip’s zdiff script.)
2009-08-09 22:55:19 +03:00
Jonathan Nieder 1d314b81aa xzless: Support compressed standard input
It can be somewhat confusing that

	less < some_file.txt

works fine, whereas

	xzless < some_file.txt.xz

does not.  Since version 429, ‘less’ allows a filter specified in
the LESSOPEN environment variable to preprocess its input even if
it comes from standard input, if $LESSOPEN begins with ‘|-’.  So
set $LESSOPEN to take advantage of this feature.

Check less’s version at runtime so xzless can continue to work
with older versions.
2009-08-09 22:27:22 +03:00