xzgrep wouldn't exit on SIGPIPE or SIGQUIT when it clearly
should have. It's quite possible that it's not perfect still
but at least it's much better.
If multiple exit statuses compete, now it tries to pick
the largest value.
Some comments were added.
The exit status handling of signals is still broken if the shell
uses values larger than 255 in $? to indicate that a process
died due to a signal *and* its "exit" command doesn't take
this into account. This seems to work well with the ksh and yash
versions I tried. However, there is a report in gzip/zgrep that
OpenSolaris 5.11 (not 5.10) has a problem with "exit" truncating
the argument to 8 bits:
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=22900#25
Such a bug would break xzgrep but I didn't add a workaround
at least for now. 5.11 is old and I don't know if the problem
exists in modern descendants, or if the problem exists in other
ksh implementations in use.
I don't know if this can make a difference in the real world
but it looked kind of suspicious (what happens with sed
implementations that cannot process very long lines?).
At least this commit shouldn't make it worse.
It avoids the use of sed for prefixing filenames to output lines.
Using sed for that is slower and prone to security bugs so now
the sed method is only used as a fallback.
This also fixes an actual bug: When grepping a binary file,
GNU grep nowadays prints its diagnostics to stderr instead of
stdout and thus the sed-method for prefixing the filename doesn't
work. So with this commit grepping binary files gives reasonable
output with GNU grep now.
This was inspired by zgrep but the implementation is different.
Also replace one use of expr with printf.
The rationale for LC_ALL=C was already mentioned in 69d1b3fc29,
which fixed a security issue. However, unrelated uses weren't
changed in that commit yet.
POSIX says that with sed and such tools one should use LC_ALL=C
to ensure predictable behavior when strings contain byte sequences
that aren't valid multibyte characters in the current locale. See
under "Application usage" in here:
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/sed.html
With GNU sed invalid multibyte strings would work without this;
it's documented in its Texinfo manual. Some other implementations
aren't so forgiving.
Fix handling of "xzgrep -25 foo" (in GNU grep "grep -25 foo" is
an alias for "grep -C25 foo"). xzgrep would treat "foo" as filename
instead of as a pattern. This bug was fixed in zgrep in gzip in 2012.
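
The rule being fixed can be stated compactly. xzgrep itself is a
shell script, so the C below is only a sketch of the check, not the
actual implementation; is_context_shorthand() is a hypothetical
helper name. It assumes the shorthand is exactly a dash followed by
one or more decimal digits.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical helper: true if arg looks like "-NUM", e.g. "-25".
 * Such an option takes no separate argument, so the next word on
 * the command line is still the pattern, not a filename. */
static bool is_context_shorthand(const char *arg)
{
	if (arg[0] != '-' || arg[1] == '\0')
		return false;

	for (const char *p = arg + 1; *p != '\0'; ++p)
		if (!isdigit((unsigned char)*p))
			return false;

	return true;
}

/* Usage examples. */
static void context_shorthand_examples(void)
{
	assert(is_context_shorthand("-25"));
	assert(!is_context_shorthand("-C25"));
	assert(!is_context_shorthand("-"));
	assert(!is_context_shorthand("foo"));
}
```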
Add -E, -F, -G, and -P to the "no argument required" list.
Add -X to the "argument required" list. It is an
intentionally-undocumented GNU grep option so this isn't
an important option for xzgrep but it seems that other grep
implementations (well, those that I checked) don't support -X
so I hope this change is an improvement still.
grep -d (grep --directories=ACTION) requires an argument. In
contrast to zgrep, I kept -d in the "no argument required" list
because it's not supported in xzgrep (or zgrep). This way
"xzgrep -d" gives an error about the option being unsupported
instead of saying that it requires an argument. Both zgrep and
xzgrep report that it's unsupported if an argument is specified.
Add comments.
Turns out that this is needed for .lzma files as the spec in
LZMA SDK says that end marker may be present even if the size
is stored in the header. Such files are rare but exist in the
real world. The code in liblzma is so old that the spec didn't
exist in LZMA SDK back then and I had understood that such
files weren't possible (the lzma tool in LZMA SDK didn't
create such files).
This modifies the internal API so that LZMA decoder can be told
if EOPM is allowed even when the uncompressed size is known.
It's allowed with .lzma and not with other uses.
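
As a rough model of the rule described above (not liblzma's internal
API; the function and parameter names are made up for illustration),
the decision when the decoder meets an end marker could look like
this. It assumes that with a known size the marker is only valid
once all the promised bytes have been produced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: is an end-of-payload marker (EOPM)
 * acceptable at this point of the stream?
 *
 * size_known       uncompressed size was stored in the header
 * bytes_remaining  bytes still expected (only used if size_known)
 * allow_eopm       caller permits EOPM despite a known size
 *                  (true for .lzma, false for other uses) */
static bool end_marker_acceptable(bool size_known,
		uint64_t bytes_remaining, bool allow_eopm)
{
	/* Without a stored size the marker is how the stream ends. */
	if (!size_known)
		return true;

	return allow_eopm && bytes_remaining == 0;
}

/* Usage examples. */
static void end_marker_examples(void)
{
	assert(end_marker_acceptable(false, 0, false));
	assert(end_marker_acceptable(true, 0, true));
	assert(!end_marker_acceptable(true, 0, false));
	assert(!end_marker_acceptable(true, 10, true));
}
```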
Thanks to Karl Beldan for reporting the problem.
The script uses lcov and genhtml after running the tests
to show the code coverage statistics. The script will create
a coverage directory where it is run. It can be run both in
and out of the source directory.
lzma_vli is unsigned so trying a signed value results in
a compiler warning from -Wsign-conversion. (lzma_vli)-1
equals LZMA_VLI_UNKNOWN anyway which is the next assertion.
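
The equality can be checked without liblzma headers. The sketch
below uses stand-ins (vli, VLI_UNKNOWN) that are assumed to mirror
lzma_vli being a uint64_t and LZMA_VLI_UNKNOWN being UINT64_MAX.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t vli;          /* assumed to match lzma_vli */
#define VLI_UNKNOWN UINT64_MAX /* assumed to match LZMA_VLI_UNKNOWN */

static void vli_minus_one_check(void)
{
	/* Converting -1 to an unsigned 64-bit type wraps around to
	 * the maximum value, so testing both is redundant. */
	assert((vli)-1 == VLI_UNKNOWN);
}
```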
Created tests for all API functions exported in
src/liblzma/api/lzma/hardware.h. The tests are fairly trivial
but are helpful because they will inform users if their machines
cannot support these functions. They also improve the code
coverage metrics.
The parallel test harness has been the default for quite some time
already and the old serial harness is discouraged. The downside is
that with parallel tests one cannot print progress info or
other diagnostics to the terminal; all output from the tests
will be in the log files only. But now that the compression
tests are separated the parallel tests will speed things up.
test_compress.sh now takes one command line argument:
a filename to be tested. If it begins with "compress_generated_"
the file will be created with create_compress_files.
This will allow parallel execution of the slow tests.
If a command line argument is given, then only the test file
of that type is created. It's quite dumb in the sense that unknown
names don't give an error but it's good enough here.
Also use EXIT_FAILURE instead of 1 as exit status for errors.
The SIZE_MAX / 3 was 1365 MiB. 1400 MiB gives a little more room
and it looks like a round (artificial) number in --info-memory
once --info-memory is made to display it.
Also, using #if avoids useless code on 64-bit builds.
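
The arithmetic and the #if idea can be sketched as follows. This is
only an illustration: cap_on_32bit() is a hypothetical name and the
code assumes that "32-bit build" means SIZE_MAX == UINT32_MAX.

```c
#include <assert.h>
#include <stdint.h>

#define MIB(n) ((uint64_t)(n) << 20)

/* Cap a limit to 1400 MiB when size_t is 32 bits wide. On 64-bit
 * builds the preprocessor removes the check completely. */
static uint64_t cap_on_32bit(uint64_t limit)
{
#if SIZE_MAX == UINT32_MAX
	if (limit > MIB(1400))
		limit = MIB(1400);
#endif
	return limit;
}

static void cap_examples(void)
{
	/* Old value with a 32-bit size_t: 4294967295 / 3 bytes,
	 * which is 1365 MiB when rounded down. */
	assert((uint64_t)UINT32_MAX / 3 / MIB(1) == 1365);

	/* Small limits are never changed. */
	assert(cap_on_32bit(MIB(100)) == MIB(100));

#if SIZE_MAX == UINT32_MAX
	assert(cap_on_32bit(MIB(4000)) == MIB(1400));
#else
	assert(cap_on_32bit(MIB(4000)) == MIB(4000));
#endif
}
```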
This is a soft limit in the sense that it only affects the number of
threads. It never makes xz fail and it never makes xz change
settings that would affect the compressed output.
The idea is to make -T0 have more reasonable behavior when
the system has very many cores or when memory-hungry
compression options are used. This also helps with 32-bit xz,
preventing it from running out of address space.
The downside of this commit is that now the number of threads
might become too low compared to what the user expected. I
hope this is an acceptable compromise as the old behavior
has been a source of well-argued complaints for a long time.
The main problem with the old behavior is that the compressed
output is different on single-core systems vs. multicore systems.
This commit fixes it by making -T0 use one thread in multithreaded
mode on single-core systems.
The downside of this is that it uses more memory. However, if
--memlimit-compress is used, xz can (thanks to the previous commit)
drop to the single-threaded mode still.
In single-threaded mode, --memlimit-compress can make xz scale down
the LZMA2 dictionary size to meet the memory usage limit. This
obviously affects the compressed output. However, if xz was in
threaded mode, --memlimit-compress could make xz reduce the number
of threads but it wouldn't make xz switch from multithreaded mode
to single-threaded mode or scale down the LZMA2 dictionary size.
This seemed illogical and there was even a "FIXME?" about it.
Now --memlimit-compress can make xz switch to single-threaded
mode if one thread in multithreaded mode uses too much memory.
If memory usage is still too high, then the LZMA2 dictionary
size can be scaled down too.
The option --no-adjust was also changed so that it no longer
prevents xz from scaling down the number of threads as that
doesn't affect compressed output (only performance). After
this commit --no-adjust only prevents adjustments that affect
compressed output, that is, with --no-adjust xz won't switch
from multithreaded mode to single-threaded mode and won't
scale down the LZMA2 dictionary size.
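
The order of the adjustments described above can be sketched like
this. It is a toy model, not the code in xz: struct plan, usage(),
apply_memlimit(), the memory cost formulas, halving the dictionary,
and the 4096-byte lower bound are all assumptions made up for the
example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MIB(n) ((uint64_t)(n) << 20)
#define DICT_MIN UINT64_C(4096) /* assumed lower bound */

struct plan {
	uint32_t threads;   /* worker threads in multithreaded mode */
	bool multithreaded; /* false means single-threaded mode */
	uint64_t dict_size; /* LZMA2 dictionary size in bytes */
};

/* Made-up cost model: each thread in multithreaded mode needs
 * 6 * dict_size, single-threaded mode needs 2 * dict_size. */
static uint64_t usage(const struct plan *p)
{
	return p->multithreaded
			? (uint64_t)p->threads * 6 * p->dict_size
			: 2 * p->dict_size;
}

/* Returns true if the final plan fits within the limit. */
static bool apply_memlimit(struct plan *p, uint64_t limit,
		bool no_adjust)
{
	/* 1. Fewer threads: affects only performance, so this is
	 *    done even with --no-adjust. */
	while (p->multithreaded && p->threads > 1
			&& usage(p) > limit)
		--p->threads;

	/* The remaining steps change the compressed output. */
	if (no_adjust)
		return usage(p) <= limit;

	/* 2. Switch to single-threaded mode. */
	if (p->multithreaded && usage(p) > limit) {
		p->multithreaded = false;
		p->threads = 1;
	}

	/* 3. Scale down the dictionary size. */
	while (!p->multithreaded && usage(p) > limit
			&& p->dict_size > DICT_MIN)
		p->dict_size /= 2;

	return usage(p) <= limit;
}
```

With four threads and an 8 MiB dictionary in this model, a 100 MiB
limit only reduces the thread count, a 20 MiB limit forces
single-threaded mode, and a 10 MiB limit also shrinks the
dictionary. With no_adjust set, only the first step is taken.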
The man page wasn't updated yet.
--memlimit-mt-decompress allows specifying the limit for
multithreaded decompression. This matches memlimit_threading in
liblzma. This limit can only affect the number of threads being
used; it will never prevent xz from decompressing a file. The
old --memlimit-decompress option is still used at the same time.
If the value of --memlimit-decompress (the default value or
one specified by the user) is less than the value of
--memlimit-mt-decompress, then --memlimit-mt-decompress is
reduced to match --memlimit-decompress.
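
The relationship between the two limits amounts to taking the
smaller value. A minimal sketch, where effective_mt_limit() is a
hypothetical helper name and both values are in bytes:

```c
#include <assert.h>
#include <stdint.h>

/* The limit for multithreaded decompression never exceeds the
 * limit given with --memlimit-decompress. */
static uint64_t effective_mt_limit(uint64_t memlimit_decompress,
		uint64_t memlimit_mt_decompress)
{
	return memlimit_mt_decompress > memlimit_decompress
			? memlimit_decompress
			: memlimit_mt_decompress;
}

/* Usage examples with arbitrary values. */
static void mt_limit_examples(void)
{
	assert(effective_mt_limit(500, 800) == 500);
	assert(effective_mt_limit(800, 500) == 500);
	assert(effective_mt_limit(500, 500) == 500);
}
```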
Man page wasn't updated yet.