Add lzma_outq_clear_cache2() which may leave one buffer allocated
in the cache.
Add lzma_outq_outbuf_memusage() to get the memory needed for
a single lzma_outbuf. This is now used internally in outqueue.c too.
Track both the total amount of memory allocated and the amount of
memory that is in active use (not in cache).
In lzma_outbuf, allow storing the current input position that
matches the current output position. This way the main thread
can notice when no more output is possible without first providing
more input.
Allow specifying return code for lzma_outq_read() in a finished
lzma_outbuf.
If lzma_index_append() failed (most likely memory allocation failure)
it could have gone unnoticed and the resulting .xz file would have
an incorrect Index. Decompressing such a file would produce the
correct uncompressed data but then an error would occur when
verifying the Index field.
Now it limits the input and output buffer sizes that are
passed to a raw decoder. This way there's no need to check
if the sizes can grow too big or overflow when updating
Compressed Size and Uncompressed Size counts. This also means
that a corrupt file cannot cause the raw decoder to process
useless extra input or output that would exceed the size info
in Block Header (and thus cause LZMA_DATA_ERROR anyway).
More importantly, now the size information is verified more
carefully in case raw decoder returns LZMA_OK. This doesn't
really matter with the current single-threaded .xz decoder
as the errors would be detected slightly later anyway. But
this helps avoiding corner cases in the upcoming threaded
decompressor, and it might help other Block decoder uses
outside liblzma too.
The test files bad-1-lzma2-{9,10,11}.xz test these conditions.
With the single-threaded .xz decoder the only difference is
that LZMA_DATA_ERROR is detected in a difference place now.
This matches xz-utils 5.2.5-2 in Debian.
The translation was done by "bubu", proofread by the debian-l10n-french
mailing list contributors, and submitted to me on the xz-devel mailing
list by Jean-Pierre Giraud. Thanks to everyone!
Previously lzma_lzma_props_encode() and lzma_lzma2_props_encode()
assumed that the options pointers must be non-NULL because the
with these filters the API says it must never be NULL. It is
good to do these checks anyway.
This broke 32-bit builds due to a pointer type mismatch.
This bug was introduced with the output-size-limited encoding
in 625f4c7c99.
Thanks to huangqinjin for the bug report.
OpenBSD does not allow to change the group of a file if the user
does not belong to this group. In contrast to Linux, OpenBSD also
fails if the new group is the same as the old one. Do not call
fchown(2) in this case, it would change nothing anyway.
This fixes an issue with Perl Alien::Build module.
https://github.com/PerlAlien/Alien-Build/issues/62
Sometimes the version number from "less -V" contains a dot,
sometimes not. xzless failed detect the version number when
it does contain a dot. This fixes it.
Thanks to nick87720z for reporting this. Apparently it had been
reported here <https://bugs.gentoo.org/489362> in 2013.
Due to architectural limitations, address space available to a single
userspace process on MIPS32 is limited to 2 GiB, not 4, even on systems
that have more physical RAM -- e.g. 64-bit systems with 32-bit
userspace, or systems that use XPA (an extension similar to x86's PAE).
So, for MIPS32, we have to impose stronger memory limits. I've chosen
2000MiB to give the process some headroom.
The naming conflict with FindLibLZMA module gets worse.
Not avoiding it in the first place was stupid.
Normally find_package(LibLZMA) will use the module and
find_package(liblzma 5.2.5 REQUIRED CONFIG) will use the config
file even with a case insensitive file system. However, if
CMAKE_FIND_PACKAGE_PREFER_CONFIG is TRUE and the file system
is case insensitive, find_package(LibLZMA) will find our liblzma
config file instead of using FindLibLZMA module.
One big problem with this is that FindLibLZMA uses
LibLZMA::LibLZMA and we use liblzma::liblzma as the target
name. With target names CMake happens to be case sensitive.
To workaround this, this commit adds
add_library(LibLZMA::LibLZMA ALIAS liblzma::liblzma)
to the config file. Then both spellings work.
To make the behavior consistent between case sensitive and
insensitive file systems, the config and related files are
renamed from liblzmaConfig.cmake to liblzma-config.cmake style.
With this style CMake looks for lowercase version of the package
name so find_package(LiBLzmA 5.2.5 REQUIRED CONFIG) will work
to find our config file.
There are other differences between our config file and
FindLibLZMA so it's still possible that things break for
reasons other than the spelling of the target name. Hopefully
those situations aren't too common.
When the config file is available, it should always give as good or
better results as FindLibLZMA so this commit doesn't affect the
recommendation to use find_package(liblzma 5.2.5 REQUIRED CONFIG)
which explicitly avoids FindLibLZMA.
Thanks to Markus Rickert.
When the uncompressed size is known to be exact, after decompressing
the stream exactly comp_size bytes of input must have been consumed.
This is a minor improvement to error detection.
The caller must still not specify an uncompressed size bigger
than the actual uncompressed size.
As a downside, this now needs the exact compressed size.
Right now this is just a planned extra-compact format for use
in the EROFS file system in Linux. At this point it's possible
that the format will either change or be abandoned and removed
completely.
The special thing about the encoder is that it uses the
output-size-limited encoding added in the previous commit.
EROFS uses fixed-sized blocks (e.g. 4 KiB) to hold compressed
data so the compressors must be able to create valid streams
that fill the given block size.
With this it is possible to encode LZMA1 data without EOPM so that
the encoder will encode as much input as it can without exceeding
the specified output size limit. The resulting LZMA1 stream will
be a normal LZMA1 stream without EOPM. The actual uncompressed size
will be available to the caller via the uncomp_size pointer.
One missing thing is that the LZMA layer doesn't inform the LZ layer
when the encoding is finished and thus the LZ may read more input
when it won't be used. However, this doesn't matter if encoding is
done with a single call (which is the planned use case for now).
For proper multi-call encoding this should be improved.
This commit only adds the functionality for internal use.
Nothing uses it yet.
Previously this required using --force but that has other
effects too which might be undesirable. Changing the behavior
of --keep has a small risk of breaking existing scripts but
since this is a fairly special corner case I expect the
likehood of breakage to be low enough.
I think the new behavior is more logical. The only reason for
the old behavior was to be consistent with gzip and bzip2.
Thanks to Vincent Lefevre and Sebastian Andrzej Siewior.
Omit the -q option from xz, gzip, and bzip2. With xz this shouldn't
matter. With gzip it's important because -q makes gzip replace SIGPIPE
with exit status 2. With bzip2 it's important because with -q bzip2
is completely silent if input is corrupt while other decompressors
still give an error message.
Avoiding exit status 2 from gzip is important because bzip2 uses
exit status 2 to indicate corrupt input. Before this commit xzgrep
didn't recognize corrupt .bz2 files because xzgrep was treating
exit status 2 as SIGPIPE for gzip compatibility.
zstd still needs -q because otherwise it is noisy in normal
operation.
The code to detect real SIGPIPE didn't check if the exit status
was due to a signal (>= 128) and so could ignore some other exit
status too.
This is a minor fix since this affects only the situation when
the files differ and the exit status is something else than 0.
In such case there could be SIGPIPE from a decompression tool
and that would result in exit status of 2 from xzdiff/xzcmp
while the correct behavior would be to return 1 or whatever
else diff or cmp may have returned.
This commit omits the -q option from xz/gzip/bzip2/lzop arguments.
I'm not sure why the -q was used in the first place, perhaps it
hides warnings in some situation that I cannot see at the moment.
Hopefully the removal won't introduce a new bug.
With gzip the -q option was harmful because it made gzip return 2
instead of >= 128 with SIGPIPE. Ignoring exit status 2 (warning
from gzip) isn't practical because bzip2 uses exit status 2 to
indicate corrupt input file. It's better if SIGPIPE results in
exit status >= 128.
With bzip2 the removal of -q seems to be good because with -q
it prints nothing if input is corrupt. The other tools aren't
silent in this situation even with -q. On the other hand, if
zstd support is added, it will need -q since otherwise it's
noisy in normal situations.
Thanks to Étienne Mollier and Sebastian Andrzej Siewior.
Before this commit all output queue buffers were allocated as
a single big allocation. Now each buffer is allocated separately
when needed. Used buffers are cached to avoid reallocation
overhead but the cache will keep only one buffer size at a time.
This should make things work OK in the decompression where most
of the time the buffer sizes will be the same but with some less
common files the buffer sizes may vary.
While this should work fine, it's still a bit preliminary
and may even get reverted if it turns out to be useless for
decompression.