Commit Graph

1115 Commits

Author SHA1 Message Date
Lasse Collin ecb966de30 liblzma: Add experimental ARM64 BCJ filter with a temporary Filter ID.
That is, the Filter ID will be changed once the design is final.
The current version will be removed. So files created with the
tempoary Filter ID won't be supported in the future.
2022-09-19 20:23:46 +03:00
Lasse Collin 177bdc922c liblzma: Simple/BCJ filters: Allow disabling generic BCJ options.
This will be needed for the ARM64 BCJ filter as it will use
its own options struct.
2022-09-17 22:42:18 +03:00
Lasse Collin 097c7b67ce xzgrep: Fix compatibility with old shells.
Running the current xzgrep on Slackware 10.1 with GNU bash 3.00.15:

    xzgrep: line 231: syntax error near unexpected token `;;'

On SCO OpenServer 5.0.7 with Korn Shell 93r:

    syntax error at line 231 : `;;' unexpected

Turns out that some old shells don't like apostrophes (') inside
command substitutions. For example, the following fails:

    x=$(echo foo
    # asdf'zxcv
    echo bar)
    printf '%s\n' "$x"

The problem was introduced by commits
69d1b3fc29 (2022-03-29),
bd7b290f3f (2022-07-18), and
a648978b20 (2022-07-19).
5.2.6 is the only stable release that included
this problem.

Thanks to Kevin R. Bulgrien for reporting the problem
on SCO OpenServer 5.0.7 and for providing the fix.
2022-09-16 14:07:03 +03:00
Lasse Collin f8ee61e74e liblzma: lzma_filters_copy: Keep dest[] unmodified if an error occurs.
lzma_stream_encoder() and lzma_stream_encoder_mt() always assumed
this. Before this patch, failing lzma_filters_copy() could result
in free(invalid_pointer) or invalid memory reads in stream_encoder.c
or stream_encoder_mt.c.

To trigger this, allocating memory for a filter options structure
has to fail. These are tiny allocations so in practice they very
rarely fail.

Certain badness in the filter chain array could also make
lzma_filters_copy() fail but both stream_encoder.c and
stream_encoder_mt.c validate the filter chain before
trying to copy it, so the crash cannot occur this way.
2022-09-09 13:51:57 +03:00
Jia Tan 18d7facd38 liblzma: lzma_index_append: Add missing integer overflow check.
The documentation in src/liblzma/api/lzma/index.h suggests that
both the unpadded (compressed) size and the uncompressed size
are checked for overflow, but only the unpadded size was checked.
The uncompressed check is done first since that is more likely to
occur than the unpadded or index field size overflows.
2022-09-08 15:19:19 +03:00
Lasse Collin 913ddc5572 liblzma: Vaccinate against an ill patch from RHEL/CentOS 7.
RHEL/CentOS 7 shipped with 5.1.2alpha, including the threaded
encoder that is behind #ifdef LZMA_UNSTABLE in the API headers.
In 5.1.2alpha these symbols are under XZ_5.1.2alpha in liblzma.map.
API/ABI compatibility tracking isn't done between development
releases so newer releases didn't have XZ_5.1.2alpha anymore.

Later RHEL/CentOS 7 updated xz to 5.2.2 but they wanted to keep
the exported symbols compatible with 5.1.2alpha. After checking
the ABI changes it turned out that >= 5.2.0 ABI is backward
compatible with the threaded encoder functions from 5.1.2alpha
(but not vice versa as fixes and extensions to these functions
were made between 5.1.2alpha and 5.2.0).

In RHEL/CentOS 7, XZ Utils 5.2.2 was patched with
xz-5.2.2-compat-libs.patch to modify liblzma.map:

  - XZ_5.1.2alpha was added with lzma_stream_encoder_mt and
    lzma_stream_encoder_mt_memusage. This matched XZ Utils 5.1.2alpha.

  - XZ_5.2 was replaced with XZ_5.2.2. It is clear that this was
    an error; the intention was to keep using XZ_5.2 (XZ_5.2.2
    has never been used in XZ Utils). So XZ_5.2.2 lists all
    symbols that were listed under XZ_5.2 before the patch.
    lzma_stream_encoder_mt and _mt_memusage are included too so
    they are listed both here and under XZ_5.1.2alpha.

The patch didn't add any __asm__(".symver ...") lines to the .c
files. Thus the resulting liblzma.so exports the threaded encoder
functions under XZ_5.1.2alpha only. Listing the two functions
also under XZ_5.2.2 in liblzma.map has no effect without
matching .symver lines.

The lack of XZ_5.2 in RHEL/CentOS 7 means that binaries linked
against unpatched XZ Utils 5.2.x won't run on RHEL/CentOS 7.
This is unfortunate but this alone isn't too bad as the problem
is contained within RHEL/CentOS 7 and doesn't affect users
of other distributions. It could also be fixed internally in
RHEL/CentOS 7.

The second problem is more serious: In XZ Utils 5.2.2 the API
headers don't have #ifdef LZMA_UNSTABLE for obvious reasons.
This is true in RHEL/CentOS 7 version too. Thus now programs
using new APIs can be compiled without an extra #define. However,
the programs end up depending on symbol version XZ_5.1.2alpha
(and possibly also XZ_5.2.2) instead of XZ_5.2 as they would
with an unpatched XZ Utils 5.2.2. This means that such binaries
won't run on other distributions shipping XZ Utils >= 5.2.0 as
they don't provide XZ_5.1.2alpha or XZ_5.2.2; they only provide
XZ_5.2 (and XZ_5.0). (This includes RHEL/CentOS 8 as the patch
luckily isn't included there anymore with XZ Utils 5.2.4.)

Binaries built by RHEL/CentOS 7 users get distributed and then
people wonder why they don't run on some other distribution.
Seems that people have found out about the patch and been copying
it to some build scripts, seemingly curing the symptoms but
actually spreading the illness further and outside RHEL/CentOS 7.

The ill patch seems to be from late 2016 (RHEL 7.3) and in 2017 it
had spread at least to EasyBuild. I heard about the events only
recently. :-(

This commit splits liblzma.map into two versions: one for
GNU/Linux and another for other OSes that can use symbol versioning
(FreeBSD, Solaris, maybe others). The Linux-specific file and the
matching additions to .c files add full compatibility with binaries
that have been built against a RHEL/CentOS-patched liblzma. Builds
for OSes other than GNU/Linux won't get the vaccine as they should
be immune to the problem (I really hope that no build script uses
the RHEL/CentOS 7 patch outside GNU/Linux).

The RHEL/CentOS compatibility symbols XZ_5.1.2alpha and XZ_5.2.2
are intentionally put *after* XZ_5.2 in liblzma_linux.map. This way
if one forgets to #define HAVE_SYMBOL_VERSIONS_LINUX when building,
the resulting liblzma.so.5 will have lzma_stream_encoder_mt@@XZ_5.2
since XZ_5.2 {...} is the first one that lists that function.
Without HAVE_SYMBOL_VERSIONS_LINUX @XZ_5.1.2alpha and @XZ_5.2.2
will be missing but that's still a minor problem compared to
only having lzma_stream_encoder_mt@@XZ_5.1.2alpha!

The "local: *;" line was moved to XZ_5.0 so that it doesn't need
to be moved around. It doesn't matter where it is put.

Having two similar liblzma_*.map files is a bit silly as it is,
at least for now, easily possible to generate the generic one
from the Linux-specific file. But that adds extra steps and
increases the risk of mistakes when supporting more than one
build system. So I rather maintain two files in parallel and let
validate_map.sh check that they are in sync when "make mydist"
is run.

This adds .symver lines for lzma_stream_encoder_mt@XZ_5.2.2 and
lzma_stream_encoder_mt_memusage@XZ_5.2.2 even though these
weren't exported by RHEL/CentOS 7 (only @@XZ_5.1.2alpha was
for these two). I added these anyway because someone might
misunderstand the RHEL/CentOS 7 patch and think that @XZ_5.2.2
(@@XZ_5.2.2) versions were exported too.

At glance one could suggest using __typeof__ to copy the function
prototypes when making aliases. However, this doesn't work trivially
because __typeof__ won't copy attributes (lzma_nothrow, lzma_pure)
and it won't change symbol visibility from hidden to default (done
by LZMA_API()). Attributes could be copied with __copy__ attribute
but that needs GCC 9 and a fallback method would be needed anyway.

This uses __symver__ attribute with GCC >= 10 and
__asm__(".symver ...") with everything else. The attribute method
is required for LTO (-flto) support with GCC. Using -flto with
GCC older than 10 is now broken on GNU/Linux and will not be fixed
(can silently result in a broken liblzma build that has dangerously
incorrect symbol versions). LTO builds with Clang seem to work
with the traditional __asm__(".symver ...") method.

Thanks to Boud Roukema for reporting the problem and discussing
the details and testing the fix.
2022-09-08 15:01:29 +03:00
Lasse Collin c1555b1a22 Bump version number for 5.3.3alpha. 2022-08-22 18:16:40 +03:00
Lasse Collin 311e4f85ed xz: Try to clarify --memlimit-mt-decompress vs. --memlimit-compress. 2022-08-22 18:01:21 +03:00
Lasse Collin 02a777f9c4 xz: Revise --info-memory output.
The strings could be more descriptive but it's good
to have some version of this committed now.

--robot mode wasn't changed yet.
2022-08-19 23:40:00 +03:00
Lasse Collin f864f6d42e xz: Update the man page for threaded decompression and memlimits.
This documents the changes made in commits
6c6da57ae2,
cad299008c, and
898faa9728.

The --info-memory bit hasn't been finished yet
even though it's already mentioned in this commit
under --memlimit-mt-decompress and --threads.
2022-08-19 23:15:56 +03:00
Lasse Collin c4e8e5fb31 liblzma: Threaded decoder: Improve LZMA_FAIL_FAST when LZMA_FINISH is used.
It will now return LZMA_DATA_ERROR (not LZMA_OK or LZMA_BUF_ERROR)
if LZMA_FINISH is used and there isn't enough input to finish
decoding the Block Header or the Block. The use of LZMA_DATA_ERROR
is simpler and the less risky than LZMA_BUF_ERROR but this might
be changed before 5.4.0.
2022-08-18 17:16:49 +03:00
Jia Tan 61f8ec804a liblzma: Refactor lzma_mf_is_supported() to use a switch-statement. 2022-07-25 18:30:10 +03:00
Lasse Collin 9cc721af54 xz: Update the man page that change to --keep will be in 5.2.6. 2022-07-24 13:27:48 +03:00
Lasse Collin d796b6d7fd xzgrep man page: Document exit statuses. 2022-07-19 23:19:49 +03:00
Lasse Collin 923bf96b55 xzgrep: Improve error handling, especially signals.
xzgrep wouldn't exit on SIGPIPE or SIGQUIT when it clearly
should have. It's quite possible that it's not perfect still
but at least it's much better.

If multiple exit statuses compete, now it tries to pick
the largest of value.

Some comments were added.

The exit status handling of signals is still broken if the shell
uses values larger than 255 in $? to indicate that a process
died due to a signal ***and*** their "exit" command doesn't take
this into account. This seems to work well with the ksh and yash
versions I tried. However, there is a report in gzip/zgrep that
OpenSolaris 5.11 (not 5.10) has a problem with "exit" truncating
the argument to 8 bits:

    https://debbugs.gnu.org/cgi/bugreport.cgi?bug=22900#25

Such a bug would break xzgrep but I didn't add a workaround
at least for now. 5.11 is old and I don't know if the problem
exists in modern descendants, or if the problem exists in other
ksh implementations in use.
2022-07-19 23:13:24 +03:00
Lasse Collin a648978b20 xzgrep: Make the fix for ZDI-CAN-16587 more robust.
I don't know if this can make a difference in the real world
but it looked kind of suspicious (what happens with sed
implementations that cannot process very long lines?).
At least this commit shouldn't make it worse.
2022-07-19 00:10:55 +03:00
Lasse Collin bd7b290f3f xzgrep: Use grep -H --label when available (GNU, *BSDs).
It avoids the use of sed for prefixing filenames to output lines.
Using sed for that is slower and prone to security bugs so now
the sed method is only used as a fallback.

This also fixes an actual bug: When grepping a binary file,
GNU grep nowadays prints its diagnostics to stderr instead of
stdout and thus the sed-method for prefixing the filename doesn't
work. So with this commit grepping binary files gives reasonable
output with GNU grep now.

This was inspired by zgrep but the implementation is different.
2022-07-18 22:06:10 +03:00
Lasse Collin b56729af9f xzgrep: Use -e to specify the pattern to grep.
Now we don't need the separate test for adding the -q option
as it can be added directly in the two places where it's needed.
2022-07-18 21:10:25 +03:00
Lasse Collin bad61b5997 Scripts: Use printf instead of echo in a few places.
It's a good habbit as echo has some portability corner cases
when the string contents can be anything.
2022-07-18 19:18:48 +03:00
Lasse Collin 6a4a4a7d26 xzgrep: Add more LC_ALL=C to avoid bugs with multibyte characters.
Also replace one use of expr with printf.

The rationale for LC_ALL=C was already mentioned in
69d1b3fc29 that fixed a security
issue. However, unrelated uses weren't changed in that commit yet.

POSIX says that with sed and such tools one should use LC_ALL=C
to ensure predictable behavior when strings contain byte sequences
that aren't valid multibyte characters in the current locale. See
under "Application usage" in here:

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/sed.html

With GNU sed invalid multibyte strings would work without this;
it's documented in its Texinfo manual. Some other implementations
aren't so forgiving.
2022-07-17 21:36:25 +03:00
Lasse Collin b48f9d615f xzgrep: Fix parsing of certain options.
Fix handling of "xzgrep -25 foo" (in GNU grep "grep -25 foo" is
an alias for "grep -C25 foo"). xzgrep would treat "foo" as filename
instead of as a pattern. This bug was fixed in zgrep in gzip in 2012.

Add -E, -F, -G, and -P to the "no argument required" list.

Add -X to "argument required" list. It is an
intentionally-undocumented GNU grep option so this isn't
an important option for xzgrep but it seems that other grep
implementations (well, those that I checked) don't support -X
so I hope this change is an improvement still.

grep -d (grep --directories=ACTION) requires an argument. In
contrast to zgrep, I kept -d in the "no argument required" list
because it's not supported in xzgrep (or zgrep). This way
"xzgrep -d" gives an error about option being unsupported instead
of telling that it requires an argument. Both zgrep and xzgrep
tell that it's unsupported if an argument is specified.

Add comments.
2022-07-17 20:57:06 +03:00
Lasse Collin 107c93ee5c liblzma: Rename a variable and improve a comment. 2022-07-14 18:12:38 +03:00
Lasse Collin 9595a3119b liblzma: Add optional autodetection of LZMA end marker.
Turns out that this is needed for .lzma files as the spec in
LZMA SDK says that end marker may be present even if the size
is stored in the header. Such files are rare but exist in the
real world. The code in liblzma is so old that the spec didn't
exist in LZMA SDK back then and I had understood that such
files weren't possible (the lzma tool in LZMA SDK didn't
create such files).

This modifies the internal API so that LZMA decoder can be told
if EOPM is allowed even when the uncompressed size is known.
It's allowed with .lzma and not with other uses.

Thanks to Karl Beldan for reporting the problem.
2022-07-13 22:24:07 +03:00
Lasse Collin 0c0f8e9761 xz: Document the special memlimit case of 2000 MiB on MIPS32.
See commit fc3d3a7296.
2022-07-12 18:53:04 +03:00
Lasse Collin 2ce4f36f17 liblzma: Silence a warning.
The actual initialization is done via mythread_sync and seems
that GCC doesn't necessarily see that it gets initialized there.
2022-05-23 19:37:18 +03:00
Lasse Collin 5d8f3764ef xz: Fix build with --disable-threads. 2022-04-14 20:53:16 +03:00
Lasse Collin 1d59289727 xz: Change the cap of the default -T0 memlimit for 32-bit xz.
The SIZE_MAX / 3 was 1365 MiB. 1400 MiB gives little more room
and it looks like a round (artificial) number in --info-memory
once --info-memory is made to display it.

Also, using #if avoids useless code on 64-bit builds.
2022-04-14 14:50:17 +03:00
Lasse Collin c77fe55ddb xz: Add a default soft memory usage limit for --threads=0.
This is a soft limit in sense that it only affects the number of
threads. It never makes xz fail and it never makes xz change
settings that would affect the compressed output.

The idea is to make -T0 have more reasonable behavior when
the system has very many cores or when a memory-hungry
compression options are used. This also helps with 32-bit xz,
preventing it from running out of address space.

The downside of this commit is that now the number of threads
might become too low compared to what the user expected. I
hope this to be an acceptable compromise as the old behavior
has been a source of well-argued complaints for a long time.
2022-04-14 14:20:46 +03:00
Lasse Collin 0adc13bfe3 xz: Make -T0 use multithreaded mode on single-core systems.
The main problem withi the old behavior is that the compressed
output is different on single-core systems vs. multicore systems.
This commit fixes it by making -T0 one thread in multithreaded mode
on single-core systems.

The downside of this is that it uses more memory. However, if
--memlimit-compress is used, xz can (thanks to the previous commit)
drop to the single-threaded mode still.
2022-04-14 13:00:40 +03:00
Lasse Collin 898faa9728 xz: Changes to --memlimit-compress and --no-adjust.
In single-threaded mode, --memlimit-compress can make xz scale down
the LZMA2 dictionary size to meet the memory usage limit. This
obviously affects the compressed output. However, if xz was in
threaded mode, --memlimit-compress could make xz reduce the number
of threads but it wouldn't make xz switch from multithreaded mode
to single-threaded mode or scale down the LZMA2 dictionary size.
This seemed illogical and there was even a "FIXME?" about it.

Now --memlimit-compress can make xz switch to single-threaded
mode if one thread in multithreaded mode uses too much memory.
If memory usage is still too high, then the LZMA2 dictionary
size can be scaled down too.

The option --no-adjust was also changed so that it no longer
prevents xz from scaling down the number of threads as that
doesn't affect compressed output (only performance). After
this commit --no-adjust only prevents adjustments that affect
compressed output, that is, with --no-adjust xz won't switch
from multithreaded mode to single-threaded mode and won't
scale down the LZMA2 dictionary size.

The man page wasn't updated yet.
2022-04-14 12:38:00 +03:00
Lasse Collin cad299008c xz: Add --memlimit-mt-decompress along with a default limit value.
--memlimit-mt-decompress allows specifying the limit for
multithreaded decompression. This matches memlimit_threading in
liblzma. This limit can only affect the number of threads being
used; it will never prevent xz from decompressing a file. The
old --memlimit-decompress option is still used at the same time.

If the value of --memlimit-decompress (the default value or
one specified by the user) is less than the value of
--memlimit-mt-decompress , then --memlimit-mt-decompress is
reduced to match --memlimit-decompress.

Man page wasn't updated yet.
2022-04-12 00:04:30 +03:00
Lasse Collin fe87b4cd53 liblzma: Threaded decoder: Improve setting of pending_error.
It doesn't need to be done conditionally. The comments try
to explain it.
2022-04-06 23:11:59 +03:00
Lasse Collin 90621da7f6 liblzma: Add a new flag LZMA_FAIL_FAST for threaded decoder.
In most cases if the input file is corrupt the application won't
care about the uncompressed content at all. With this new flag
the threaded decoder will return an error as soon as any thread
has detected an error; it won't wait to copy out the data before
the location of the error.

I don't plan to use this in xz to keep the behavior consistent
between single-threaded and multi-threaded modes.
2022-04-06 13:16:00 +03:00
Lasse Collin 64b6d496dc liblzma: Threaded decoder: Always wait for output if LZMA_FINISH is used.
This makes the behavior consistent with the single-threaded
decoder when handling truncated .xz files.

Thanks to Jia Tan for finding this issue.
2022-04-05 12:24:57 +03:00
Lasse Collin e671bc8828 liblzma: Threaded decoder: Support zpipe.c-style decoding loop.
This makes it possible to call lzma_code() in a loop that only
reads new input when lzma_code() didn't fill the output buffer
completely. That isn't the calling style suggested by the
liblzma example program 02_decompress.c so perhaps the usefulness
of this feature is limited.

Also, it is possible to write such a loop so that it works
with the single-threaded decoder but not with the threaded
decoder even after this commit, or so that it works only if
lzma_mt.timeout = 0.

The zlib tutorial <https://zlib.net/zlib_how.html> is a well-known
example of a loop where more input is read only when output isn't
full. Porting this as is to liblzma would work with the
single-threaded decoder (if LZMA_CONCATENATED isn't used) but it
wouldn't work with threaded decoder even after this commit because
the loop assumes that no more output is possible when it cannot
read more input ("if (strm.avail_in == 0) break;"). This cannot
be fixed at liblzma side; the loop has to be modified at least
a little.

I'm adding this in any case because the actual code is simple
and short and should have no harmful side-effects in other
situations.
2022-04-02 21:49:59 +03:00
Lasse Collin 69d1b3fc29 xzgrep: Fix escaping of malicious filenames (ZDI-CAN-16587).
Malicious filenames can make xzgrep to write to arbitrary files
or (with a GNU sed extension) lead to arbitrary code execution.

xzgrep from XZ Utils versions up to and including 5.2.5 are
affected. 5.3.1alpha and 5.3.2alpha are affected as well.
This patch works for all of them.

This bug was inherited from gzip's zgrep. gzip 1.12 includes
a fix for zgrep.

The issue with the old sed script is that with multiple newlines,
the N-command will read the second line of input, then the
s-commands will be skipped because it's not the end of the
file yet, then a new sed cycle starts and the pattern space
is printed and emptied. So only the last line or two get escaped.

One way to fix this would be to read all lines into the pattern
space first. However, the included fix is even simpler: All lines
except the last line get a backslash appended at the end. To ensure
that shell command substitution doesn't eat a possible trailing
newline, a colon is appended to the filename before escaping.
The colon is later used to separate the filename from the grep
output so it is fine to add it here instead of a few lines later.

The old code also wasn't POSIX compliant as it used \n in the
replacement section of the s-command. Using \<newline> is the
POSIX compatible method.

LC_ALL=C was added to the two critical sed commands. POSIX sed
manual recommends it when using sed to manipulate pathnames
because in other locales invalid multibyte sequences might
cause issues with some sed implementations. In case of GNU sed,
these particular sed scripts wouldn't have such problems but some
other scripts could have, see:

    info '(sed)Locale Considerations'

This vulnerability was discovered by:
cleemy desu wayo working with Trend Micro Zero Day Initiative

Thanks to Jim Meyering and Paul Eggert discussing the different
ways to fix this and for coordinating the patch release schedule
with gzip.
2022-03-29 20:10:50 +03:00
Lasse Collin bd93b776c1 liblzma: Fix a deadlock in threaded decoder.
If a worker thread has consumed all input so far and it's
waiting on thr->cond and then the main thread enables
partial update for that thread, the code used to deadlock.
This commit allows one dummy decoding pass to occur in this
situation which then also does the partial update.

As part of the fix, this moves thr->progress_* updates to
avoid the second thr->mutex locking.

Thanks to Jia Tan for finding, debugging, and reporting the bug.
2022-03-26 01:15:32 +02:00
Lasse Collin 487c77d487 liblzma: Threaded decoder: Don't stop threads on LZMA_TIMED_OUT.
LZMA_TIMED_OUT is not an error and thus stopping threads on
LZMA_TIMED_OUT breaks the decoder badly.

Thanks to Jia Tan for finding the bug and for the patch.
2022-03-23 16:28:55 +02:00
Lasse Collin 6c6da57ae2 xz: Add initial support for threaded decompression.
If threading support is enabled at build time, this will
use lzma_stream_decoder_mt() even for single-threaded mode.
With memlimit_threading=0 the behavior should be identical.

This needs some work like adding --memlimit-threading=LIMIT.

The original patch from Sebastian Andrzej Siewior included
a method to get currently available RAM on Linux. It might
be one way to go but as it is Linux-only, the available-RAM
approach needs work for portability or using a fallback method
on other OSes.

The man page wasn't updated yet.
2022-03-07 00:36:16 +02:00
Lasse Collin 4cce3e27f5 liblzma: Add threaded .xz decompressor.
I realize that this is about a decade late.

Big thanks to Sebastian Andrzej Siewior for the original patch.
I made a bunch of smaller changes but after a while quite a few
things got rewritten. So any bugs in the commit were created by me.
2022-03-07 00:35:53 +02:00
Lasse Collin 717631b978 liblzma: Fix docs: lzma_block_decoder() cannot return LZMA_UNSUPPORTED_CHECK.
If Check is unsupported, it will be silently ignored.
It's the caller's job to handle it.
2022-03-06 16:54:23 +02:00
Lasse Collin 1a4bb97a00 liblzma: Add new output queue (lzma_outq) features.
Add lzma_outq_clear_cache2() which may leave one buffer allocated
in the cache.

Add lzma_outq_outbuf_memusage() to get the memory needed for
a single lzma_outbuf. This is now used internally in outqueue.c too.

Track both the total amount of memory allocated and the amount of
memory that is in active use (not in cache).

In lzma_outbuf, allow storing the current input position that
matches the current output position. This way the main thread
can notice when no more output is possible without first providing
more input.

Allow specifying return code for lzma_outq_read() in a finished
lzma_outbuf.
2022-03-06 16:41:19 +02:00
Lasse Collin ddbc6f58c2 liblzma: Index hash: Change return value type of hash_append() to void. 2022-03-06 15:18:58 +02:00
Lasse Collin 20e7a33e2d liblzma: Minor addition to lzma_vli_size() API doc.
Thanks to Jia Tan.
2022-02-22 03:42:57 +02:00
Lasse Collin 4f78f5fcf6 liblzma: Check the return value of lzma_index_append() in threaded encoder.
If lzma_index_append() failed (most likely memory allocation failure)
it could have gone unnoticed and the resulting .xz file would have
an incorrect Index. Decompressing such a file would produce the
correct uncompressed data but then an error would occur when
verifying the Index field.
2022-02-22 02:04:18 +02:00
Ed Maste 865e0a3689 liblzma: Use non-executable stack on FreeBSD as on Linux 2022-02-22 01:23:34 +02:00
Lasse Collin 1c9a5786d2 liblzma: Make Block decoder catch certain types of errors better.
Now it limits the input and output buffer sizes that are
passed to a raw decoder. This way there's no need to check
if the sizes can grow too big or overflow when updating
Compressed Size and Uncompressed Size counts. This also means
that a corrupt file cannot cause the raw decoder to process
useless extra input or output that would exceed the size info
in Block Header (and thus cause LZMA_DATA_ERROR anyway).

More importantly, now the size information is verified more
carefully in case raw decoder returns LZMA_OK. This doesn't
really matter with the current single-threaded .xz decoder
as the errors would be detected slightly later anyway. But
this helps avoiding corner cases in the upcoming threaded
decompressor, and it might help other Block decoder uses
outside liblzma too.

The test files bad-1-lzma2-{9,10,11}.xz test these conditions.
With the single-threaded .xz decoder the only difference is
that LZMA_DATA_ERROR is detected in a difference place now.
2022-02-20 20:36:27 +02:00
jiat75 6468f7e41a liblzma: Add NULL checks to LZMA and LZMA2 properties encoders.
Previously lzma_lzma_props_encode() and lzma_lzma2_props_encode()
assumed that the options pointers must be non-NULL because the
with these filters the API says it must never be NULL. It is
good to do these checks anyway.
2022-02-07 00:20:01 +02:00
Lasse Collin 2523c30705 liblzma: Fix uint64_t vs. size_t confusion.
This broke 32-bit builds due to a pointer type mismatch.

This bug was introduced with the output-size-limited encoding
in 625f4c7c99.

Thanks to huangqinjin for the bug report.
2022-02-06 23:19:32 +02:00
Lasse Collin 2024fbf279 xzgrep: Update man page timestamp. 2021-11-13 21:04:05 +02:00
Ville Skyttä 3a512c7787 xzgrep: use `grep -E/-F` instead of `egrep` and `fgrep`
`egrep` and `fgrep` have been deprecated in GNU grep since 2007, and in
current post 3.7 Git they have been made to emit obsolescence warnings:
https://git.savannah.gnu.org/cgit/grep.git/commit/?id=a9515624709865d480e3142fd959bccd1c9372d1
2021-11-13 18:17:33 +02:00
Lasse Collin edf525e2b1 Bump the version number for 5.3.2alpha. 2021-10-28 23:02:11 +03:00
Lasse Collin f2aea1d5a5 xz: Change the coding style of the previous commit.
It isn't any better now but it's consistent with
the rest of the code base.
2021-10-27 23:23:11 +03:00
Alexander Bluhm 892b16cc28 xz: Avoid fchown(2) failure.
OpenBSD does not allow to change the group of a file if the user
does not belong to this group.  In contrast to Linux, OpenBSD also
fails if the new group is the same as the old one.  Do not call
fchown(2) in this case, it would change nothing anyway.

This fixes an issue with Perl Alien::Build module.
https://github.com/PerlAlien/Alien-Build/issues/62
2021-10-27 20:49:41 +03:00
Lasse Collin 2b509c868c liblzma: Fix liblzma.map for the lzma_microlzma_* symbols.
This should have been part of d267d109c3.

Thanks to Gao Xiang.
2021-09-17 17:31:11 +03:00
Lasse Collin 6928aac9da liblzma: Use _MSVC_LANG to detect when "noexcept" can be used with MSVC.
By default, MSVC always sets __cplusplus to 199711L. The real
C++ standard version is available in _MSVC_LANG (or one could
use /Zc:__cplusplus to set __cplusplus correctly).

Fixes <https://sourceforge.net/p/lzmautils/discussion/708858/thread/f6bc3b108a/>.

Thanks to Dan Weiss.
2021-09-09 21:41:51 +03:00
Lasse Collin d267d109c3 liblzma: Rename EROFS LZMA to MicroLZMA.
It still exists primarily for EROFS but MicroLZMA is
a more generic name (that hopefully doesn't clash with
something that already exists).
2021-09-05 20:38:12 +03:00
Lasse Collin 3247e95115 xzdiff: Update the man page about the exit status.
This was forgotten from 194029ffaf.
2021-06-04 19:02:38 +03:00
Lasse Collin 96f5a28a46 xzless: Fix less(1) version detection when it contains a dot.
Sometimes the version number from "less -V" contains a dot,
sometimes not. xzless failed detect the version number when
it does contain a dot. This fixes it.

Thanks to nick87720z for reporting this. Apparently it had been
reported here <https://bugs.gentoo.org/489362> in 2013.
2021-06-04 18:52:48 +03:00
Ivan A. Melnikov fc3d3a7296 Reduce maximum possible memory limit on MIPS32
Due to architectural limitations, address space available to a single
userspace process on MIPS32 is limited to 2 GiB, not 4, even on systems
that have more physical RAM -- e.g. 64-bit systems with 32-bit
userspace, or systems that use XPA (an extension similar to x86's PAE).

So, for MIPS32, we have to impose stronger memory limits. I've chosen
2000MiB to give the process some headroom.
2021-04-11 19:50:41 +03:00
Lasse Collin 6c6f0db340 liblzma: Fix unitialized variable.
This was introduced two weeks ago in the commit
625f4c7c99.

Thanks to Nathan Moinvaziri.
2021-01-29 21:19:08 +02:00
Lasse Collin 6b8abc84a5 liblzma: Fix a wrong comment in stream_encoder_mt.c. 2021-01-24 19:22:35 +02:00
Lasse Collin db465419ae liblzma: In EROFS LZMA decoder, verify that comp_size matches at the end.
When the uncompressed size is known to be exact, after decompressing
the stream exactly comp_size bytes of input must have been consumed.
This is a minor improvement to error detection.
2021-01-17 19:20:50 +02:00
Lasse Collin 774cc0118b liblzma: Make EROFS LZMA decoder work when exact uncomp_size isn't known.
The caller must still not specify an uncompressed size bigger
than the actual uncompressed size.

As a downside, this now needs the exact compressed size.
2021-01-17 18:53:34 +02:00
Lasse Collin 421b0aa352 liblzma: Fix missing normalization in rc_encode_dummy().
Without this fix it could attempt to create too much output.
2021-01-14 20:57:11 +02:00
Lasse Collin 601ec0311e liblzma: Add EROFS LZMA encoder and decoder.
Right now this is just a planned extra-compact format for use
in the EROFS file system in Linux. At this point it's possible
that the format will either change or be abandoned and removed
completely.

The special thing about the encoder is that it uses the
output-size-limited encoding added in the previous commit.
EROFS uses fixed-sized blocks (e.g. 4 KiB) to hold compressed
data so the compressors must be able to create valid streams
that fill the given block size.
2021-01-14 20:10:59 +02:00
Lasse Collin 625f4c7c99 liblzma: Add rough support for output-size-limited encoding in LZMA1.
With this it is possible to encode LZMA1 data without EOPM so that
the encoder will encode as much input as it can without exceeding
the specified output size limit. The resulting LZMA1 stream will
be a normal LZMA1 stream without EOPM. The actual uncompressed size
will be available to the caller via the uncomp_size pointer.

One missing thing is that the LZMA layer doesn't inform the LZ layer
when the encoding is finished and thus the LZ may read more input
when it won't be used. However, this doesn't matter if encoding is
done with a single call (which is the planned use case for now).
For proper multi-call encoding this should be improved.

This commit only adds the functionality for internal use.
Nothing uses it yet.
2021-01-14 18:58:13 +02:00
Lasse Collin 9cdabbeea8 Scripts: Add zstd support to xzdiff. 2021-01-11 23:57:11 +02:00
Lasse Collin 074259f4f3 xz: Make --keep accept symlinks, hardlinks, and setuid/setgid/sticky.
Previously this required using --force but that has other
effects too which might be undesirable. Changing the behavior
of --keep has a small risk of breaking existing scripts but
since this is a fairly special corner case I expect the
likehood of breakage to be low enough.

I think the new behavior is more logical. The only reason for
the old behavior was to be consistent with gzip and bzip2.

Thanks to Vincent Lefevre and Sebastian Andrzej Siewior.
2021-01-11 23:41:16 +02:00
Lasse Collin 73c555b307 Scripts: Fix exit status of xzgrep.
Omit the -q option from xz, gzip, and bzip2. With xz this shouldn't
matter. With gzip it's important because -q makes gzip replace SIGPIPE
with exit status 2. With bzip2 it's important because with -q bzip2
is completely silent if input is corrupt while other decompressors
still give an error message.

Avoiding exit status 2 from gzip is important because bzip2 uses
exit status 2 to indicate corrupt input. Before this commit xzgrep
didn't recognize corrupt .bz2 files because xzgrep was treating
exit status 2 as SIGPIPE for gzip compatibility.

zstd still needs -q because otherwise it is noisy in normal
operation.

The code to detect real SIGPIPE didn't check if the exit status
was due to a signal (>= 128) and so could ignore some other exit
status too.
2021-01-11 23:28:52 +02:00
Lasse Collin 194029ffaf Scripts: Fix exit status of xzdiff/xzcmp.
This is a minor fix since this affects only the situation when
the files differ and the exit status is something else than 0.
In such case there could be SIGPIPE from a decompression tool
and that would result in exit status of 2 from xzdiff/xzcmp
while the correct behavior would be to return 1 or whatever
else diff or cmp may have returned.

This commit omits the -q option from xz/gzip/bzip2/lzop arguments.
I'm not sure why the -q was used in the first place, perhaps it
hides warnings in some situation that I cannot see at the moment.
Hopefully the removal won't introduce a new bug.

With gzip the -q option was harmful because it made gzip return 2
instead of >= 128 with SIGPIPE. Ignoring exit status 2 (warning
from gzip) isn't practical because bzip2 uses exit status 2 to
indicate corrupt input file. It's better if SIGPIPE results in
exit status >= 128.

With bzip2 the removal of -q seems to be good because with -q
it prints nothing if input is corrupt. The other tools aren't
silent in this situation even with -q. On the other hand, if
zstd support is added, it will need -q since otherwise it's
noisy in normal situations.

Thanks to Étienne Mollier and Sebastian Andrzej Siewior.
2021-01-11 22:58:58 +02:00
Lasse Collin f7fa309e1f liblzma: Make lzma_outq usable for threaded decompression too.
Before this commit all output queue buffers were allocated as
a single big allocation. Now each buffer is allocated separately
when needed. Used buffers are cached to avoid reallocation
overhead but the cache will keep only one buffer size at a time.
This should make things work OK in the decompression where most
of the time the buffer sizes will be the same but with some less
common files the buffer sizes may vary.

While this should work fine, it's still a bit preliminary
and may even get reverted if it turns out to be useless for
decompression.
2021-01-09 22:18:23 +02:00
H.J. Lu 4fd79b90c5 liblzma: Enable Intel CET in x86 CRC assembly codes
When Intel CET is enabled, we need to include <cet.h> in assembly codes
to mark Intel CET support and add _CET_ENDBR to indirect jump targets.

Tested on Intel Tiger Lake under CET enabled Linux.
2020-12-23 17:13:33 +02:00
Adam Borowski 1890351f34 Scripts: Add zstd support to xzgrep.
Thanks to Adam Borowski.
2020-12-05 22:39:03 +02:00
Lasse Collin 4575d9d365 xz: Avoid unneeded \f escapes on the man page.
I don't want to use \c in macro arguments but groff_man(7)
suggests that \f has better portability. \f would be needed
for the .TP strings for portability reasons anyway.

Thanks to Bjarni Ingi Gislason.
2020-11-01 22:34:25 +02:00
Lasse Collin 620b32f533 xz: Use non-breaking spaces when intentionally using more than one space.
This silences some style checker warnings. Seems that spaces
in the beginning of a line don't need this treatment.

Thanks to Bjarni Ingi Gislason.
2020-11-01 19:09:53 +02:00
Lasse Collin cb1f34988c xz: Protect the ellipsis (...) on the man page with \&.
This does it only when ... appears outside macro calls.

Thanks to Bjarni Ingi Gislason.
2020-11-01 18:53:25 +02:00
Lasse Collin 5d224da3da xz: Avoid the abbreviation "e.g." on the man page.
A few are simply omitted, most are converted to "for example"
and surrounded with commas. Sounds like that this is better
style, for example, man-pages(7) recommends avoiding such
abbreviations except in parenthesis.

Thanks to Bjarni Ingi Gislason.
2020-11-01 18:44:51 +02:00
Lasse Collin 90457dbe3e xz man page: Change \- (minus) to \(en (en-dash) for a numeric range.
Docs of ancient troff/nroff mention \(em (em-dash) but not \(en
and \- was used for both minus and en-dash. I don't know how
portable \(en is nowadays but it can be changed back if someone
complains. At least GNU groff and OpenBSD's mandoc support it.

Thanks to Bjarni Ingi Gislason for the patch.
2020-07-12 23:10:03 +03:00
Lasse Collin 352ba2d69a Windows: Fix building of resource files when config.h isn't used.
Now CMake + Visual Studio works for building liblzma.dll.

Thanks to Markus Rickert.
2020-07-12 20:46:24 +03:00
Lasse Collin a9e2a87f1d src/scripts/xzgrep.1: Filenames to xzgrep are optional.
xzgrep --help was correct already.
2020-04-06 19:34:48 +03:00
Bjarni Ingi Gislason a7ba275d9b src/script/xzgrep.1: Remove superfluous '.RB'
Output is from: test-groff -b -e -mandoc -T utf8 -rF0 -t -w w -z

  [ "test-groff" is a developmental version of "groff" ]

Input file is ./src/scripts/xzgrep.1

<src/scripts/xzgrep.1>:20 (macro RB): only 1 argument, but more are expected
<src/scripts/xzgrep.1>:23 (macro RB): only 1 argument, but more are expected
<src/scripts/xzgrep.1>:26 (macro RB): only 1 argument, but more are expected
<src/scripts/xzgrep.1>:29 (macro RB): only 1 argument, but more are expected
<src/scripts/xzgrep.1>:32 (macro RB): only 1 argument, but more are expected

 "abc..." does not mean the same as "abc ...".

  The output from nroff and troff is unchanged except for the space
between "file" and "...".

Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
2020-04-06 19:29:15 +03:00
Bjarni Ingi Gislason 133d498db0 xzgrep.1: Delete superfluous '.PP'
Summary:

mandoc -T lint xzgrep.1 :
mandoc: xzgrep.1:79:2: WARNING: skipping paragraph macro: PP empty

  There is no change in the output of "nroff" and "troff".

Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
2020-04-06 19:08:14 +03:00
Bjarni Ingi Gislason 057839ca98 src/xz/xz.1: Correct misused two-fonts macros
Output is from: test-groff -b -e -mandoc -T utf8 -rF0 -t -w w -z

  [ "test-groff" is a developmental version of "groff" ]

Input file is ./src/xz/xz.1

<src/xz/xz.1>:408 (macro BR): only 1 argument, but more are expected
<src/xz/xz.1>:1009 (macro BR): only 1 argument, but more are expected
<src/xz/xz.1>:1743 (macro BR): only 1 argument, but more are expected
<src/xz/xz.1>:1920 (macro BR): only 1 argument, but more are expected
<src/xz/xz.1>:2213 (macro BR): only 1 argument, but more are expected

  Output from nroff and troff is unchanged, except for a font change of a
full stop (.).

Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
2020-04-06 19:08:04 +03:00
Lasse Collin b8e12f5ab4 Typo fixes from fossies.org.
https://fossies.org/linux/misc/xz-5.2.5.tar.xz/codespell.html
2020-03-23 18:07:50 +02:00
Lasse Collin 7812002dd3 xz: Never use thousand separators in DJGPP builds.
DJGPP 2.05 added support for thousands separators but it's
broken at least under WinXP with Finnish locale that uses
a non-breaking space as the thousands separator. Workaround
by disabling thousands separators for DJGPP builds.
2020-03-11 21:15:35 +02:00
Lasse Collin 4572d53e16 liblzma: Fix a comment and RC_SYMBOLS_MAX.
The comment didn't match the value of RC_SYMBOLS_MAX and the value
itself was slightly larger than actually needed. The only harm
about this was that memory usage was a few bytes larger.
2020-03-02 13:54:33 +02:00
Lasse Collin b3ed19a55f liblzma: Remove unneeded <sys/types.h> from fastpos_tablegen.c.
This file only generates fastpos_table.c.
It isn't built as a part of liblzma.
2020-02-24 23:23:18 +02:00
Lasse Collin 7b8982b291 Use defined(__GNUC__) before __GNUC__ in preprocessor lines.
This should silence the equivalent of -Wundef in compilers that
don't define __GNUC__.
2020-02-22 14:15:07 +02:00
Lasse Collin 43dfe04e62 liblzma: Add more uses of lzma_memcmplen() to the normal mode of LZMA.
This gives a tiny encoder speed improvement. This could have been done
in 2014 after the commit 544aaa3d13 but
it was forgotten.
2020-02-21 17:40:02 +02:00
Lasse Collin 7fe3ef2eaa xz: Silence a warning when sig_atomic_t is long int.
It can be true at least on z/OS.
2020-02-21 16:10:44 +02:00
Lasse Collin b0a2a77d10 xz: Avoid unneeded access of a volatile variable. 2020-02-21 15:59:26 +02:00
Lasse Collin 57360bb4fd tuklib_exit: Add missing header.
strerror() needs <string.h> which happened to be included via
tuklib_common.h -> tuklib_config.h -> sysdefs.h if HAVE_CONFIG_H
was defined. This wasn't tested without config.h before so it
had worked fine.
2020-02-20 18:54:04 +02:00
Lasse Collin fddd31175e Revert the previous commit and add a comment.
The previous commit broke crc32_tablegen.c.

If the whole package is built without config.h (with defines
set on the compiler command line) this should still work fine
as long as these headers conform to C99 well enough.
2020-02-18 19:12:35 +02:00
Lasse Collin 4e4e9fbb7e Do not check for HAVE_CONFIG_H in tuklib_config.h.
In XZ Utils sysdefs.h takes care of it and the required headers.
2020-02-17 23:37:20 +02:00
Lasse Collin 2d4cef954f sysdefs.h: Omit the conditionals around string.h and limits.h.
string.h is used unconditionally elsewhere in the project and
configure has always stopped if limits.h is missing, so these
headers must have been always available even on the weirdest
systems.
2020-02-16 12:24:13 +02:00
Lasse Collin 6f7211b6bb Build: Add support for translated man pages using po4a.
The dependency on po4a is optional. It's never required to install
the translated man pages when xz is built from a release tarball.
If po4a is missing when building from xz.git, the translated man
pages won't be generated but otherwise the build will work normally.

The translations are only updated automatically by autogen.sh and
by "make mydist". This makes it easy to keep po4a as an optional
dependency and ensures that I won't forget to put updated
translations to a release tarball.

The translated man pages aren't installed if --disable-nls is used.

The installation of translated man pages abuses Automake internals
by calling "install-man" with redefined dist_man_MANS and man_MANS.
This makes the hairy script code slightly less hairy. If it breaks
some day, this code needs to be fixed; don't blame Automake developers.

Also, this adds more quotes to the existing shell script code in
the Makefile.am "-hook"s.
2020-02-07 15:32:21 +02:00
Lasse Collin 15a133b6d1 xz: Make it a fatal error if enabling the sandbox fails.
Perhaps it's too drastic but on the other hand it will let me
learn about possible problems if people report the errors.
This won't be backported to the v5.2 branch.
2020-02-05 20:40:14 +02:00
Lasse Collin af0fb386ef xz: Comment out annoying sandboxing messages. 2020-02-05 20:33:50 +02:00
Lasse Collin 3539705108 xz: Limit --memlimit-compress to at most 4020 MiB for 32-bit xz.
See the code comment for reasoning. It's far from perfect but
hopefully good enough for certain cases while hopefully doing
nothing bad in other situations.

At presets -5 ... -9, 4020 MiB vs. 4096 MiB makes no difference
on how xz scales down the number of threads.

The limit has to be a few MiB below 4096 MiB because otherwise
things like "xz --lzma2=dict=500MiB" won't scale down the dict
size enough and xz cannot allocate enough memory. With
"ulimit -v $((4096 * 1024))" on x86-64, the limit in xz had
to be no more than 4085 MiB. Some safety margin is good though.

This is hack but it should be useful when running 32-bit xz on
a 64-bit kernel that gives full 4 GiB address space to xz.
Hopefully this is enough to solve this:

https://bugzilla.redhat.com/show_bug.cgi?id=1196786

FreeBSD has a patch that limits the result in tuklib_physmem()
to SIZE_MAX on 32-bit systems. While I think it's not the way
to do it, the results on --memlimit-compress have been good. This
commit should achieve practically identical results for compression
while leaving decompression and tuklib_physmem() and thus
lzma_physmem() unaffected.
2020-02-01 19:56:18 +02:00
Lasse Collin ba76d67585 xz: Set the --flush-timeout deadline when the first input byte arrives.
xz --flush-timeout=2000, old version:

  1. xz is started. The next flush will happen after two seconds.
  2. No input for one second.
  3. A burst of a few kilobytes of input.
  4. No input for one second.
  5. Two seconds have passed and flushing starts.

The first second counted towards the flush-timeout even though
there was no pending data. This can cause flushing to occur more
often than needed.

xz --flush-timeout=2000, after this commit:

  1. xz is started.
  2. No input for one second.
  3. A burst of a few kilobytes of input. The next flush will
     happen after two seconds counted from the time when the
     first bytes of the burst were read.
  4. No input for one second.
  5. No input for another second.
  6. Two seconds have passed and flushing starts.
2020-01-26 20:53:25 +02:00
Lasse Collin fd47fd62bb xz: Move flush_needed from mytime.h to file_pair struct in file_io.h. 2020-01-26 20:25:52 +02:00
Lasse Collin 8150356810 xz: coder.c: Make writing output a separate function.
The same code sequence repeats so it's nicer as a separate function.
Note that in one case there was no test for opt_mode != MODE_TEST,
but that was only because that condition would always be true, so
this commit doesn't change the behavior there.
2020-01-26 14:49:22 +02:00
Lasse Collin 5a49e081a0 xz: Fix semi-busy-waiting in xz --flush-timeout.
When input blocked, xz --flush-timeout=1 would wake up every
millisecond and initiate flushing which would have nothing to
flush and thus would just waste CPU time. The fix disables the
timeout when no input has been seen since the previous flush.
2020-01-26 14:13:42 +02:00
Lasse Collin dcca70fe9f xz: Refactor io_read() a bit. 2020-01-26 13:47:31 +02:00
Lasse Collin 4ae9ab70cd xz: Update a comment in file_io.h. 2020-01-26 13:37:08 +02:00
Lasse Collin 3333ba4a67 xz: Move the setting of flush_needed in file_io.c to a nicer location. 2020-01-26 13:27:51 +02:00
Lasse Collin 7136f1735c Rename unaligned_read32ne to read32ne, and similarly for the others. 2019-12-31 00:47:49 +02:00
Lasse Collin 5e78fcbf2e Rename read32ne to aligned_read32ne, and similarly for the others.
Using the aligned methods requires more care to ensure that
the address really is aligned, so it's nicer if the aligned
methods are prefixed. The next commit will remove the unaligned_
prefix from the unaligned methods which in liblzma are used in
more places than the aligned ones.
2019-12-31 00:29:48 +02:00
Lasse Collin 77bc5bc6dd Revise tuklib_integer.h and .m4.
Add a configure option --enable-unsafe-type-punning to get the
old non-conforming memory access methods. It can be useful with
old compilers or in some other less typical situations but
shouldn't normally be used.

Omit the packed struct trick for unaligned access. While it's
best in some cases, this is simpler. If the memcpy trick doesn't
work, one can request unsafe type punning from configure.

Because CRC32/CRC64 code needs fast aligned reads, if no very
safe way to do it is found, type punning is used as a fallback.
This sucks but since it currently works in practice, it seems to
be the least bad option. It's never needed with GCC >= 4.7 or
Clang >= 3.6 since these support __builtin_assume_aligned and
thus fast aligned access can be done with the memcpy trick.

Other things:
  - Support GCC/Clang __builtin_bswapXX
  - Cleaner bswap fallback macros
  - Minor cleanups
2019-12-31 00:18:24 +02:00
Lasse Collin 43ce4ea7c7 Scripts: Put /usr/xpg4/bin to the beginning of PATH on Solaris.
This adds a configure option --enable-path-for-scripts=PREFIX
which defaults to empty except on Solaris it is /usr/xpg4/bin
to make POSIX grep and others available. The Solaris case had
been documented in INSTALL with a manual fix but it's better
to do this automatically since it is needed on most Solaris
systems anyway.

Thanks to Daniel Richard G.
2019-09-24 23:02:40 +03:00
Lasse Collin 6a89e656eb Fix comment typos in tuklib_mbstr* files. 2019-07-12 18:57:43 +03:00
Lasse Collin ac0b421265 Add missing include to tuklib_mbstr_width.c.
It didn't matter in XZ Utils because sysdefs.h
includes string.h anyway.
2019-07-12 18:30:46 +03:00
Lasse Collin 72a443281f Update tuklib base headers to include stdbool.h. 2019-07-12 18:10:57 +03:00
Lasse Collin de1f47b2b4 xz: Automatically align the strings in --info-memory.
This makes it easier to translate the strings.

Also, the string for amount of RAM was shortened.
2019-06-28 00:54:31 +03:00
Lasse Collin 8ce679125d liblzma: Fix a buggy comment. 2019-06-25 23:15:21 +03:00
Lasse Collin d499e467d9 liblzma: Add a comment. 2019-06-24 23:52:17 +03:00
Lasse Collin a12b13c5f0 liblzma: Silence clang -Wmissing-variable-declarations. 2019-06-24 23:45:21 +03:00
Lasse Collin 1b4675cebf Add LZMA_RET_INTERNAL1..8 to lzma_ret and use one for LZMA_TIMED_OUT.
LZMA_TIMED_OUT is *internally* used as a value for lzma_ret
enumeration. Previously it was #defined to 32 and cast to lzma_ret.
That way it wasn't visible in the public API, but this was hackish.

Now the public API has eight LZMA_RET_INTERNALx members and
LZMA_TIMED_OUT is #defined to LZMA_RET_INTERNAL1. This way
the code is cleaner overall although the public API has a few
extra mysterious enum members.
2019-06-24 23:25:41 +03:00
Lasse Collin 159c43875e xz: Silence a warning from clang -Wsign-conversion in main.c. 2019-06-24 22:57:43 +03:00
Lasse Collin 466cfcd3e5 xz: Make "headings" static in list.c.
Caught by clang -Wmissing-variable-declarations.
2019-06-24 22:52:20 +03:00
Lasse Collin 608517b9b7 liblzma: Remove incorrect uses of lzma_attribute((__unused__)).
Caught by clang -Wused-but-marked-unused.
2019-06-24 22:50:36 +03:00
Lasse Collin 2402f7873d xz: Fix an integer overflow with 32-bit off_t.
Or any off_t which isn't very big (like signed 64 bit integer
that most system have). A small off_t could overflow if the
file being decompressed had long enough run of zero bytes,
which would result in corrupt output.
2019-06-24 20:45:49 +03:00
Lasse Collin 4fd3a8dd0b xz: Cleanup io_seek_src() a bit.
lseek() returns -1 on error and checking for -1 is nicer.
2019-06-24 01:24:17 +03:00
Lasse Collin 1d4a904d8f xz: Change io_seek_src and io_pread arguments from off_t to uint64_t.
This helps fixing warnings from -Wsign-conversion and makes the
code look better too.
2019-06-24 00:40:45 +03:00
Lasse Collin 50120deb01 xz: list.c: Fix some warnings from -Wsign-conversion. 2019-06-24 00:12:38 +03:00
Lasse Collin d0a78751eb tuklib_mbstr_width: Fix a warning from -Wsign-conversion. 2019-06-23 23:22:45 +03:00
Lasse Collin 7883d73530 xz: Fix some of the warnings from -Wsign-conversion. 2019-06-23 23:19:34 +03:00
Lasse Collin c2b994fe3d tuklib_cpucores: Silence warnings from -Wsign-conversion. 2019-06-23 22:27:45 +03:00
Lasse Collin 07c4fa9e1a xzdec: Fix warnings from -Wsign-conversion. 2019-06-23 21:40:47 +03:00
Lasse Collin dfac2c9a1d liblzma: Fix warnings from -Wsign-conversion.
Also, more parentheses were added to the literal_subcoder
macro in lzma_comon.h (better style but no functional change
in the current usage).
2019-06-23 21:38:56 +03:00
Lasse Collin 41838dcc26 tuklib_integer: Silence warnings from -Wsign-conversion. 2019-06-23 19:33:55 +03:00
Lasse Collin 3ce05d235f tuklib_integer: Fix usage of conv macros.
Use a temporary variable instead of e.g.
conv32le(unaligned_read32ne(buf)) because the macro can
evaluate its argument multiple times.
2019-06-20 19:40:30 +03:00
Lasse Collin 039a168e8c liblzma: Fix comments.
Thanks to Bruce Stark.
2019-06-03 20:41:54 +03:00
Lasse Collin c460f6defe liblzma: Fix one more unaligned read to use unaligned_read16ne(). 2019-06-02 00:50:59 +03:00
Lasse Collin 386394fc9f liblzma: memcmplen: Use ctz32() from tuklib_integer.h.
The same compiler-specific #ifdefs are already in tuklib_integer.h
2019-06-01 21:36:13 +03:00
Lasse Collin 264ab971ce tuklib_integer: Cleanup MSVC-specific code. 2019-06-01 21:30:03 +03:00
Lasse Collin 33773c6f2a liblzma: Use unaligned_readXXne functions instead of type punning.
Now gcc -fsanitize=undefined should be clean.

Thanks to Jeffrey Walton.
2019-06-01 19:01:21 +03:00
Lasse Collin 3bc112c2d3 tuklib_integer: Improve unaligned memory access.
Now memcpy() or GNU C packed structs for unaligned access instead
of type punning. See the comment in this commit for details.

Avoiding type punning with unaligned access is needed to
silence gcc -fsanitize=undefined.

New functions: unaliged_readXXne and unaligned_writeXXne where
XX is 16, 32, or 64.
2019-06-01 18:41:16 +03:00
Lasse Collin 2a22de439e liblzma: Avoid memcpy(NULL, foo, 0) because it is undefined behavior.
I should have always known this but I didn't. Here is an example
as a reminder to myself:

    int mycopy(void *dest, void *src, size_t n)
    {
        memcpy(dest, src, n);
        return dest == NULL;
    }

In the example, a compiler may assume that dest != NULL because
passing NULL to memcpy() would be undefined behavior. Testing
with GCC 8.2.1, mycopy(NULL, NULL, 0) returns 1 with -O0 and -O1.
With -O2 the return value is 0 because the compiler infers that
dest cannot be NULL because it was already used with memcpy()
and thus the test for NULL gets optimized out.

In liblzma, if a null-pointer was passed to memcpy(), there were
no checks for NULL *after* the memcpy() call, so I cautiously
suspect that it shouldn't have caused bad behavior in practice,
but it's hard to be sure, and the problematic cases had to be
fixed anyway.

Thanks to Jeffrey Walton.
2019-05-13 20:05:17 +03:00
Lasse Collin 4adb8288ab xz: Update xz man page date. 2019-05-11 20:54:12 +03:00
Antoine Cœur 2fb0ddaa55 spelling 2019-05-11 20:52:37 +03:00
Lasse Collin 4ed3396061 xz: In xz -lvv look at the widths of the check names too.
Now the widths of the check names is used to adjust the width
of the Check column. This way there no longer is a need to restrict
the widths of the check names to be at most ten terminal-columns.
2019-05-01 18:43:10 +03:00
Lasse Collin 2f4281a100 xz: Fix xz -lvv column alignment to look at the translated strings. 2019-05-01 18:33:25 +03:00
Lasse Collin a750c35a7d xz: Automatically align column headings in xz -lvv. 2019-03-04 21:20:39 +02:00
Lasse Collin 6cb42e8aa1 xz: Automatically align strings ending in a colon in --list output.
This should avoid alignment errors in translations with these
strings.
2019-03-04 21:16:59 +02:00
Lasse Collin b55d79461d xz: Fix a crash in progress indicator when in passthru mode.
"xz -dcfv not_an_xz_file" crashed (all four options are
required to trigger it). It caused xz to call
lzma_get_progress(&strm, ...) when no coder was initialized
in strm. In this situation strm.internal is NULL which leads
to a crash in lzma_get_progress().

The bug was introduced when xz started using lzma_get_progress()
to get progress info for multi-threaded compression, so the
bug is present in versions 5.1.3alpha and higher.

Thanks to Filip Palian <Filip.Palian@pjwstk.edu.pl> for
the bug report.
2018-12-20 20:39:20 +02:00
Lasse Collin 4ae5526de0 xz: Update man page timestamp. 2018-11-22 17:20:31 +02:00
Pavel Raiskup 6a36d0d5f4 'have have' typos 2018-11-22 17:19:09 +02:00
Lasse Collin a18ae42a79 liblzma: Don't verify header CRC32s if building for fuzz testing.
FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION is #defined when liblzma
is being built for fuzz testing.

Most fuzzed inputs would normally get rejected because of incorrect
CRC32 and the actual header decoding code wouldn't get fuzzed.
Disabling CRC32 checks avoids this problem. The fuzzer program
must still use LZMA_IGNORE_CHECK flag to disable verification of
integrity checks of uncompressed data.
2018-10-26 22:49:10 +03:00
Lasse Collin f76f7516d6 xzless: Rename unused variables to silence static analysers.
In this particular case I don't see this affecting readability
of the code.

Thanks to Pavel Raiskup.
2018-07-27 18:10:44 +03:00
Lasse Collin 3cbcaeb07e liblzma: Remove an always-true condition from lzma_index_cat().
This should help static analysis tools to see that newg
isn't leaked.

Thanks to Pavel Raiskup.
2018-07-27 16:02:58 +03:00
Lasse Collin 76762ae609 liblzma: Improve lzma_properties_decode() API documentation. 2018-05-19 21:23:25 +03:00
Lasse Collin 2267f5b0d2 Bump the version number to 5.3.1alpha. 2018-04-29 18:58:19 +03:00
Ben Boeckel bc19799169 nothrow: use noexcept for C++11 and newer
In C++11, the `throw()` specifier is deprecated and `noexcept` is
preffered instead.
2018-02-06 18:41:45 +02:00
Lasse Collin fb6d4f83cb liblzma: Remove incorrect #ifdef from range_common.h.
In most cases it was harmless but it could affect some
custom build systems.

Thanks to Pippijn van Steenhoven.
2018-02-06 18:02:48 +02:00
Lasse Collin 713bbc1a80 tuklib_integer: New Intel C compiler needs immintrin.h.
Thanks to Melanie Blower (Intel) for the patch.
2018-01-10 21:54:27 +02:00
Lasse Collin 94e3f986aa Fix or hide warnings from GCC 7's -Wimplicit-fallthrough. 2017-08-14 20:08:33 +03:00
Lasse Collin a015cd1f90 xz: Fix "xz --list --robot missing_or_bad_file.xz".
It ended up printing an uninitialized char-array when trying to
print the check names (column 7) on the "totals" line.

This also changes the column 12 (minimum xz version) to
50000002 (xz 5.0.0) instead of 0 when there are no valid
input files.

Thanks to kidmin for the bug report.
2017-05-23 18:34:43 +03:00
Lasse Collin 8269782283 xz: Use lzma_file_info_decoder() for --list. 2017-04-24 19:48:23 +03:00
Lasse Collin e353d0b1cc liblzma: Add lzma_file_info_decoder(). 2017-04-24 19:48:04 +03:00
Lasse Collin 8c9842c265 liblzma: Rename LZMA_SEEK to LZMA_SEEK_NEEDED and seek_in to seek_pos. 2017-04-21 15:05:16 +03:00
Lasse Collin 662b27c417 Update the home page URLs to HTTPS. 2017-04-19 22:17:35 +03:00
Lasse Collin c28f0b3d00 xz: Add io_seek_src(). 2017-04-05 18:47:22 +03:00
Lasse Collin bba477257d xz: Use POSIX_FADV_RANDOM for in "xz --list" mode.
xz --list is random access so POSIX_FADV_SEQUENTIAL was clearly
wrong.
2017-03-30 22:01:54 +03:00
Lasse Collin 310d19816d liblzma: Make lzma_index_decoder_init() visible to other liblzma funcs.
This is to allow other functions to use it without going
via the public API (lzma_index_decoder()).
2017-03-30 20:03:05 +03:00
Lasse Collin a27920002d liblzma: Add generic support for input seeking (LZMA_SEEK).
Also mention LZMA_SEEK in xz/message.c to silence a warning.
2017-03-30 20:00:09 +03:00
Lasse Collin a0b1dda409 liblzma: Fix lzma_memlimit_set(strm, 0).
The 0 got treated specially in a buggy way and as a result
the function did nothing. The API doc said that 0 was supposed
to return LZMA_PROG_ERROR but it didn't.

Now 0 is treated as if 1 had been specified. This is done because
0 is already used to indicate an error from lzma_memlimit_get()
and lzma_memusage().

In addition, lzma_memlimit_set() no longer checks that the new
limit is at least LZMA_MEMUSAGE_BASE. It's counter-productive
for the Index decoder and was actually needed only by the
auto decoder. Auto decoder has now been modified to check for
LZMA_MEMUSAGE_BASE.
2017-03-30 19:51:14 +03:00
Lasse Collin 84462afaad liblzma: Similar memlimit fix for stream_, alone_, and auto_decoder. 2017-03-30 19:16:55 +03:00
Lasse Collin cbc7401793 liblzma: Fix handling of memlimit == 0 in lzma_index_decoder().
It returned LZMA_PROG_ERROR, which was done to avoid zero as
the limit (because it's a special value elsewhere), but using
LZMA_PROG_ERROR is simply inconvenient and can cause bugs.

The fix/workaround is to treat 0 as if it were 1 byte. It's
effectively the same thing. The only weird consequence is
that then lzma_memlimit_get() will return 1 even when 0 was
specified as the limit.

This fixes a very rare corner case in xz --list where a specific
memory usage limit and a multi-stream file could print the
error message "Internal error (bug)" instead of saying that
the memory usage limit is too low.
2017-03-30 19:10:55 +03:00
Lasse Collin d4a0462abe liblzma: Avoid multiple definitions of lzma_coder structures.
Only one definition was visible in a translation unit.
It avoided a few casts and temp variables but seems that
this hack doesn't work with link-time optimizations in compilers
as it's not C99/C11 compliant.

Fixes:
http://www.mail-archive.com/xz-devel@tukaani.org/msg00279.html
2016-11-21 20:24:50 +02:00
Lasse Collin df8f446e3a tuklib_cpucores: Add support for sched_getaffinity().
It's available in glibc (GNU/Linux, GNU/kFreeBSD). It's better
than sysconf(_SC_NPROCESSORS_ONLN) because sched_getaffinity()
gives the number of cores available to the process instead of
the total number of cores online.

As a side effect, this commit fixes a bug on GNU/kFreeBSD where
configure would detect the FreeBSD-specific cpuset_getaffinity()
but it wouldn't actually work because on GNU/kFreeBSD it requires
using -lfreebsd-glue when linking. Now the glibc-specific function
will be used instead.

Thanks to Sebastian Andrzej Siewior for the original patch
and testing.
2016-10-24 18:51:36 +03:00
Lasse Collin 446e4318fa xz: Fix copying of timestamps on Windows.
xz used to call utime() on Windows, but its result gets lost
on close(). Using _futime() seems to work.

Thanks to Martok for reporting the bug:
http://www.mail-archive.com/xz-devel@tukaani.org/msg00261.html
2016-06-30 20:27:36 +03:00
Lasse Collin 1b0ac0c53c xz: Silence warnings from -Wlogical-op.
Thanks to Evan Nemerson.
2016-06-16 22:46:02 +03:00
Lasse Collin c83b7a0334 Build: Fix = to += for xz_SOURCES in src/xz/Makefile.am.
Thanks to Christian Kujau.
2016-04-10 20:55:49 +03:00
Lasse Collin ac398c3baf liblzma: Disable external SHA-256 by default.
This is the sane thing to do. The conflict with OpenSSL
on some OSes and especially that the OS-provided versions
can be significantly slower makes it clear that it was
a mistake to have the external SHA-256 support enabled by
default.

Those who want it can now pass --enable-external-sha256 to
configure. INSTALL was updated with notes about OSes where
this can be a bad idea.

The SHA-256 detection code in configure.ac had some bugs that
could lead to a build failure in some situations. These were
fixed, although it doesn't matter that much now that the
external SHA-256 is disabled by default.

MINIX >= 3.2.0 uses NetBSD's libc and thus has SHA256_Init
in libc instead of libutil. Support for the libutil version
was removed.
2016-03-13 20:21:49 +02:00
Lasse Collin faf302137e tuklib_physmem: Hopefully silence a warning on Windows. 2015-11-08 20:16:10 +02:00
Lasse Collin 14115f84a3 liblzma: Make Valgrind happier with optimized (gcc -O2) liblzma.
When optimizing, GCC can reorder code so that an uninitialized
value gets used in a comparison, which makes Valgrind unhappy.
It doesn't happen when compiled with -O0, which I tend to use
when running Valgrind.

Thanks to Rich Prohaska. I remember this being mentioned long
ago by someone else but nothing was done back then.
2015-11-04 23:14:00 +02:00
Lasse Collin f4c95ba94b liblzma: Rename lzma_presets.c back to lzma_encoder_presets.c.
It would be too annoying to update other build systems
just because of this.
2015-11-03 20:55:45 +02:00
Lasse Collin cb3111e3ed xz: Make xz buildable even when encoders or decoders are disabled.
The patch is quite long but it's mostly about adding new #ifdefs
to omit code when encoders or decoders have been disabled.

This adds two new #defines to config.h: HAVE_ENCODERS and
HAVE_DECODERS.
2015-11-03 20:29:33 +02:00
Lasse Collin 4cc584985c Build: Build LZMA1/2 presets also when only decoder is wanted.
People shouldn't rely on the presets when decoding raw streams,
but xz uses the presets as the starting point for raw decoder
options anyway.

lzma_encocder_presets.c was renamed to lzma_presets.c to
make it clear it's not used solely by the encoder code.
2015-11-03 18:06:40 +02:00
Lasse Collin b0bc3e0385 Build: Don't omit lzma_cputhreads() unless using --disable-threads.
Previously it was omitted if encoders were disabled
with --disable-encoders. It didn't make sense and
it also broke the build.
2015-11-03 17:41:54 +02:00
Lasse Collin c6bf438ab3 liblzma: Fix a build failure related to external SHA-256 support.
If an appropriate header and structure were found by configure,
but a library with a usable SHA-256 functions wasn't, the build
failed.
2015-11-02 18:16:51 +02:00
Lasse Collin e18adc56f2 xz: Always close the file before trying to delete it.
unlink() can return EBUSY in errno for open files on some
operating systems and file systems.
2015-11-02 15:19:10 +02:00
Lasse Collin 21515d79d7 liblzma: Fix lzma_index_dup() for empty Streams.
Stream Flags and Stream Padding weren't copied from
empty Streams.
2015-10-12 20:45:15 +03:00
Lasse Collin 09f395b6b3 liblzma: Add a note to index.c for those using static analyzers. 2015-10-12 20:31:44 +03:00
Lasse Collin 3bf857edfe liblzma: Fix a memory leak in error path of lzma_index_dup().
lzma_index_dup() calls index_dup_stream() which, in case of
an error, calls index_stream_end() to free memory allocated
by index_stream_init(). However, it illogically didn't
actually free the memory. To make it logical, the tree
handling code was modified a bit in addition to changing
index_stream_end().

Thanks to Evan Nemerson for the bug report.
2015-10-12 20:29:09 +03:00
Lasse Collin fbbb295a91 liblzma: A MSVC-specific hack isn't needed with MSVC 2013 and newer. 2015-07-12 20:48:19 +03:00
Lasse Collin 49c26920d6 xz: Document that threaded decompression hasn't been implemented yet. 2015-05-11 21:26:16 +03:00
Lasse Collin 6bd0349c58 Revert "xz: Use pipe2() if available."
This reverts commit 7a11c4a8e5.
It is a problem when libc has pipe2() but the kernel is too
old to have pipe2() and thus pipe2() fails. In xz it's pointless
to have a fallback for non-functioning pipe2(); it's better to
avoid pipe2() completely.

Thanks to Michael Fox for the bug report.
2015-04-20 20:17:48 +03:00
Lasse Collin fc0df0f8db xz: Fix the Capsicum rights on user_abort_pipe. 2015-04-01 14:45:25 +03:00
Lasse Collin 1238381143 xz: Add support for sandboxing with Capsicum.
The sandboxing is used conditionally as described in main.c.
This isn't optimal but it was much easier to implement than
a full sandboxing solution and it still covers the most common
use cases where xz is writing to standard output. This should
have practically no effect on performance even with small files
as fork() isn't needed.

C and locale libraries can open files as needed. This has been
fine in the past, but it's a problem with things like Capsicum.
io_sandbox_enter() tries to ensure that various locale-related
files have been loaded before cap_enter() is called, but it's
possible that there are other similar problems which haven't
been seen yet.

Currently Capsicum is available on FreeBSD 10 and later
and there is a port to Linux too.

Thanks to Loganaden Velvindron for help.
2015-03-31 22:19:34 +03:00
Lasse Collin 3717885f9e Bump version to 5.3.0alpha and soname to 5.3.99.
The idea of 99 is that it looks a bit weird in this context.
For new features there's no API/ABI stability in devel versions.
2015-03-30 22:44:02 +03:00
Lasse Collin 25263fd9e7 Fix the detection of installed RAM on QNX.
The earlier version compiled but didn't actually work
since sysconf(_SC_PHYS_PAGES) always fails (or so I was told).

Thanks to Ole André Vadla Ravnås for the patch and testing.
2015-03-29 22:13:48 +03:00
Lasse Collin e0ea6737b0 xz: size_t/uint32_t cleanup in options.c. 2015-03-07 22:05:57 +02:00
Lasse Collin 8bcca29a65 xz: Fix a comment and silence a warning in message.c. 2015-03-07 22:04:23 +02:00
Lasse Collin f243f5f44c liblzma: Silence more uint32_t vs. size_t warnings. 2015-03-07 22:01:00 +02:00
Lasse Collin 7f0a4c50f4 xz: Make arg_count an unsigned int to silence a warning.
Actually the value of arg_count cannot exceed INT_MAX
but it's nicer as an unsigned int.
2015-03-07 19:54:00 +02:00
Lasse Collin f6ec468015 liblzma: Fix a warning in index.c. 2015-03-07 19:33:17 +02:00
Lasse Collin dec11497a7 Bump version and soname for 5.2.1. 2015-02-26 16:53:44 +02:00
Lasse Collin 7a11c4a8e5 xz: Use pipe2() if available. 2015-02-22 19:38:48 +02:00
Lasse Collin 117d962685 liblzma: Fix a compression-ratio regression in LZMA1/2 in fast mode.
The bug was added in the commit
f48fce093b and thus
affected 5.1.4beta and 5.2.0. Luckily the bug cannot
cause data corruption or other nasty things.
2015-02-21 23:40:26 +02:00
Lasse Collin ae984e31c1 xz: Fix the fcntl() usage when creating a pipe for the self-pipe trick.
Now it reads the old flags instead of blindly setting O_NONBLOCK.
The old code may have worked correctly, but this is better.
2015-02-21 23:00:19 +02:00
Lasse Collin d935b0cdf3 tuklib_cpucores: Use cpuset_getaffinity() on FreeBSD if available.
In FreeBSD, cpuset_getaffinity() is the preferred way to get
the number of available cores.

Thanks to Rui Paulo for the patch. I edited it slightly, but
hopefully I didn't break anything.
2015-02-10 15:28:30 +02:00
Lasse Collin eb61bc58c2 xzdiff: Make the mktemp usage compatible with FreeBSD's mktemp.
Thanks to Rui Paulo for the fix.
2015-02-09 22:08:37 +02:00
Lasse Collin b9a5b6b7a2 Add a few casts to tuklib_integer.h to silence possible warnings.
I heard that Visual Studio 2013 gave warnings without the casts.

Thanks to Gabi Davar.
2015-02-03 21:45:53 +02:00
Lasse Collin c45757135f liblzma: Set LZMA_MEMCMPLEN_EXTRA depending on the compare method. 2015-01-26 21:24:39 +02:00
Lasse Collin fec88d41e6 liblzma: Silence harmless Valgrind errors.
Thanks to Torsten Rupp for reporting this. I had
forgotten to run Valgrind before the 5.2.0 release.
2015-01-26 20:39:28 +02:00
Lasse Collin a9b45badfe xz: Fix comments. 2015-01-09 21:50:19 +02:00
Lasse Collin 4170edc914 xz: Don't fail if stdout doesn't support O_NONBLOCK.
This is similar to the case with stdin.

Thanks to Brad Smith for the bug report and testing
on OpenBSD.
2015-01-09 21:34:06 +02:00
Lasse Collin 04bbc0c284 xz: Fix a memory leak in DOS-specific code. 2015-01-07 19:18:20 +02:00
Lasse Collin f0f1f6c723 xz: Don't fail if stdin doesn't support O_NONBLOCK.
It's a problem at least on OpenBSD which doesn't support
O_NONBLOCK on e.g. /dev/null. I'm not surprised if it's
a problem on other OSes too since this behavior is allowed
in POSIX-1.2008.

The code relying on this behavior was committed in June 2013
and included in 5.1.3alpha released on 2013-10-26. Clearly
the development releases only get limited testing.
2015-01-07 19:08:06 +02:00
Lasse Collin 6060f7dc76 Bump version and soname for 5.2.0.
I know that soname != app version, but I skip AGE=1
in -version-info to make the soname match the liblzma
version anyway. It doesn't hurt anything as long as
it doesn't conflict with library versioning rules.
2014-12-21 18:11:17 +02:00
Lasse Collin 2cb82ff21c Fix build when --disable-threads is used. 2014-12-21 18:00:38 +02:00
Lasse Collin 42e97a3264 xz: Fix a comment. 2014-12-21 14:07:54 +02:00
Lasse Collin 7f7d093de7 xz: Update the man page about --threads. 2014-12-16 21:00:09 +02:00
Lasse Collin 009823448b xz: Update the man page about --block-size. 2014-12-16 20:57:43 +02:00
Lasse Collin 1190c641af liblzma: Document how lzma_mt.block_size affects memory usage. 2014-12-02 20:04:07 +02:00
Lasse Collin 34f9e40a0a Remove LZMA_UNSTABLE macro. 2014-11-26 20:12:27 +02:00
Lasse Collin 6d9c0ce9f2 liblzma: Update lzma_stream_encoder_mt() API docs. 2014-11-26 20:10:33 +02:00
Lasse Collin 2301f3f05d liblzma: Verify the filter chain in threaded encoder initialization.
This way an invalid filter chain is detected at the Stream
encoder initialization instead of delaying it to the first
call to lzma_code() which triggers the initialization of
the actual filter encoder(s).
2014-11-25 12:32:05 +02:00
Lasse Collin 7b03a15cea xzdiff: Use mkdir if mktemp isn't available. 2014-11-10 18:54:40 +02:00
Lasse Collin f8c13e5e36 xzdiff: Create a temporary directory to hold a temporary file.
This avoids the possibility of "File name too long" when
creating a temp file when the input file name is very long.

This also means that other users on the system can no longer
see the input file names in /tmp (or whatever $TMPDIR is)
since the temporary directory will have a generic name. This
usually doesn't matter since on many systems one can see
the arguments given to all processes anyway.

The number X chars to mktemp where increased from 6 to 10.

Note that with some shells temp files or dirs won't be used at all.
2014-11-10 18:45:01 +02:00
Lasse Collin 7716dcf9df liblzma: Fix lzma_mt.preset in lzma_stream_encoder_mt_memusage().
It read the filter chain from a wrong variable. This is a similar
bug that was fixed in 9494fb6d0f.
2014-11-10 15:38:47 +02:00
Lasse Collin c923b140b2 Build: Prepare to support Automake's subdir-objects.
Due to a bug in Automake, subdir-objects won't be enabled
for now.

http://debbugs.gnu.org/cgi/bugreport.cgi?bug=17354

Thanks to Daniel Richard G. for the original patches.
2014-10-29 21:15:35 +02:00
Lasse Collin 076258cc45 Add support for AmigaOS/AROS to tuklib_physmem().
Thanks to Fredrik Wikstrom.
2014-10-09 19:41:51 +03:00
Lasse Collin efa7b0a210 xzgrep: Avoid passing both -q and -l to grep.
The behavior of grep -ql varies:
  - GNU grep behaves like grep -q.
  - OpenBSD grep behaves like grep -l.

POSIX doesn't make it 100 % clear what behavior is expected.
Anyway, using both -q and -l at the same time makes no sense
so both options simply should never be used at the same time.

Thanks to Christian Weisgerber.
2014-10-09 18:42:14 +03:00
Lasse Collin d62028b4c1 liblzma: Fix a portability problem in Makefile.am.
POSIX supports $< only in inference rules (suffix rules).
Using it elsewhere is a GNU make extension and doesn't
work e.g. with OpenBSD make.

Thanks to Christian Weisgerber for the patch.
2014-09-20 19:42:56 +03:00
Lasse Collin c35de31d42 Bump the version number to 5.1.4beta. 2014-09-14 21:54:09 +03:00
Lasse Collin 6b5e3b9eff xz: Add --ignore-check. 2014-08-05 22:32:36 +03:00
Lasse Collin 9adbc2ff37 liblzma: Add support for LZMA_IGNORE_CHECK. 2014-08-05 22:15:07 +03:00
Lasse Collin 0e0f34b8e4 liblzma: Add support for lzma_block.ignore_check.
Note that this slightly changes how lzma_block_header_decode()
has been documented. Earlier it said that the .version is set
to the lowest required value, but now it says that the .version
field is kept unchanged if possible. In practice this doesn't
affect any old code, because before this commit the only
possible .version was 0.
2014-08-05 22:03:30 +03:00
Lasse Collin 71e1437ab5 liblzma: Use lzma_memcmplen() in the BT3 match finder.
I had missed this when writing the commit
5db75054e9.

Thanks to Jun I Jin.
2014-08-04 19:25:58 +03:00
Lasse Collin 5dcffdbcc2 liblzma: SHA-256: Optimize the Maj macro slightly.
The Maj macro is used where multiple things are added
together, so making Maj a sum of two expressions allows
some extra freedom for the compiler to schedule the
instructions.

I learned this trick from
<http://www.hackersdelight.org/corres.txt>.
2014-08-03 21:32:25 +03:00
Lasse Collin a9477d1e0c liblzma: SHA-256: Optimize the way rotations are done.
This looks weird because the rotations become sequential,
but it helps quite a bit on both 32-bit and 64-bit x86:

  - It requires fewer instructions on two-operand
    instruction sets like x86.

  - It requires one register less which matters especially
    on 32-bit x86.

I hope this doesn't hurt other archs.

I didn't invent this idea myself, but I don't remember where
I saw it first.
2014-08-03 21:08:12 +03:00
Lasse Collin 5a76c7c8ee liblzma: SHA-256: Remove the GCC #pragma that became unneeded.
The unrolling in the previous commit should avoid the
situation where a compiler may think that an uninitialized
variable might be accessed.
2014-08-03 20:38:13 +03:00
Lasse Collin 9a096f8e57 liblzma: SHA-256: Unroll a little more.
This way a branch isn't needed for each operation
to choose between blk0 and blk2, and still the code
doesn't grow as much as it would with full unrolling.
2014-08-03 20:33:38 +03:00
Lasse Collin bc7650d87b liblzma: SHA-256: Do the byteswapping without a temporary buffer. 2014-08-03 19:56:43 +03:00
Lasse Collin 544aaa3d13 liblzma: Use lzma_memcmplen() in normal mode of LZMA.
Two locations were not changed yet because the simplest change
assumes that the initial "len" may be greater than "limit".
2014-07-25 22:38:28 +03:00
Lasse Collin f48fce093b liblzma: Simplify LZMA fast mode code by using memcmp(). 2014-07-25 22:30:38 +03:00
Lasse Collin 6bf5308e34 liblzma: Use lzma_memcmplen() in fast mode of LZMA. 2014-07-25 22:29:49 +03:00
Lasse Collin 5db75054e9 liblzma: Use lzma_memcmplen() in the match finders.
This doesn't change the match finder output.
2014-07-25 21:15:07 +03:00
Lasse Collin e1c8f1d01f liblzma: Add lzma_memcmplen() for fast memory comparison.
This commit just adds the function. Its uses will be in
separate commits.

This hasn't been tested much yet and it's perhaps a bit early
to commit it but if there are bugs they should get found quite
quickly.

Thanks to Jun I Jin from Intel for help and for pointing out
that string comparison needs to be optimized in liblzma.
2014-07-25 20:57:20 +03:00
Lasse Collin 17215f751c xz: Update the help message of a few options.
Updated: --threads, --block-size, and --block-list
Added: --flush-timeout
2014-06-29 20:54:14 +03:00
Lasse Collin 96864a6ddf xz: Use lzma_cputhreads() instead of own copy of tuklib_cpucores(). 2014-06-18 22:07:06 +03:00
Lasse Collin a115cc3748 liblzma: Add lzma_cputhreads(). 2014-06-18 22:04:24 +03:00
Lasse Collin 3ce3e79769 xz: Check for filter chain compatibility for --flush-timeout.
This avoids LZMA_PROG_ERROR from lzma_code() with filter chains
that don't support LZMA_SYNC_FLUSH.
2014-06-18 19:11:52 +03:00
Lasse Collin ceca379017 xzgrep: exit 0 when at least one file matches.
Mimic the original grep behavior and return exit_success when
at least one xz compressed file matches given pattern.

Original bugreport:
https://bugzilla.redhat.com/show_bug.cgi?id=1108085

Thanks to Pavel Raiskup for the patch.
2014-06-11 20:43:28 +03:00
Lasse Collin 8c19216bac xz: Force single-threaded mode when --flush-timeout is used. 2014-06-09 21:21:24 +03:00
Lasse Collin da1718f266 liblzma: Use lzma_alloc_zero() in LZ encoder initialization.
This avoids a memzero() call for a newly-allocated memory,
which can be expensive when encoding small streams with
an over-sized dictionary.

To avoid using lzma_alloc_zero() for memory that doesn't
need to be zeroed, lzma_mf.son is now allocated separately,
which requires handling it separately in normalize() too.

Thanks to Vincenzo Innocente for reporting the problem.
2014-05-25 21:45:56 +03:00
Lasse Collin 28af24e9cf liblzma: Add the internal function lzma_alloc_zero(). 2014-05-25 19:25:57 +03:00
Lasse Collin ed9ac85822 xz: Fix uint64_t vs. size_t which broke 32-bit build.
Thanks to Christian Hesse.
2014-05-08 18:03:09 +03:00
Lasse Collin 4d5b7b3fda liblzma: Rename the private API header lzma/lzma.h to lzma/lzma12.h.
It can be confusing that two header files have the same name.
The public API file is still lzma.h.
2014-05-04 11:07:17 +03:00
Lasse Collin 1555a9c566 Build: Fix the combination of --disable-xzdec --enable-lzmadec.
In this case "make install" could fail if the man page directory
didn't already exist at the destination. If it did exist, a
dangling symlink was created there. Now the link is omitted
instead. This isn't the best fix but it's better than the old
behavior.
2014-04-25 17:53:42 +03:00
Lasse Collin 54df428799 xz: Rename a variable to avoid a namespace collision on Solaris.
I don't know the details but I have an impression that there's
no problem in practice if using GCC since people have built xz
with GCC (without patching xz), but renaming the variable cannot
hurt either.

Thanks to Mark Ashley.
2014-04-09 17:26:10 +03:00
Lasse Collin 9494fb6d0f liblzma: Fix lzma_mt.preset not working with lzma_stream_encoder_mt().
It read the filter chain from a wrong variable.
2014-01-29 20:13:51 +02:00
Lasse Collin 673a4cb53d liblzma: Fix typo in a comment. 2014-01-20 11:20:40 +02:00
Lasse Collin 3d5c090872 xz: Fix a comment. 2014-01-12 17:41:14 +02:00
Lasse Collin 69fd4e1c93 Windows: Add MSVC defines for inline and restrict keywords. 2014-01-12 17:04:33 +02:00
Lasse Collin a19d9e8575 liblzma: Avoid C99 compound literal arrays.
MSVC 2013 doesn't like them. Maybe they aren't so good
for readability either since many aren't used to them.
2014-01-12 16:44:52 +02:00
Lasse Collin e28528f1c8 liblzma: Remove a useless C99ism from sha256.c.
Unsurprisingly it makes no difference in compiled output.
2014-01-12 12:50:30 +02:00
Lasse Collin 5ad1effc45 xz: Fix use of wrong variable.
Since the only call to suffix_set() uses optarg
as the argument, fixing this bug doesn't change
the behavior of the program.
2014-01-12 12:17:08 +02:00
Lasse Collin 3e62c68d75 Fix typos in comments. 2014-01-12 12:11:36 +02:00
Lasse Collin b22e94d8d1 liblzma: Document the need for block->check for lzma_block_header_decode().
Thanks to Tomer Chachamu.
2013-11-26 18:20:09 +02:00
Lasse Collin d1cd8b1cb8 xz: Update the man page about --block-size and --block-list. 2013-11-12 16:38:57 +02:00
Lasse Collin dd750acbe2 xz: Make --block-list and --block-size work together in single-threaded.
Previously, --block-list and --block-size only worked together
in threaded mode. Boundaries are specified by --block-list, but
--block-size specifies the maximum size for a Block. Now this
works in single-threaded mode too.

Thanks to James M Leddy for the original patch.
2013-11-12 16:29:48 +02:00
Lasse Collin ae222fe980 Bump the version number to 5.1.3alpha. 2013-10-26 13:26:14 +03:00
Lasse Collin 841da0352d xz: Document behavior of --block-list with threads.
This needs to be updated before 5.2.0.
2013-10-25 22:41:28 +03:00
Lasse Collin 56feb8665b xz: Document --flush-timeout=TIMEOUT on the man page. 2013-10-22 20:03:12 +03:00
Lasse Collin ba413da1d5 xz: Take advantage of LZMA_FULL_BARRIER with --block-list.
Now if --block-list is used in threaded mode, the encoder
won't need to flush at each Block boundary specified via
--block-list. This improves performance a lot, making
threading helpful with --block-list.

The flush timer was reset after LZMA_FULL_FLUSH but since
LZMA_FULL_BARRIER doesn't flush, resetting the timer is
no longer done.
2013-10-22 19:51:55 +03:00
Lasse Collin 0cd45fc2bc liblzma: Support LZMA_FULL_FLUSH and _BARRIER in threaded encoder.
Now --block-list=SIZES works with in the threaded mode too,
although the performance is still bad due to the use of
LZMA_FULL_FLUSH instead of the new LZMA_FULL_BARRIER.
2013-10-02 20:05:23 +03:00
Lasse Collin 97bb38712f liblzma: Add LZMA_FULL_BARRIER support to single-threaded encoder.
In the single-threaded encoder LZMA_FULL_BARRIER is simply
an alias for LZMA_FULL_FLUSH.
2013-10-02 12:55:11 +03:00
Lasse Collin fef0c6b410 liblzma: Add block_buffer_encoder.h into Makefile.inc.
This should have been in b465da5988.
2013-09-17 11:57:51 +03:00
Lasse Collin 8083e03291 xz: Add a missing test for TUKLIB_DOSLIKE. 2013-09-17 11:55:38 +03:00
Lasse Collin 6b44b4a775 Add native threading support on Windows.
Now liblzma only uses "mythread" functions and types
which are defined in mythread.h matching the desired
threading method.

Before Windows Vista, there is no direct equivalent to
pthread condition variables. Since this package doesn't
use pthread_cond_broadcast(), pre-Vista threading can
still be kept quite simple. The pre-Vista code doesn't
use anything that wasn't already available in Windows 95,
so the binaries should run even on Windows 95 if someone
happens to care.
2013-09-17 11:52:28 +03:00
Lasse Collin 72975df6c8 Build: Create liblzma.pc in a src/liblzma/Makefile.am.
Previously it was done in configure, but doing that goes
against the Autoconf manual. Autoconf requires that it is
possible to override e.g. prefix after running configure
and that doesn't work correctly if liblzma.pc is created
by configure.

A potential downside of this change is that now e.g.
libdir in liblzma.pc is a standalone string instead of
being defined via ${prefix}, so if one overrides prefix
when running pkg-config the libdir won't get the new value.
I don't know if this matters in practice.

Thanks to Vincent Torri.
2013-09-09 20:37:03 +03:00
Lasse Collin 1c2b6e7e83 Fix the previous commit which broke the build.
Apparently I didn't even compile-test the previous commit.

Thanks to Christian Hesse.
2013-08-04 15:24:09 +03:00
Lasse Collin 124eb69c78 Windows: Add Windows support to tuklib_cpucores().
It is used for Cygwin too. I'm not sure if that is
a good or bad idea.

Thanks to Vincent Torri.
2013-08-03 13:52:58 +03:00
Lasse Collin dee6ad3d59 xz: Add preliminary support for --flush-timeout=TIMEOUT.
When --flush-timeout=TIMEOUT is used, xz will use
LZMA_SYNC_FLUSH if read() would block and at least
TIMEOUT milliseconds has elapsed since the previous flush.

This can be useful in realtime-like use cases where the
data is simultanously decompressed by another process
(possibly on a different computer). If new uncompressed
input data is produced slowly, without this option xz could
buffer the data for a long time until it would become
decompressible from the output.

If TIMEOUT is 0, the feature is disabled. This is the default.

This commit affects the compression side. Using xz for
the decompression side for the above purpose doesn't work
yet so well because there is quite a bit of input and
output buffering when decompressing.

The --long-help or man page were not updated yet.
The details of this feature may change.
2013-07-04 14:18:46 +03:00
Lasse Collin fa381acaf9 xz: Don't set src_eof=true after an I/O error because it's useless. 2013-07-04 13:41:03 +03:00
Lasse Collin ea00545bea xz: Fix the test when to read more input.
Testing for end of file was no longer correct after full flushing
became possible with --block-size=SIZE and --block-list=SIZES.
There was no bug in practice though because xz just made a few
unneeded zero-byte reads.
2013-07-04 13:25:11 +03:00
Lasse Collin 736903c64b xz: Move some of the timing code into mytime.[hc].
This switches units from microseconds to milliseconds.

New clock_gettime(CLOCK_MONOTONIC) will be used if available.
There is still a fallback to gettimeofday().
2013-07-04 12:51:57 +03:00
Lasse Collin c0627b3fce xz: Silence a warning seen with _FORTIFY_SOURCE=2.
Thanks to Christian Hesse.
2013-07-01 14:34:11 +03:00
Lasse Collin a37ae8b5eb Man pages: Use similar syntax for synopsis as in xz.
The man pages of lzmainfo, xzmore, and xzdec had similar
constructs as the man page of xz had before the commit
eb6ca9854b. Eric S. Raymond
didn't mention these man pages in his bug report, but
it's nice to be consistent.
2013-06-30 18:02:27 +03:00
Lasse Collin cdba9ddd87 xz: Use non-blocking I/O for the output file.
Now both reading and writing should be without
race conditions with signals.

They might still be signal handling issues left.
Signals are blocked during many operations to avoid
EINTR but it may cause problems e.g. if writing to
stderr blocks when trying to display an error message.
2013-06-29 15:59:13 +03:00
Lasse Collin e61a5c95da xz: Fix return value type in io_write_buf().
It didn't affect the behavior of the code since -1
becomes true anyway.
2013-06-28 23:56:17 +03:00
Lasse Collin 9dc319eabb xz: Use the self-pipe trick to avoid a race condition with signals.
It is possible that a signal to set user_abort arrives right
before a blocking system call is made. In this case the call
may block until another signal arrives, while the wanted
behavior is to make xz clean up and exit as soon as possible.

After this commit, the race condition is avoided with the
input side which already uses non-blocking I/O. The output
side still uses blocking I/O and thus has the race condition.
2013-06-28 23:48:05 +03:00
Lasse Collin 3541bc79d0 xz: Use non-blocking I/O for the input file. 2013-06-28 22:51:02 +03:00
Lasse Collin 78673a08be xz: Remove an outdated NetBSD-specific comment.
Nowadays errno == EFTYPE is documented in open(2).
2013-06-28 18:46:13 +03:00
Lasse Collin a616fdad34 xz: Fix error detection of fcntl(fd, F_SETFL, flags) calls.
POSIX says that fcntl(fd, F_SETFL, flags) returns -1 on
error and "other than -1" on success. This is how it is
documented e.g. on OpenBSD too. On Linux, success with
F_SETFL is always 0 (at least accorinding to fcntl(2)
from man-pages 3.51).
2013-06-28 18:09:47 +03:00
Lasse Collin 4a08a6e4c6 xz: Fix use of wrong variable in a fcntl() call.
Due to a wrong variable name, when writing a sparse file
to standard output, *all* file status flags were cleared
(to the extent the operating system allowed it) instead of
only clearing the O_APPEND flag. In practice this worked
fine in the common situations on GNU/Linux, but I didn't
check how it behaved elsewhere.

The original flags were still restored correctly. I still
changed the code to use a separate boolean variable to
indicate when the flags should be restored instead of
relying on a special value in stdout_flags.
2013-06-28 17:36:47 +03:00
Lasse Collin b790b435da xz: Fix assertion related to posix_fadvise().
Input file can be a FIFO or something else that doesn't
support posix_fadvise() so don't check the return value
even with an assertion. Nothing bad happens if the call
to posix_fadvise() fails.
2013-06-28 14:55:37 +03:00
Lasse Collin 84d2da6c9d xz: Check the value of lzma_stream_flags.version in --list.
It is a no-op for now, but if an old xz version is used
together with a newer liblzma that supports something new,
then this check becomes important and will stop the old xz
from trying to parse files that it won't understand.
2013-06-26 13:30:57 +03:00
Lasse Collin 46540e4c10 liblzma: Avoid a warning about a shadowed variable.
On Mac OS X wait() is declared in <sys/wait.h> that
we include one way or other so don't use "wait" as
a variable name.

Thanks to Christian Kujau.
2013-06-23 18:57:23 +03:00
Lasse Collin ebb501ec73 xz: Validate Uncompressed Size from Block Header in list.c.
This affects only "xz -lvv". Normal decompression with xz
already detected if Block Header and Index had mismatched
Uncompressed Size fields. So this just makes "xz -lvv"
show such files as corrupt instead of showing the
Uncompressed Size from Index.
2013-06-23 17:36:47 +03:00
Lasse Collin eb6ca9854b xz: Make the man page more friendly to doclifter.
Thanks to Eric S. Raymond.
2013-06-21 22:04:45 +03:00
Lasse Collin 0c0a1947e6 xz: A couple of man page fixes.
Now the interaction of presets and custom filter chains
is described correctly. Earlier it contradicted itself.

Thanks to DevHC who reported these issues on IRC to me
on 2012-12-14.
2013-06-21 21:54:59 +03:00
Lasse Collin 2fcda89939 xz: Fix interaction between preset and custom filter chains.
There was somewhat illogical behavior when --extreme was
specified and mixed with custom filter chains.

Before this commit, "xz -9 --lzma2 -e" was equivalent
to "xz --lzma2". After it is equivalent to "xz -6e"
(all earlier preset options get forgotten when a custom
filter chain is specified and the default preset is 6
to which -e is applied). I find this less illogical.

This also affects the meaning of "xz -9e --lzma2 -7".
Earlier it was equivalent to "xz -7e" (the -e specified
before a custom filter chain wasn't forgotten). Now it
is "xz -7". Note that "xz -7e" still is the same as "xz -e7".

Hopefully very few cared about this in the first place,
so pretty much no one should even notice this change.

Thanks to Conley Moorhous.
2013-06-21 21:50:26 +03:00
Lasse Collin 8957c58609 xzdec: Improve the --help message.
The options are now ordered in the same order as in xz's help
message.

Descriptions were added to the options that are ignored.
I left them in parenthesis even if it looks a bit weird
because I find it easier to spot the ignored vs. non-ignored
options from the list that way.
2013-04-15 19:29:09 +03:00
Jeff Bastian 5019413a05 xzgrep: make the '-h' option to be --no-filename equivalent
* src/scripts/xzgrep.in: Accept the '-h' option in argument parsing.
2013-04-05 19:14:50 +03:00