It feels better to fix the docs than change the code because this
way newly-written applications will be forced to be compatible with
the lzma_allocator behavior of old liblzma versions. It can matter
if someone builds the application against an older liblzma version.
Fixes: https://github.com/tukaani-project/xz/issues/183
Don't say that the .lz format allows trailing data. According to the
lzip 1.25 manual, trailing data isn't part of the file format at all.
However, tools are still expected to behave as usefully as possible
when there is trailing data.
Fix the description of lzip >= 1.20 behavior when some of the first
bytes of trailing data match the magic bytes. While the lzip 1.25 manual
recommends that none of the first four bytes in trailing data should
match the magic bytes, the default behavior of lzip 1.25 treats
trailing data as a corrupt member header only if two or three bytes
match the magic bytes; one matching byte isn't enough.
Reported-by: Antonio Diaz Diaz
Link: https://www.mail-archive.com/xz-devel@tukaani.org/msg00702.html
On GitHub runners, VS 2019 16.11 (MSVC 19.29.30158) results in
test failures. VS 2022 17.13 (MSVC 19.43.34808) works.
In xz 5.6.x there was a #pragma-based workaround for MSVC builds for
32-bit x86. Another method was thought to work with the new rewritten
CLMUL CRC. Apparently it doesn't. Keep it simple and disable CLMUL CRC
with any non-recent MSVC when building for 32-bit x86.
Fixes: 54eaea5ea49b ("liblzma: x86 CLMUL CRC: Rewrite")
Fixes: https://github.com/tukaani-project/xz/issues/171
Reported-by: Andrew Murray
Single-shot decoding means calling lzma_code() by giving it the whole
input at once and enough output buffer space to store the uncompressed
data, and combining this with LZMA_FINISH and no timeout
(lzma_mt.timeout = 0). This way the file is decoded with a single
lzma_code() call if possible.
The bug prevented the decoder from starting more than one worker thread
in single-shot mode. The issue was noticed when reviewing the code;
there are no bug reports. Thus maybe few have tried this mode.
Fixes: 64b6d496dc81 ("liblzma: Threaded decoder: Always wait for output if LZMA_FINISH is used.")
Don't set thr->in_size = 0 when returning the thread to the stack of
available threads. Not only is it useless, but the main thread may
read the value in SEQ_BLOCK_THR_RUN. With valid inputs, it made
no difference if the main thread saw the original value or 0. With
invalid inputs (when worker thread stops early), thr->in_size was
no longer modified after the previous commit with the security fix
("Don't free the input buffer too early").
So while the bug appears harmless now, it's important to fix it because
the variable was being modified without proper locking. It's trivial
to fix because there is no need to change the value. Only main thread
needs to set the value in (in SEQ_BLOCK_THR_INIT) when starting a new
Block before the worker thread is activated.
Fixes: 4cce3e27f529 ("liblzma: Add threaded .xz decompressor.")
Reviewed-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Thanks-to: Sam James <sam@gentoo.org>
The input buffer must be valid as long as the main thread is writing
to the worker-specific input buffer. Fix it by making the worker
thread not free the buffer on errors and not return the worker thread to
the pool. The input buffer will be freed when threads_end() is called.
With invalid input, the bug could at least result in a crash. The
effects include heap use after free and writing to an address based
on the null pointer plus an offset.
The bug has been there since the first committed version of the threaded
decoder and thus affects versions from 5.3.3alpha to 5.8.0.
As the commit message in 4cce3e27f529 says, I had made significant
changes on top of Sebastian's patch. This bug was indeed introduced
by my changes; it wasn't in Sebastian's version.
Thanks to Harri K. Koskinen for discovering and reporting this issue.
Fixes: 4cce3e27f529 ("liblzma: Add threaded .xz decompressor.")
Reported-by: Harri K. Koskinen <x64nop@nannu.org>
Reviewed-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Thanks-to: Sam James <sam@gentoo.org>
The main thread can directly set THR_IDLE in threads_stop() which is
called when errors are detected. threads_stop() won't return the stopped
threads to the pool or free the memory pointed by thr->in anymore, but
it doesn't matter because the existing workers won't be reused after
an error. The resources will be cleaned up when threads_end() is
called (reinitializing the decoder always calls threads_end()).
Reviewed-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Thanks-to: Sam James <sam@gentoo.org>
SSE2 is supported on every x86-64 processor. The SSE2 code is used on
32-bit x86 if compiler options permit unconditional use of SSE2.
dict_repeat() copies short random-sized unaligned buffers. At least
on glibc, FreeBSD, and Windows (MSYS2, UCRT, MSVCRT), memcpy() is
clearly faster than byte-by-byte copying in this use case. Compared
to the memcpy() version, the new SSE2 version reduces decompression
time by 0-5 % depending on the machine and libc. It should never be
slower than the memcpy() version.
However, on musl 1.2.5 on x86-64, the memcpy() version is the slowest.
Compared to the memcpy() version:
- The byte-by-version takes 6-7 % less time to decompress.
- The SSE2 version takes 16-18 % less time to decompress.
The numbers are from decompressing a Linux kernel source tarball in
single-threaded mode on older AMD and Intel systems. The tarball
compresses well, and thus dict_repeat() performance matters more
than with some other files.
This doesn't make any difference in practice because compilers can
already see that writing through the dict->buf pointer cannot modify
the contents of *dict itself: The LZMA decoder makes a local copy of
the lzma_dict structure, and even if it didn't, the pointer to
lzma_dict in the LZMA decoder is already "restrict".
It's nice to add "restrict" anyway. uint8_t is typically unsigned char
which can alias anything. Without the above conditions or "restrict",
compilers could need to assume that writing through dict->buf might
modify *dict. This would matter in dict_repeat() because the loops
refer to dict->buf and dict->pos instead of making local copies of
those members for the duration of the loops. If compilers had to
assume that writing through dict->buf can affect *dict, then compilers
would need to emit code that reloads dict->buf and dict->pos after
every write through dict->buf.
Previously it was enabled only on x86-64 and ARM64 when also support
for unaligned access was detected or manually enabled at built time.
In the default build configuration, the 8-byte method is now enabled
also on 64-bit RISC-V and 64-bit PowerPC (both endiannesses). It was
reported that on big endian POWER9, encoding time may reduce 12-13 %.
This change only affects builds with GCC and Clang because the code
uses __builtin_ctzll or __builtin_clzll.
Thanks to Marcus Comstedt for testing on POWER9.
When the 8-byte method was enabled for ARM64, a check for endianness
wasn't added. This broke the LZMA/LZMA2 encoder. Test suite caught it.
Fixes: cd64dd70d5665b6048829c45772d08606f44672e
Co-authored-by: Marcus Comstedt <marcus@mc.pp.se>
lzma_str_to_filters() may call parse_lzma12_preset() in two ways. The
call from str_to_filters() detects the string type from the first
character(s) and as a side-effect it validates the first digit of
the preset string. So this change makes no difference there.
However, the call from parse_options() doesn't pre-validate the string.
parse_lzma12_preset() will return an invalid value which is passed to
lzma_lzma_preset() which safely rejects it. The bug still affects the
the error message:
$ xz --filters=lzma2:preset=X
xz: Error in --filters=FILTERS option:
xz: lzma2:preset=X
xz: ^
xz: Unsupported preset
After the fix:
$ xz --filters=lzma2:preset=X
xz: Error in --filters=FILTERS option:
xz: lzma2:preset=X
xz: ^
xz: Unsupported preset
The ^ now correctly points to the X and not past it because the X itself
is the problematic character.
Fixes: cedeeca2ea6ada5b0411b2ae10d7a859e837f203
Now the information in the "Details" tab in the file properties
dialog matches the naming convention of Cygwin and MSYS2. This
is only a cosmetic change.
Support for instruction "movzw" without suffix in "GNU as" was
added in commit [1] and stabilized in binutils 2.27, released
in August 2016. Earlier systems don't accept this instruction
without a suffix, making range_decoder.h's inline assembly
unable to build on old systems such as Ubuntu 16.04, creating
error messages like:
lzma_decoder.c: Assembler messages:
lzma_decoder.c:371: Error: no such instruction: `movzw 2(%r11),%esi'
lzma_decoder.c:373: Error: no such instruction: `movzw 4(%r11),%edi'
lzma_decoder.c:388: Error: no such instruction: `movzw 6(%r11),%edx'
lzma_decoder.c:398: Error: no such instruction: `movzw (%r11,%r14,4),%esi'
Change "movzw" to "movzwl" for compatibility.
[1] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=c07315e0c610e0e3317b4c02266f81793df253d2
Suggested-by: Lasse Collin <lasse.collin@tukaani.org>
Tested-by: Yifeng Li <tomli@tomli.me>
Signed-off-by: Yifeng Li <tomli@tomli.me>
Fixes: 3182a330c1512cc1f5c87b5c5a272578e60a5158
Fixes: https://github.com/tukaani-project/xz/issues/121
Closes: https://github.com/tukaani-project/xz/pull/136
This reverts commit dc03f6290f5b9bd3d50c7e12e58dee870889d599.
OpenBSD 7.6 will support elf_aux_info(3), and the detection code used
on FreeBSD will work on OpenBSD 7.6 too. Keep things simpler and drop
the OpenBSD-specific sysctl() method.
Thanks to Christian Weisgerber.
The crc.w.{b/h/w/d}.w instructions in LoongArch can calculate the CRC32
result for 1/2/4/8 bytes in a single operation. Using these is much
faster compared to the generic method.
Optimized CRC32 is enabled unconditionally on 64-bit LoongArch because
the LoongArch specification says that CRC32 instructions shall be
implemented for 64-bit processors. Optimized CRC32 isn't enabled for
32-bit LoongArch processors because not enough information is available
about them.
Co-authored-by: Lasse Collin <lasse.collin@tukaani.org>
Closes: https://github.com/tukaani-project/xz/pull/86
Prefix ARM64_RUNTIME_DETECTION with CRC_ and reorder it to be with
the other ARM64-specific lines. That macro isn't used outside this
file.
ARM64 CLMUL implementation doesn't exist yet and thus CRC64_ARM64_CLMUL
isn't used anywhere yet.
It's not ideal that the single-letter CRC utility macros are here
as they pollute the namespace of the LZ encoder files. Those could
be moved their own crc_macros.h like they were in 5.2.x but in practice
this is fine enough already.
LZ encoder needs lzma_crc32_table[0] but otherwise those tables
are private to the CRC code. In contrast, the other things in
check.h are needed in several places.
Now runtime detection of CLMUL support can pick between the CLMUL and
the generic assembly implementations. Whatever overhead this has for
builds that omit CLMUL completely isn't important because builds for
any non-ancient system is likely to include the CLMUL code too.
Handle the CRC tables in crcXX_fast.c files because now these files
are built even when assembly code is used.
If 32-bit x86 assembly is enabled then it will always be built even
if compiler flags were such that CLMUL would be allowed unconditionally.
That is, runtime detection will be used anyway. This keeps the build
rules simpler.
In LZ encoder, build and use lzma_lz_hash_table[256] if CLMUL CRC
is used without runtime detection. Previously this wasn't needed
because crc32_table.c included the lzma_crc32_table[][] in the build
unless encoder support had been disabled. Including an 8 KiB table
was silly when only 1 KiB is actually used. So now liblzma is 7 KiB
smaller if CLMUL is enabled without runtime detection.
On E2K the function compiles only due to compiler emulation but the
function is never used. It's cleaner to omit the function when it's
not needed even though it's a "static inline" function.
Thanks to Ilya Kurdyukov.
It's faster with both tiny and large buffers and doesn't require
disabling any sanitizers. With large buffers the extra speed is
from folding four 16-byte chunks in parallel.
The 32-bit x86 with MSVC reportedly still needs a workaround.
Now the simpler "__asm mov ebx, ebx" trick is enough but it
needs to be in lzma_crc64() instead of crc64_arch_optimized().
Thanks to Iouri Kharon for testing and the fix.
Thanks to Ilya Kurdyukov for testing the speed with aligned and
unaligned buffers on a few x86 processors and on E2K v6.
Thanks to Sam James for general feedback.
Fixes: https://github.com/tukaani-project/xz/issues/112
Fixes: https://github.com/tukaani-project/xz/issues/122