mirror of https://git.tukaani.org/xz.git synced 2025-10-12 20:28:18 +00:00

756 Commits

Author SHA1 Message Date
Lasse Collin
61b114e92f
liblzma: Document that lzma_allocator.free(opaque, NULL) is possible
It feels better to fix the docs than change the code because this
way newly-written applications will be forced to be compatible with
the lzma_allocator behavior of old liblzma versions. It can matter
if someone builds the application against an older liblzma version.
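
A minimal sketch of a custom allocator written with this documented
behavior in mind; the function names are made up for illustration:

    #include <stdlib.h>
    #include <lzma.h>

    /* Hypothetical example allocator. The free callback must cope with
     * ptr == NULL because old liblzma versions may call it that way. */
    static void *my_alloc(void *opaque, size_t nmemb, size_t size)
    {
        (void)opaque;
        return malloc(nmemb * size);
    }

    static void my_free(void *opaque, void *ptr)
    {
        (void)opaque;
        free(ptr); /* free(NULL) is a no-op, so a NULL ptr is fine. */
    }

    static const lzma_allocator my_allocator = {
        .alloc = &my_alloc,
        .free = &my_free,
        .opaque = NULL,
    };

Such an allocator can then be pointed to by lzma_stream.allocator or
passed to the liblzma functions that take an allocator argument.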

Fixes: https://github.com/tukaani-project/xz/issues/183
2025-09-29 18:37:19 +03:00
Lasse Collin
e3ba73034a
liblzma: validate_map.sh: Catch some unlikely errors 2025-09-29 17:50:45 +03:00
Lasse Collin
d660fe5d56
liblzma: Fix grammar in API docs
Fixes: a27920002dbc ("liblzma: Add generic support for input seeking (LZMA_SEEK).")
2025-05-23 12:28:17 +03:00
Lasse Collin
377be0ea7a
Build: With symbol versioning, try to pass --undefined-version to linker
Fixes: https://github.com/tukaani-project/xz/issues/180
Fixes: https://bugs.gentoo.org/956119
2025-05-21 16:07:01 +03:00
Lasse Collin
a6711d1c4a
Doxygen: Fix errors and some warnings in internal docs 2025-04-22 19:00:19 +03:00
Lasse Collin
516b90f6e1
liblzma: Update lzma_lzip_decoder() docs about trailing data
Don't say that the .lz format allows trailing data. According to the
lzip 1.25 manual, trailing data isn't part of the file format at all.
However, tools are still expected to behave as usefully as possible
when there is trailing data.

Fix the description of lzip >= 1.20 behavior when some of the first
bytes of trailing data match the magic bytes. While the lzip 1.25 manual
recommends that none of the first four bytes in trailing data should
match the magic bytes, the default behavior of lzip 1.25 treats
trailing data as a corrupt member header only if two or three bytes
match the magic bytes; one matching byte isn't enough.

Reported-by: Antonio Diaz Diaz
Link: https://www.mail-archive.com/xz-devel@tukaani.org/msg00702.html
2025-04-21 12:23:37 +03:00
Lasse Collin
dd006a67e5
liblzma: Update the lzma_lzip_decoder() docs about sync flush marker 2025-04-17 18:30:26 +03:00
Lasse Collin
b5a5d9e3f7
liblzma: Disable CLMUL CRC on old MSVC targeting 32-bit x86
On GitHub runners, VS 2019 16.11 (MSVC 19.29.30158) results in
test failures. VS 2022 17.13 (MSVC 19.43.34808) works.

In xz 5.6.x there was a #pragma-based workaround for MSVC builds for
32-bit x86. Another method was thought to work with the new rewritten
CLMUL CRC. Apparently it doesn't. Keep it simple and disable CLMUL CRC
with any non-recent MSVC when building for 32-bit x86.
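
A sketch of the kind of preprocessor guard this implies; the macro
name and the exact version cutoff below are assumptions, not the
actual check in liblzma:

    /* Hypothetical sketch: skip CLMUL CRC on 32-bit x86 with any MSVC
     * older than the known-good 19.43 (VS 2022 17.13). The cutoff is
     * a guess based on the versions mentioned above. */
    #if defined(_MSC_VER) && _MSC_VER < 1943 && defined(_M_IX86) \
            && !defined(__clang__) && !defined(__INTEL_COMPILER)
    #   define SKIP_CRC_CLMUL 1
    #endif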

Fixes: 54eaea5ea49b ("liblzma: x86 CLMUL CRC: Rewrite")
Fixes: https://github.com/tukaani-project/xz/issues/171
Reported-by: Andrew Murray
2025-04-07 22:36:58 +03:00
Lasse Collin
c5fd88dfc3
liblzma: Remove MSVC hack from CLMUL CRC
The hack isn't enough with MSVC 19.29 (VS 2019) even when it's also
applied to the CRC32 code. The tests crash when built for 32-bit x86.
2025-04-07 22:36:58 +03:00
Lasse Collin
a522a22654
Bump version and soname for 5.8.1 2025-04-03 14:34:43 +03:00
Lasse Collin
0c80045ab8
liblzma: mt dec: Fix lack of parallelization in single-shot decoding
Single-shot decoding means calling lzma_code() by giving it the whole
input at once and enough output buffer space to store the uncompressed
data, and combining this with LZMA_FINISH and no timeout
(lzma_mt.timeout = 0). This way the file is decoded with a single
lzma_code() call if possible.
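
A minimal sketch of such a single-shot call, assuming in_buf holds a
whole .xz file and out_buf is known to be large enough (error handling
omitted):

    #include <stdint.h>
    #include <lzma.h>

    /* Sketch only: decode a complete .xz file with one lzma_code() call. */
    static lzma_ret decode_single_shot(const uint8_t *in_buf, size_t in_size,
            uint8_t *out_buf, size_t out_size, uint32_t threads)
    {
        lzma_stream strm = LZMA_STREAM_INIT;
        lzma_mt mt = {
            .flags = 0,
            .threads = threads,
            .timeout = 0,                    /* no timeout */
            .memlimit_threading = UINT64_MAX,
            .memlimit_stop = UINT64_MAX,
        };

        lzma_ret ret = lzma_stream_decoder_mt(&strm, &mt);
        if (ret != LZMA_OK)
            return ret;

        strm.next_in = in_buf;
        strm.avail_in = in_size;             /* the whole input at once */
        strm.next_out = out_buf;
        strm.avail_out = out_size;           /* room for all output */

        ret = lzma_code(&strm, LZMA_FINISH); /* LZMA_STREAM_END on success */
        lzma_end(&strm);
        return ret;
    }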

The bug prevented the decoder from starting more than one worker thread
in single-shot mode. The issue was noticed when reviewing the code;
there are no bug reports. Thus maybe few have tried this mode.

Fixes: 64b6d496dc81 ("liblzma: Threaded decoder: Always wait for output if LZMA_FINISH is used.")
2025-04-03 14:34:42 +03:00
Lasse Collin
8188048854
liblzma: mt dec: Don't modify thr->in_size in the worker thread
Don't set thr->in_size = 0 when returning the thread to the stack of
available threads. Not only is it useless, but the main thread may
read the value in SEQ_BLOCK_THR_RUN. With valid inputs, it made
no difference if the main thread saw the original value or 0. With
invalid inputs (when the worker thread stops early), thr->in_size was
no longer modified after the previous commit with the security fix
("Don't free the input buffer too early").

So while the bug appears harmless now, it's important to fix it because
the variable was being modified without proper locking. It's trivial
to fix because there is no need to change the value. Only the main
thread needs to set the value (in SEQ_BLOCK_THR_INIT) when starting
a new Block before the worker thread is activated.

Fixes: 4cce3e27f529 ("liblzma: Add threaded .xz decompressor.")
Reviewed-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Thanks-to: Sam James <sam@gentoo.org>
2025-04-03 14:34:42 +03:00
Lasse Collin
d5a2ffe41b
liblzma: mt dec: Don't free the input buffer too early (CVE-2025-31115)
The input buffer must be valid as long as the main thread is writing
to the worker-specific input buffer. Fix it by making the worker
thread not free the buffer on errors and not return the worker thread to
the pool. The input buffer will be freed when threads_end() is called.

With invalid input, the bug could at least result in a crash. The
effects include heap use after free and writing to an address based
on the null pointer plus an offset.

The bug has been there since the first committed version of the threaded
decoder and thus affects versions from 5.3.3alpha to 5.8.0.

As the commit message in 4cce3e27f529 says, I had made significant
changes on top of Sebastian's patch. This bug was indeed introduced
by my changes; it wasn't in Sebastian's version.

Thanks to Harri K. Koskinen for discovering and reporting this issue.

Fixes: 4cce3e27f529 ("liblzma: Add threaded .xz decompressor.")
Reported-by: Harri K. Koskinen <x64nop@nannu.org>
Reviewed-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Thanks-to: Sam James <sam@gentoo.org>
2025-04-03 14:34:42 +03:00
Lasse Collin
c0c835964d
liblzma: mt dec: Simplify by removing the THR_STOP state
The main thread can directly set THR_IDLE in threads_stop() which is
called when errors are detected. threads_stop() won't return the stopped
threads to the pool or free the memory pointed to by thr->in anymore, but
it doesn't matter because the existing workers won't be reused after
an error. The resources will be cleaned up when threads_end() is
called (reinitializing the decoder always calls threads_end()).

Reviewed-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Thanks-to: Sam James <sam@gentoo.org>
2025-04-03 14:34:42 +03:00
Lasse Collin
831b55b971
liblzma: mt dec: Fix a comment
Reviewed-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Thanks-to: Sam James <sam@gentoo.org>
2025-04-03 14:34:42 +03:00
Lasse Collin
b9d168eee4
liblzma: Add assertions to lzma_bufcpy() 2025-04-03 14:34:30 +03:00
Lasse Collin
db9258e828
Bump version and soname for 5.8.0
Also remove the LZMA_UNSTABLE macro.
2025-03-25 15:18:32 +02:00
Lasse Collin
ff5d944749
liblzma: Count the extra bytes in LZMA/LZMA2 decoder memory usage 2025-03-25 15:18:31 +02:00
Lasse Collin
943b012d09
liblzma: Use SSE2 intrinsics instead of memcpy() in dict_repeat()
SSE2 is supported on every x86-64 processor. The SSE2 code is used on
32-bit x86 if compiler options permit unconditional use of SSE2.

dict_repeat() copies short random-sized unaligned buffers. At least
on glibc, FreeBSD, and Windows (MSYS2, UCRT, MSVCRT), memcpy() is
clearly faster than byte-by-byte copying in this use case. Compared
to the memcpy() version, the new SSE2 version reduces decompression
time by 0-5 % depending on the machine and libc. It should never be
slower than the memcpy() version.

However, on musl 1.2.5 on x86-64, the memcpy() version is the slowest.
Compared to the memcpy() version:

  - The byte-by-byte version takes 6-7 % less time to decompress.
  - The SSE2 version takes 16-18 % less time to decompress.

The numbers are from decompressing a Linux kernel source tarball in
single-threaded mode on older AMD and Intel systems. The tarball
compresses well, and thus dict_repeat() performance matters more
than with some other files.
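
A simplified sketch of the idea (not the actual dict_repeat() code);
it assumes the source and destination don't overlap within 16 bytes,
which the real LZ code additionally has to handle:

    #include <emmintrin.h>  /* SSE2 intrinsics */
    #include <stddef.h>
    #include <stdint.h>

    /* Copy a short, possibly unaligned buffer 16 bytes at a time and
     * finish the tail byte by byte. */
    static void copy_short_unaligned(uint8_t *dest, const uint8_t *src,
            size_t len)
    {
        while (len >= 16) {
            _mm_storeu_si128((__m128i *)dest,
                    _mm_loadu_si128((const __m128i *)src));
            dest += 16;
            src += 16;
            len -= 16;
        }

        while (len-- > 0)
            *dest++ = *src++;
    }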
2025-03-25 15:18:31 +02:00
Lasse Collin
bc14e4c94e
liblzma: Add "restrict" to a few functions in lz_decoder.h
This doesn't make any difference in practice because compilers can
already see that writing through the dict->buf pointer cannot modify
the contents of *dict itself: The LZMA decoder makes a local copy of
the lzma_dict structure, and even if it didn't, the pointer to
lzma_dict in the LZMA decoder is already "restrict".

It's nice to add "restrict" anyway. uint8_t is typically unsigned char
which can alias anything. Without the above conditions or "restrict",
compilers could need to assume that writing through dict->buf might
modify *dict. This would matter in dict_repeat() because the loops
refer to dict->buf and dict->pos instead of making local copies of
those members for the duration of the loops. If compilers had to
assume that writing through dict->buf can affect *dict, then compilers
would need to emit code that reloads dict->buf and dict->pos after
every write through dict->buf.
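
An illustrative toy of the same aliasing issue (not the actual
lz_decoder.h code): without "restrict" on buf, the compiler would have
to assume the stores might modify d->pos and reload it on every
iteration.

    #include <stddef.h>
    #include <stdint.h>

    typedef struct {
        size_t pos;
    } toy_dict;

    /* "restrict" promises that the bytes written through buf are not
     * part of *d, so d->pos can stay in a register for the whole loop. */
    static void fill(toy_dict *d, uint8_t *restrict buf,
            uint8_t byte, size_t count)
    {
        while (count-- > 0)
            buf[d->pos++] = byte;
    }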
2025-03-25 15:18:31 +02:00
Lasse Collin
e82ee090c5
liblzma: Define LZ_DICT_INIT_POS for initial dictionary position
It's more readable.
2025-03-25 15:18:30 +02:00
Lasse Collin
cc7f2fc1cf
Bump version and soname for 5.7.2beta 2025-03-08 14:38:56 +02:00
Lasse Collin
99c584891b
liblzma: Edit spelling in a comment
It was found with codespell.
2025-03-06 19:37:03 +02:00
Lasse Collin
cdae0df31e
Bump version and soname for 5.7.1alpha 2025-01-23 11:50:47 +02:00
Lasse Collin
a831bc185b
liblzma: Add raw ARM64, RISC-V, and x86 BCJ filter APIs
Put them behind the LZMA_UNSTABLE macro for now.

These low-level special APIs might become useful in erofs-utils.
2025-01-20 16:44:27 +02:00
Lasse Collin
f2e2b267ca
liblzma: Mark string conversion messages as translatable 2025-01-20 16:31:49 +02:00
Lasse Collin
f49d7413d9
liblzma: Tweak a few error messages in lzma_str_to_filters() 2025-01-20 16:31:35 +02:00
Lasse Collin
51f038f8cb
liblzma: memcmplen.h: Use 8-byte method on 64-bit unaligned archs
Previously it was enabled only on x86-64 and ARM64, and only when support
for unaligned access was detected or manually enabled at build time.

In the default build configuration, the 8-byte method is now also enabled
on 64-bit RISC-V and 64-bit PowerPC (both endiannesses). It was reported
that on big endian POWER9, encoding time may be reduced by 12-13 %.

This change only affects builds with GCC and Clang because the code
uses __builtin_ctzll or __builtin_clzll.
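
A hedged sketch of that 8-byte idea (not the real memcmplen.h); it
assumes little endian byte order and buffers that are safe to read a
few bytes past the limit:

    #include <stdint.h>
    #include <string.h>

    /* Compare 8 bytes at a time; on a mismatch, the lowest set bit of
     * the XOR tells which byte differed first (little endian). */
    static uint32_t match_length(const uint8_t *a, const uint8_t *b,
            uint32_t len, uint32_t limit)
    {
        while (len < limit) {
            uint64_t x, y;
            memcpy(&x, a + len, sizeof(x));
            memcpy(&y, b + len, sizeof(y));

            const uint64_t diff = x ^ y;
            if (diff != 0) {
                len += (uint32_t)(__builtin_ctzll(diff) / 8);
                return len < limit ? len : limit;
            }

            len += sizeof(x);
        }

        return limit;
    }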

Thanks to Marcus Comstedt for testing on POWER9.
2025-01-13 08:44:58 +02:00
Lasse Collin
150356207c
liblzma: Fix the encoder breakage on big endian ARM64
When the 8-byte method was enabled for ARM64, a check for endianness
wasn't added. This broke the LZMA/LZMA2 encoder. Test suite caught it.

Fixes: cd64dd70d5665b6048829c45772d08606f44672e
Co-authored-by: Marcus Comstedt <marcus@mc.pp.se>
2025-01-12 13:08:55 +02:00
Lasse Collin
7510721767
liblzma: Always validate the first digit of a preset string
lzma_str_to_filters() may call parse_lzma12_preset() in two ways. The
call from str_to_filters() detects the string type from the first
character(s) and as a side-effect it validates the first digit of
the preset string. So this change makes no difference there.

However, the call from parse_options() doesn't pre-validate the string.
parse_lzma12_preset() will return an invalid value which is passed to
lzma_lzma_preset() which safely rejects it. The bug still affects the
the error message:

    $ xz --filters=lzma2:preset=X
    xz: Error in --filters=FILTERS option:
    xz: lzma2:preset=X
    xz:               ^
    xz: Unsupported preset

After the fix:

    $ xz --filters=lzma2:preset=X
    xz: Error in --filters=FILTERS option:
    xz: lzma2:preset=X
    xz:              ^
    xz: Unsupported preset

The ^ now correctly points to the X and not past it because the X itself
is the problematic character.

Fixes: cedeeca2ea6ada5b0411b2ae10d7a859e837f203
2025-01-05 12:58:22 +02:00
Lasse Collin
6f412814a8
Update AUTHORS
The contributions have been rewritten.
2025-01-04 19:57:17 +02:00
Lasse Collin
672da29bb3
liblzma: Silence warnings from "clang -Wimplicit-fallthrough" 2025-01-02 15:43:38 +02:00
Lasse Collin
94adc996e4
Replace "Fall through" comments with FALLTHROUGH 2025-01-02 15:43:37 +02:00
Lasse Collin
bb79f79b27
Build: Set libtool -version-info so that it matches with CMake
In the past, they haven't been in sync in development versions
although they (of course) have been in stable releases.
2024-12-29 10:54:45 +02:00
Dexter Castor Döpping
bee0c044d3
liblzma: Fix incorrect macro name in a comment
Fixes: 33b8a24b6646a9dbfd8358405aec466b13078559
Closes: https://github.com/tukaani-project/xz/pull/155
2024-12-18 17:09:29 +02:00
Lasse Collin
c15115f7ed liblzma: Optimize the loop conditions in BCJ filters
Compilers cannot optimize the addition "i + 4" away since theoretically
it could overflow.
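
A hedged illustration of the kind of rewrite this refers to (not the
actual BCJ code); it assumes the caller has already ensured size >= 4:

    #include <stddef.h>
    #include <stdint.h>

    /* Before: the compiler cannot drop "i + 4" because the sum could
     * wrap around. */
    static size_t scan_before(const uint8_t *buf, size_t size)
    {
        size_t n = 0;
        for (size_t i = 0; i + 4 <= size; i += 4)
            n += buf[i];
        return n;
    }

    /* After: "size - 4" is loop-invariant and computed only once
     * (valid because size >= 4 is known). */
    static size_t scan_after(const uint8_t *buf, size_t size)
    {
        size_t n = 0;
        for (size_t i = 0; i <= size - 4; i += 4)
            n += buf[i];
        return n;
    }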
2024-11-26 19:17:42 +02:00
Lasse Collin
dad1530915 Windows: Set DLL name accurately in StringFileInfo on Cygwin and MSYS2
Now the information in the "Details" tab in the file properties
dialog matches the naming convention of Cygwin and MSYS2. This
is only a cosmetic change.
2024-09-30 16:55:23 +03:00
Yifeng Li
6cd7c86078 liblzma: Fix x86-64 movzw compatibility in range_decoder.h
Support for instruction "movzw" without suffix in "GNU as" was
added in commit [1] and stabilized in binutils 2.27, released
in August 2016. Earlier systems don't accept this instruction
without a suffix, making range_decoder.h's inline assembly
unable to build on old systems such as Ubuntu 16.04, creating
error messages like:

    lzma_decoder.c: Assembler messages:
    lzma_decoder.c:371: Error: no such instruction: `movzw 2(%r11),%esi'
    lzma_decoder.c:373: Error: no such instruction: `movzw 4(%r11),%edi'
    lzma_decoder.c:388: Error: no such instruction: `movzw 6(%r11),%edx'
    lzma_decoder.c:398: Error: no such instruction: `movzw (%r11,%r14,4),%esi'

Change "movzw" to "movzwl" for compatibility.
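
For example, the first of the rejected instructions above becomes:

    Before:  movzw  2(%r11),%esi
    After:   movzwl 2(%r11),%esi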

[1] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=c07315e0c610e0e3317b4c02266f81793df253d2

Suggested-by: Lasse Collin <lasse.collin@tukaani.org>
Tested-by: Yifeng Li <tomli@tomli.me>
Signed-off-by: Yifeng Li <tomli@tomli.me>
Fixes: 3182a330c1512cc1f5c87b5c5a272578e60a5158
Fixes: https://github.com/tukaani-project/xz/issues/121
Closes: https://github.com/tukaani-project/xz/pull/136
2024-08-22 10:59:08 +03:00
Lasse Collin
f7103c2c2a Revert "liblzma: Add ARM64 CRC32 instruction support detection on OpenBSD"
This reverts commit dc03f6290f5b9bd3d50c7e12e58dee870889d599.

OpenBSD 7.6 will support elf_aux_info(3), and the detection code used
on FreeBSD will work on OpenBSD 7.6 too. Keep things simpler and drop
the OpenBSD-specific sysctl() method.
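
A hedged sketch of that shared detection path; it assumes the
platform's <sys/auxv.h> declares elf_aux_info(3) and that AT_HWCAP and
HWCAP_CRC32 are visible from the system headers:

    #include <stdbool.h>
    #include <sys/auxv.h>

    /* Sketch only: read the ELF auxiliary vector instead of using an
     * OpenBSD-specific sysctl(). AT_HWCAP and HWCAP_CRC32 are assumed
     * to come from the system headers. */
    static bool have_arm64_crc32(void)
    {
        unsigned long hwcap = 0;

        if (elf_aux_info(AT_HWCAP, &hwcap, sizeof(hwcap)) != 0)
            return false;

        return (hwcap & HWCAP_CRC32) != 0;
    }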

Thanks to Christian Weisgerber.
2024-07-19 20:06:24 +03:00
Lasse Collin
7c292dd0bf liblzma: Tweak a comment 2024-07-13 22:10:37 +03:00
Xi Ruoyao
7baf6835cf liblzma: Speed up CRC32 calculation on 64-bit LoongArch
The crc.w.{b/h/w/d}.w instructions in LoongArch can calculate the CRC32
result for 1/2/4/8 bytes in a single operation. Using these is much
faster compared to the generic method.

Optimized CRC32 is enabled unconditionally on 64-bit LoongArch because
the LoongArch specification says that CRC32 instructions shall be
implemented for 64-bit processors. Optimized CRC32 isn't enabled for
32-bit LoongArch processors because not enough information is available
about them.

Co-authored-by: Lasse Collin <lasse.collin@tukaani.org>

Closes: https://github.com/tukaani-project/xz/pull/86
2024-07-01 17:09:57 +03:00
Lasse Collin
0ed8936685 liblzma: ARM64 CRC32: Align the buffer faster
Instead of doing it byte by byte, use the 1/2/4-byte CRC32 instructions.
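
A hedged sketch of that alignment step (not the actual code); it
assumes a build where <arm_acle.h> provides __crc32b, __crc32h, and
__crc32w:

    #include <arm_acle.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Advance *buf to 8-byte alignment by consuming 1, 2, and 4 bytes
     * with the matching CRC32 instructions instead of a byte loop. */
    static uint32_t crc32_align_to_8(uint32_t crc,
            const uint8_t **buf, size_t *size)
    {
        if (((uintptr_t)*buf & 1) && *size >= 1) {
            crc = __crc32b(crc, **buf);
            *buf += 1;
            *size -= 1;
        }

        if (((uintptr_t)*buf & 2) && *size >= 2) {
            uint16_t v;
            memcpy(&v, *buf, sizeof(v));
            crc = __crc32h(crc, v);
            *buf += 2;
            *size -= 2;
        }

        if (((uintptr_t)*buf & 4) && *size >= 4) {
            uint32_t v;
            memcpy(&v, *buf, sizeof(v));
            crc = __crc32w(crc, v);
            *buf += 4;
            *size -= 4;
        }

        return crc;
    }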
2024-06-28 14:20:49 +03:00
Lasse Collin
fe77c4e130 liblzma: Tidy up crc_common.h
Prefix ARM64_RUNTIME_DETECTION with CRC_ and reorder it to be with
the other ARM64-specific lines. That macro isn't used outside this
file.

An ARM64 CLMUL implementation doesn't exist yet, and thus CRC64_ARM64_CLMUL
isn't used anywhere yet.

It's not ideal that the single-letter CRC utility macros are here
as they pollute the namespace of the LZ encoder files. Those could
be moved to their own crc_macros.h like they were in 5.2.x, but in practice
this is fine enough already.
2024-06-23 23:09:14 +03:00
Lasse Collin
7484d37538 liblzma: Move lzma_crcXX_table[][] declarations to crc_common.h
LZ encoder needs lzma_crc32_table[0] but otherwise those tables
are private to the CRC code. In contrast, the other things in
check.h are needed in several places.
2024-06-23 15:37:46 +03:00
Lasse Collin
85b081f5d4 liblzma: Make 32-bit x86 CRC assembly co-exist with CLMUL
Now runtime detection of CLMUL support can pick between the CLMUL and
the generic assembly implementations. Whatever overhead this has for
builds that omit CLMUL completely isn't important because builds for
any non-ancient system are likely to include the CLMUL code too.

Handle the CRC tables in crcXX_fast.c files because now these files
are built even when assembly code is used.

If 32-bit x86 assembly is enabled then it will always be built even
if compiler flags were such that CLMUL would be allowed unconditionally.
That is, runtime detection will be used anyway. This keeps the build
rules simpler.

In LZ encoder, build and use lzma_lz_hash_table[256] if CLMUL CRC
is used without runtime detection. Previously this wasn't needed
because crc32_table.c included the lzma_crc32_table[][] in the build
unless encoder support had been disabled. Including an 8 KiB table
was silly when only 1 KiB is actually used. So now liblzma is 7 KiB
smaller if CLMUL is enabled without runtime detection.
2024-06-23 14:36:44 +03:00
Lasse Collin
6667d503b5 liblzma: CRC: Rename crcXX_generic to lzma_crcXX_generic
This prepares for the possibility that lzma_crc32_generic and
lzma_crc64_generic are extern functions.
2024-06-23 14:36:44 +03:00
Lasse Collin
30a2d5d510 liblzma: CRC CLMUL: Omit is_arch_extension_supported() when not needed
On E2K the function compiles only due to compiler emulation but the
function is never used. It's cleaner to omit the function when it's
not needed even though it's a "static inline" function.

Thanks to Ilya Kurdyukov.
2024-06-17 15:00:55 +03:00
Lasse Collin
54eaea5ea4 liblzma: x86 CLMUL CRC: Rewrite
It's faster with both tiny and large buffers and doesn't require
disabling any sanitizers. With large buffers the extra speed is
from folding four 16-byte chunks in parallel.

The 32-bit x86 with MSVC reportedly still needs a workaround.
Now the simpler "__asm mov ebx, ebx" trick is enough but it
needs to be in lzma_crc64() instead of crc64_arch_optimized().
Thanks to Iouri Kharon for testing and the fix.

Thanks to Ilya Kurdyukov for testing the speed with aligned and
unaligned buffers on a few x86 processors and on E2K v6.

Thanks to Sam James for general feedback.

Fixes: https://github.com/tukaani-project/xz/issues/112
Fixes: https://github.com/tukaani-project/xz/issues/122
2024-06-17 15:00:49 +03:00
Lasse Collin
20014c2614 liblzma: Use a single macro to select CLMUL CRC to build
This way it's clearer that two things cannot be selected
at the same time.
2024-06-16 12:59:17 +03:00
Lasse Collin
d8fb098617 liblzma: CRC32 CLMUL: Refactor the constants and simplify
By using modulus scaled constants, the final reduction can
be simplified.
2024-06-16 12:56:54 +03:00