mirror of https://git.tukaani.org/xz.git synced 2025-04-15 04:00:50 +00:00

2868 Commits

Author SHA1 Message Date
Lasse Collin
a522a22654
Bump version and soname for 5.8.1 v5.8.1 2025-04-03 14:34:43 +03:00
Lasse Collin
1c462c2ad8
Add NEWS for 5.8.1 2025-04-03 14:34:43 +03:00
Lasse Collin
513cabcf7f
Tests: Call lzma_code() in smaller chunks in fuzz_common.h
This makes it easy to crash fuzz_decode_stream_mt when tested
against the code from 5.8.0.

Obviously this might make it harder to reach some other code path now.
The previous code has been in use since 2018 when fuzzing was added
in 106d1a663d4b ("Tests: Add a fuzz test program and a config file
for OSS-Fuzz.").
2025-04-03 14:34:43 +03:00
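The difference between passing the whole input at once and passing it in smaller chunks can be sketched without liblzma. This is only an illustration: feed_in_chunks(), toy_consume(), and toy_state are hypothetical names, and the callback merely stands in for a streaming call such as lzma_code(). It is not the code from fuzz_common.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-in for a streaming decoder call such as lzma_code():
// it is given up to `len` bytes and returns how many bytes it consumed.
typedef size_t (*consume_fn)(void *state, const uint8_t *in, size_t len);

// Feed `size` bytes to `consume`, at most `chunk` bytes per call.
// Returns the number of calls made. With chunk >= size the consumer
// sees the whole input in one call; a small chunk forces many calls,
// which exercises the consumer's resume-after-partial-input paths.
static size_t feed_in_chunks(consume_fn consume, void *state,
		const uint8_t *data, size_t size, size_t chunk)
{
	size_t calls = 0;
	size_t pos = 0;

	while (pos < size) {
		const size_t avail = size - pos;
		const size_t n = avail < chunk ? avail : chunk;
		const size_t used = consume(state, data + pos, n);
		++calls;

		// Stop if the consumer made no progress (also covers chunk == 0).
		if (used == 0)
			break;

		pos += used;
	}

	return calls;
}

// Toy consumer for demonstration: sums the bytes and counts the calls.
typedef struct {
	uint64_t total;
	size_t calls;
} toy_state;

static size_t toy_consume(void *state, const uint8_t *in, size_t len)
{
	toy_state *s = state;
	for (size_t i = 0; i < len; ++i)
		s->total += in[i];

	++s->calls;
	return len;
}
```

Calling feed_in_chunks() with a chunk size equal to the input size corresponds to the older single-call behavior; a small chunk size corresponds to the behavior described in this commit.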
Lasse Collin
48440e24a2
Tests: Add a fuzzing target for the multithreaded .xz decoder
It doesn't seem possible to trigger the CVE-2025-31115 bug with this
fuzzing target at the moment. It's because the code in fuzz_common.h
passes the whole input buffer to lzma_code() at once.
2025-04-03 14:34:43 +03:00
Lasse Collin
0c80045ab8
liblzma: mt dec: Fix lack of parallelization in single-shot decoding
Single-shot decoding means calling lzma_code() by giving it the whole
input at once and enough output buffer space to store the uncompressed
data, and combining this with LZMA_FINISH and no timeout
(lzma_mt.timeout = 0). This way the file is decoded with a single
lzma_code() call if possible.

The bug prevented the decoder from starting more than one worker thread
in single-shot mode. The issue was noticed when reviewing the code;
there are no bug reports. Thus maybe few have tried this mode.

Fixes: 64b6d496dc81 ("liblzma: Threaded decoder: Always wait for output if LZMA_FINISH is used.")
2025-04-03 14:34:42 +03:00
Lasse Collin
8188048854
liblzma: mt dec: Don't modify thr->in_size in the worker thread
Don't set thr->in_size = 0 when returning the thread to the stack of
available threads. Not only is it useless, but the main thread may
read the value in SEQ_BLOCK_THR_RUN. With valid inputs, it made
no difference if the main thread saw the original value or 0. With
invalid inputs (when the worker thread stops early), thr->in_size was
no longer modified after the previous commit with the security fix
("Don't free the input buffer too early").

So while the bug appears harmless now, it's important to fix it because
the variable was being modified without proper locking. It's trivial
to fix because there is no need to change the value. Only the main thread
needs to set the value (in SEQ_BLOCK_THR_INIT) when starting a new
Block before the worker thread is activated.

Fixes: 4cce3e27f529 ("liblzma: Add threaded .xz decompressor.")
Reviewed-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Thanks-to: Sam James <sam@gentoo.org>
2025-04-03 14:34:42 +03:00
Lasse Collin
d5a2ffe41b
liblzma: mt dec: Don't free the input buffer too early (CVE-2025-31115)
The input buffer must be valid as long as the main thread is writing
to the worker-specific input buffer. Fix it by making the worker
thread not free the buffer on errors and not return the worker thread to
the pool. The input buffer will be freed when threads_end() is called.

With invalid input, the bug could at least result in a crash. The
effects include heap use after free and writing to an address based
on the null pointer plus an offset.

The bug has been there since the first committed version of the threaded
decoder and thus affects versions from 5.3.3alpha to 5.8.0.

As the commit message in 4cce3e27f529 says, I had made significant
changes on top of Sebastian's patch. This bug was indeed introduced
by my changes; it wasn't in Sebastian's version.

Thanks to Harri K. Koskinen for discovering and reporting this issue.

Fixes: 4cce3e27f529 ("liblzma: Add threaded .xz decompressor.")
Reported-by: Harri K. Koskinen <x64nop@nannu.org>
Reviewed-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Thanks-to: Sam James <sam@gentoo.org>
2025-04-03 14:34:42 +03:00
Lasse Collin
c0c835964d
liblzma: mt dec: Simplify by removing the THR_STOP state
The main thread can directly set THR_IDLE in threads_stop() which is
called when errors are detected. threads_stop() won't return the stopped
threads to the pool or free the memory pointed to by thr->in anymore, but
it doesn't matter because the existing workers won't be reused after
an error. The resources will be cleaned up when threads_end() is
called (reinitializing the decoder always calls threads_end()).

Reviewed-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Thanks-to: Sam James <sam@gentoo.org>
2025-04-03 14:34:42 +03:00
Lasse Collin
831b55b971
liblzma: mt dec: Fix a comment
Reviewed-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Thanks-to: Sam James <sam@gentoo.org>
2025-04-03 14:34:42 +03:00
Lasse Collin
b9d168eee4
liblzma: Add assertions to lzma_bufcpy() 2025-04-03 14:34:30 +03:00
Lasse Collin
c8e0a4897b
DOS: Update Makefile to fix the build 2025-04-02 16:54:40 +03:00
Lasse Collin
307c02ed69
sysdefs.h: Avoid <stdalign.h> even with C11 compilers
Oracle Developer Studio 12.6 on Solaris 10 claims C11 support in
__STDC_VERSION__ and supports _Alignas. However, <stdalign.h> is missing.
We only need alignas, so define it to _Alignas with C11/C17 compilers.
If something included <stdalign.h> later, it shouldn't cause problems.

Thanks to Ihsan Dogan for reporting the issue and testing the fix.

Fixes: c0e7eaae8d6eef1e313c9d0da20ccf126ec61f38
2025-03-29 12:41:32 +02:00
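A minimal sketch of the approach described above, not the actual contents of sysdefs.h: alignas is mapped to _Alignas on C11/C17 compilers without including <stdalign.h>. The C23 branch and the GNU-style pre-C11 fallback are assumptions added so that the sketch compiles in more environments, and buffer_is_aligned() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

// Deliberately no #include <stdalign.h>: the header may be missing even
// when the compiler claims C11 support in __STDC_VERSION__.
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
	// Assumption: in C23 and later alignas is a keyword, so there is
	// nothing to define.
#elif defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
	// C11/C17: the _Alignas keyword exists without any header.
#	define alignas _Alignas
#elif defined(__GNUC__)
	// Assumption for this sketch only: pre-C11 GNU-style fallback.
#	define alignas(n) __attribute__((__aligned__(n)))
#else
#	error "No known way to request alignment in this sketch"
#endif

// Hypothetical helper: returns 1 if a buffer declared with alignas(16)
// really starts at a 16-byte boundary.
static int buffer_is_aligned(void)
{
	alignas(16) static unsigned char buf[64];
	return (uintptr_t)buf % 16 == 0;
}
```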
Lasse Collin
7ce38b3183
Update THANKS 2025-03-29 12:32:05 +02:00
Lasse Collin
688e51bde4
Translations: Update the Croatian translation 2025-03-29 12:21:51 +02:00
Lasse Collin
173fb5c68b
doc/SHA256SUMS: Add 5.8.0 2025-03-25 18:23:57 +02:00
Lasse Collin
db9258e828
Bump version and soname for 5.8.0
Also remove the LZMA_UNSTABLE macro.
v5.8.0
2025-03-25 15:18:32 +02:00
Lasse Collin
bfb752a38f
Add NEWS for 5.8.0 2025-03-25 15:18:32 +02:00
Lasse Collin
6ccbb904da
Translations: Run "make -C po update-po"
POT-Creation-Date is set to match the timestamp in 5.7.2beta which
in the Translation Project is known as 5.8.0-pre1. The strings
haven't changed since 5.7.1alpha but a few comments have.

This is a very noisy commit, but this helps keep the PO files
similar between the Git repository and stable release tarballs.
2025-03-25 15:18:31 +02:00
Lasse Collin
891a5f057a
Translations: Run po4a/update-po
Also remove the trivial obsolete messages like man page dates.

This is a noisy commit, but this helps keep the PO files similar
between the Git repository and stable release tarballs.
2025-03-25 15:18:31 +02:00
Lasse Collin
4f52e73870
Translations: Partially fix overtranslation in Serbian man pages
Names of environment variables and some other strings must be present
in the original form. The translator couldn't be reached so I'm
changing some of the strings myself. In the "Robot mode" section,
occurrences in the middle of sentences weren't changed to reduce
the chance of grammar breakage, but I kept the translated strings in
parentheses in the headings. It's not ideal, but now people shouldn't
need to look at the English man page to find the English strings.
2025-03-25 15:18:31 +02:00
Lasse Collin
ff5d944749
liblzma: Count the extra bytes in LZMA/LZMA2 decoder memory usage 2025-03-25 15:18:31 +02:00
Lasse Collin
943b012d09
liblzma: Use SSE2 intrinsics instead of memcpy() in dict_repeat()
SSE2 is supported on every x86-64 processor. The SSE2 code is used on
32-bit x86 if compiler options permit unconditional use of SSE2.

dict_repeat() copies short random-sized unaligned buffers. At least
on glibc, FreeBSD, and Windows (MSYS2, UCRT, MSVCRT), memcpy() is
clearly faster than byte-by-byte copying in this use case. Compared
to the memcpy() version, the new SSE2 version reduces decompression
time by 0-5 % depending on the machine and libc. It should never be
slower than the memcpy() version.

However, on musl 1.2.5 on x86-64, the memcpy() version is the slowest.
Compared to the memcpy() version:

  - The byte-by-byte version takes 6-7 % less time to decompress.
  - The SSE2 version takes 16-18 % less time to decompress.

The numbers are from decompressing a Linux kernel source tarball in
single-threaded mode on older AMD and Intel systems. The tarball
compresses well, and thus dict_repeat() performance matters more
than with some other files.
2025-03-25 15:18:31 +02:00
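For readers unfamiliar with what a dictionary repeat does, here is a simplified sketch of the two portable strategies compared above: memcpy() and byte-by-byte copying. It is not the liblzma dict_repeat(); toy_repeat() is a hypothetical name, the buffer is treated as linear, and the SSE2 variant is left out because it is x86-specific. The sketch uses memcpy() only when the source and destination cannot overlap and falls back to a byte loop otherwise, since an overlapping match must reread bytes that were just written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical toy version of an LZ77-style repeat: append `len` bytes
// to buf at `pos`, copying from `distance` bytes before `pos`.
// The caller must ensure 1 <= distance <= pos and that buf has room
// for pos + len bytes. Returns the new write position.
static size_t toy_repeat(uint8_t *buf, size_t pos, size_t distance,
		size_t len)
{
	assert(distance >= 1 && distance <= pos);

	const uint8_t *src = buf + pos - distance;
	uint8_t *dst = buf + pos;

	if (distance >= len) {
		// Source and destination don't overlap: memcpy() is valid.
		memcpy(dst, src, len);
	} else {
		// Overlapping match: later bytes are copies of bytes written
		// earlier in this same loop, so copy strictly in order.
		for (size_t i = 0; i < len; ++i)
			dst[i] = src[i];
	}

	return pos + len;
}
```

For example, with "ab" already in the buffer, repeating 6 bytes at distance 2 extends the contents to "abababab".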
Lasse Collin
bc14e4c94e
liblzma: Add "restrict" to a few functions in lz_decoder.h
This doesn't make any difference in practice because compilers can
already see that writing through the dict->buf pointer cannot modify
the contents of *dict itself: The LZMA decoder makes a local copy of
the lzma_dict structure, and even if it didn't, the pointer to
lzma_dict in the LZMA decoder is already "restrict".

It's nice to add "restrict" anyway. uint8_t is typically unsigned char
which can alias anything. Without the above conditions or "restrict",
compilers might need to assume that writing through dict->buf might
modify *dict. This would matter in dict_repeat() because the loops
refer to dict->buf and dict->pos instead of making local copies of
those members for the duration of the loops. If compilers had to
assume that writing through dict->buf can affect *dict, then compilers
would need to emit code that reloads dict->buf and dict->pos after
every write through dict->buf.
2025-03-25 15:18:31 +02:00
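The aliasing concern can be illustrated with a small sketch. toy_dict, toy_fill(), and toy_fill_local() are hypothetical and much simpler than lzma_dict and the functions in lz_decoder.h. The first function refers to dict->buf and dict->pos inside the loop and relies on "restrict"; the second makes local copies of the members instead. Both produce the same result; the difference is only in what the compiler is allowed to assume.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Simplified, hypothetical dictionary; not the real lzma_dict.
typedef struct {
	uint8_t *buf;  // history buffer
	size_t pos;    // next write position
	size_t size;   // capacity of buf
} toy_dict;

// The loop reads dict->buf and dict->pos on every iteration. uint8_t is
// typically unsigned char, which may alias any object, so without
// "restrict" a compiler might have to assume that the store through
// dict->buf can change *dict and reload both members after each write.
static void toy_fill(toy_dict *restrict dict, uint8_t byte, size_t count)
{
	for (size_t i = 0; i < count && dict->pos < dict->size; ++i)
		dict->buf[dict->pos++] = byte;
}

// Alternative without "restrict": copy the members to locals for the
// duration of the loop and store the position back once at the end.
static void toy_fill_local(toy_dict *dict, uint8_t byte, size_t count)
{
	uint8_t *buf = dict->buf;
	size_t pos = dict->pos;
	const size_t size = dict->size;

	for (size_t i = 0; i < count && pos < size; ++i)
		buf[pos++] = byte;

	dict->pos = pos;
}
```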
Lasse Collin
e82ee090c5
liblzma: Define LZ_DICT_INIT_POS for initial dictionary position
It's more readable.
2025-03-25 15:18:30 +02:00
Lasse Collin
8e7cd0091e
Windows: Update README-Windows.txt about UCRT 2025-03-25 15:18:30 +02:00
Lasse Collin
2c24292d34
Update THANKS 2025-03-25 15:18:15 +02:00
Lasse Collin
48053c9089
Translations: Update the Italian translation 2025-03-17 15:33:25 +02:00
Lasse Collin
8d6f06a65f
Translations: Update the Portuguese translation
The language tag in the Translation Project is pt, not pt_PT,
thus I changed the "Language:" line to pt.
2025-03-17 15:28:56 +02:00
Lasse Collin
c3439b039f
Translations: Update the Italian translation 2025-03-14 13:13:32 +02:00
Lasse Collin
79b4ab8d79
Translations: Update the Italian man page translations
Only trivial additions but this keeps the file in sync with the TP.
2025-03-12 20:48:39 +02:00
Lasse Collin
515b6fc855
Translations: Update the Italian man page translations 2025-03-12 19:38:54 +02:00
Lasse Collin
333b7c0b77
Translations: Update the Korean man page translations 2025-03-10 21:00:31 +02:00
Lasse Collin
ae52ebd27d
Translations: Update the German man page translations 2025-03-10 20:56:57 +02:00
Lasse Collin
1028e52c93
CMake: Fix tuklib_use_system_extensions
Revert back to a macro so that list(APPEND CMAKE_REQUIRED_DEFINITIONS)
will affect the calling scope. I had forgotten that while CMake
functions inherit the variables from the parent scope, the changes
to them are local unless using set(... PARENT_SCOPE).

This also means that the commit message in 5bb77d0920dc is wrong. The
commit itself is still fine, making it clearer that -DHAVE_SYS_PARAM_H
is only needed for specific check_c_source_compiles() calls.

Fixes: c1ea7bd0b60eed6ebcdf9a713ca69034f6f07179
2025-03-10 13:41:50 +02:00
Lasse Collin
80e4883602
INSTALL: Document -bmaxdata on AIX
This is based on a pull request and AIX docs. I haven't tested the
instructions myself.

Closes: https://github.com/tukaani-project/xz/pull/137
2025-03-10 13:41:49 +02:00
Lasse Collin
ab319186b6
Update THANKS 2025-03-10 11:37:19 +02:00
Collin Funk
4434671a04
tuklib_physmem: Silence -Wsign-conversion on AIX
Closes: https://github.com/tukaani-project/xz/pull/168
2025-03-10 11:36:44 +02:00
Lasse Collin
18bcaa4faf
Translations: Update the Romanian man page translations 2025-03-09 22:11:35 +02:00
Lasse Collin
1e17b7f42f
Translations: Update the Croatian translation 2025-03-09 22:11:35 +02:00
Lasse Collin
ff85e6130d
Translations: Update the Romanian translation 2025-03-09 22:11:34 +02:00
Lasse Collin
a5bfb33f30
Translations: Update the Ukrainian man page translations 2025-03-09 22:11:34 +02:00
Lasse Collin
5bb77d0920
CMake: Use cmake_push_check_state in tuklib_cpucores and tuklib_physmem
Now the changes to CMAKE_REQUIRED_DEFINITIONS are temporary and don't
leak to the calling code.
2025-03-09 17:44:37 +02:00
Lasse Collin
c1ea7bd0b6
CMake: Revise tuklib_use_system_extensions
Define NetBSD and Darwin/macOS feature test macros. Autoconf defines
these too (and a few others).

Define the macros on Windows except with MSVC. The _GNU_SOURCE macro
makes a difference with mingw-w64.

Use a function instead of a macro. Don't take the TARGET_OR_ALL argument
because the effect is always global: the global variable
CMAKE_REQUIRED_DEFINITIONS is modified.
2025-03-09 17:44:31 +02:00
Lasse Collin
4243c45a48
doc/SHA256SUMS: Add 5.7.2beta 2025-03-08 14:54:29 +02:00
Lasse Collin
cc7f2fc1cf
Bump version and soname for 5.7.2beta v5.7.2beta 2025-03-08 14:38:56 +02:00
Lasse Collin
62e44b3616
Add NEWS for 5.7.2beta 2025-03-08 14:25:17 +02:00
Lasse Collin
70f1f20378
COPYING: Remove the note about old releases 2025-03-08 14:25:17 +02:00
Lasse Collin
db9827dc38
xz: Update the man page about the environment variables again 2025-03-08 14:25:16 +02:00
Lasse Collin
99c584891b
liblzma: Edit spelling in a comment
It was found with codespell.
2025-03-06 19:37:03 +02:00
Lasse Collin
7a234c8c05
xz: Update the man page about the environment variables 2025-03-06 19:37:03 +02:00