mirror of https://git.tukaani.org/xz.git synced 2025-04-16 20:50:52 +00:00

2631 Commits

Author SHA1 Message Date
Lasse Collin
2e918d09ad
liblzma: mt dec: Fix lack of parallelization in single-shot decoding
Single-shot decoding means calling lzma_code() by giving it the whole
input at once and enough output buffer space to store the uncompressed
data, and combining this with LZMA_FINISH and no timeout
(lzma_mt.timeout = 0). This way the file is decoded with a single
lzma_code() call if possible.

The bug prevented the decoder from starting more than one worker thread
in single-shot mode. The issue was noticed when reviewing the code;
there are no bug reports, so perhaps few have tried this mode.

Fixes: 64b6d496dc81 ("liblzma: Threaded decoder: Always wait for output if LZMA_FINISH is used.")
(cherry picked from commit 0c80045ab82c406858d9d5bcea9f48ebc3d0a81d)
2025-04-04 14:53:00 +03:00
Lasse Collin
6ff5b8c559
liblzma: mt dec: Don't modify thr->in_size in the worker thread
Don't set thr->in_size = 0 when returning the thread to the stack of
available threads. Not only is it useless, but the main thread may
read the value in SEQ_BLOCK_THR_RUN. With valid inputs, it made
no difference if the main thread saw the original value or 0. With
invalid inputs (when the worker thread stops early), thr->in_size was
no longer modified after the previous commit with the security fix
("Don't free the input buffer too early").

So while the bug appears harmless now, it's important to fix it because
the variable was being modified without proper locking. It's trivial
to fix because there is no need to change the value. Only the main thread
needs to set the value (in SEQ_BLOCK_THR_INIT) when starting a new
Block before the worker thread is activated.

Fixes: 4cce3e27f529 ("liblzma: Add threaded .xz decompressor.")
Reviewed-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Thanks-to: Sam James <sam@gentoo.org>
(cherry picked from commit 8188048854e8d11071b8a50d093c74f4c030acc9)
2025-04-03 15:49:42 +03:00
Lasse Collin
1b874b4f04
liblzma: mt dec: Don't free the input buffer too early (CVE-2025-31115)
The input buffer must be valid as long as the main thread is writing
to the worker-specific input buffer. Fix it by making the worker
thread not free the buffer on errors and not return the worker thread to
the pool. The input buffer will be freed when threads_end() is called.
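
The ownership rule described above can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the liblzma code: the struct and the functions worker_start(), worker_on_error(), main_feed(), and workers_end() are hypothetical names, and all locking between the two threads is left out to keep the buffer lifetime visible.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-worker state; the real decoder keeps more fields. */
struct worker {
	uint8_t *in;       /* input buffer that the main thread fills */
	size_t in_size;    /* capacity of the buffer */
	size_t in_filled;  /* bytes copied in so far */
	bool failed;       /* set by the worker when it hits bad input */
};

/* Main thread: allocate the buffer before the worker is activated. */
static bool worker_start(struct worker *w, size_t in_size)
{
	w->in = malloc(in_size);
	w->in_size = in_size;
	w->in_filled = 0;
	w->failed = false;
	return w->in != NULL;
}

/* Worker thread: on error, only record the failure. The buffer is
 * deliberately NOT freed here because the main thread may still be
 * copying data into it. */
static void worker_on_error(struct worker *w)
{
	w->failed = true;
}

/* Main thread: copy more input. This stays safe after an error
 * because the buffer outlives the worker's failure. */
static size_t main_feed(struct worker *w, const uint8_t *src, size_t len)
{
	const size_t room = w->in_size - w->in_filled;
	if (len > room)
		len = room;

	memcpy(w->in + w->in_filled, src, len);
	w->in_filled += len;
	return len;
}

/* Single cleanup point, playing the role that threads_end() has
 * in the commit message. */
static void workers_end(struct worker *w)
{
	free(w->in);
	w->in = NULL;
	w->in_size = 0;
	w->in_filled = 0;
}
```

The design choice is that exactly one function releases the buffer, so no error path in the worker can invalidate memory that the main thread is still using.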

With invalid input, the bug could at least result in a crash. The
effects include heap use after free and writing to an address based
on the null pointer plus an offset.

The bug has been there since the first committed version of the threaded
decoder and thus affects versions from 5.3.3alpha to 5.8.0.

As the commit message in 4cce3e27f529 says, I had made significant
changes on top of Sebastian's patch. This bug was indeed introduced
by my changes; it wasn't in Sebastian's version.

Thanks to Harri K. Koskinen for discovering and reporting this issue.

Fixes: 4cce3e27f529 ("liblzma: Add threaded .xz decompressor.")
Reported-by: Harri K. Koskinen <x64nop@nannu.org>
Reviewed-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Thanks-to: Sam James <sam@gentoo.org>
(cherry picked from commit d5a2ffe41bb77b918a8c96084885d4dbe4bf6480)
2025-04-03 15:49:41 +03:00
Lasse Collin
f74cf18ad0
liblzma: mt dec: Simplify by removing the THR_STOP state
The main thread can directly set THR_IDLE in threads_stop() which is
called when errors are detected. threads_stop() won't return the stopped
threads to the pool or free the memory pointed by thr->in anymore, but
it doesn't matter because the existing workers won't be reused after
an error. The resources will be cleaned up when threads_end() is
called (reinitializing the decoder always calls threads_end()).

Reviewed-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Thanks-to: Sam James <sam@gentoo.org>
(cherry picked from commit c0c835964dfaeb2513a3c0bdb642105152fe9f34)
2025-04-03 15:49:41 +03:00
Lasse Collin
c1a91b8bae
liblzma: mt dec: Fix a comment
Reviewed-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Thanks-to: Sam James <sam@gentoo.org>
(cherry picked from commit 831b55b971cf579ee16a854f177c36b20d3c6999)
2025-04-03 15:49:41 +03:00
Lasse Collin
fb1210f215
liblzma: Add assertions to lzma_bufcpy()
(cherry picked from commit b9d168eee4fb6393b4fe207c0aeb5faee316ca1a)
2025-04-03 15:49:41 +03:00
Lasse Collin
ac50df0d89
Bump version and soname for 5.6.4 (tag v5.6.4)
2025-01-23 11:45:07 +02:00
Lasse Collin
83ce1d42ae
Add NEWS for 5.6.4
2025-01-23 11:42:12 +02:00
Lasse Collin
608dec5bc6
NEWS: The security fix in 5.6.3 is known as CVE-2024-47611
(cherry picked from commit b3af3297e4d6cf0eafb48155aa97bb06c82a9228)
2025-01-23 11:41:25 +02:00
Lasse Collin
9295008837
Translations: Run po4a/update-po
2025-01-23 11:24:43 +02:00
Lasse Collin
990c769a5e
windows/build.bash: Fix error message
Fixes: 1ee716f74085223c8fbcae1d5a384e6bf53c0f6a
(cherry picked from commit a04b9dd0c7c74fabd8c393d2dc68a221276d6e29)
2025-01-22 16:57:05 +02:00
Lasse Collin
5ae7958dbc
Windows: Disable MinGW-w64's stdio functions in size-optimized builds
This only affects builds with UCRT. With legacy MSVCRT, the replacement
functions are always enabled.

Omitting the MinGW-w64 replacements saves over 20 KiB per executable.
The downside is that --enable-small or XZ_SMALL=ON disables thousand
separator support in xz messages. If someone is OK with the slower
speed of slightly smaller builds, lack of thousand separators won't
matter.

Don't override __USE_MINGW_ANSI_STDIO if it is already defined (via
CPPFLAGS or such method).

(cherry picked from commit 4eae859ae8ad7072eaa74aeaee79a2c3c12c55cb)
2025-01-22 15:39:57 +02:00
Lasse Collin
c182b9c1b3
Update THANKS
(cherry picked from commit da359c360e986b21cd8d7b888c6a80f56b9d49c7)
2025-01-19 20:12:21 +02:00
Lasse Collin
82df651858
Update THANKS
(cherry picked from commit f032373561cefaf07f92ffe3fbc471ec6770456e)
2025-01-19 19:46:49 +02:00
Lasse Collin
717bee1ec5
Build: Use --sort=name in TAR_OPTIONS
Use also LC_COLLATE=C to make the sorting locale-independent.
Sorting makes the file order reproducible.

(cherry picked from commit 950da11ce09c90412dcbca29689575037640667a)
2025-01-17 16:54:00 +02:00
Lasse Collin
27a503b8dd
Update THANKS
(cherry picked from commit 96336b0110d47756a9fd2a103fbf0a99e905fbed)
2025-01-12 13:11:53 +02:00
Lasse Collin
f4d988bc04
liblzma: Fix the encoder breakage on big endian ARM64
When the 8-byte method was enabled for ARM64, a check for endianness
wasn't added. This broke the LZMA/LZMA2 encoder. Test suite caught it.

Fixes: cd64dd70d5665b6048829c45772d08606f44672e
Co-authored-by: Marcus Comstedt <marcus@mc.pp.se>
(cherry picked from commit 150356207c8d6a3e0af465b676430d19d62f884c)
2025-01-12 13:11:07 +02:00
Lasse Collin
e22c4fb259
Windows: Update manifest comments about long UTF-8 filenames
(cherry picked from commit b01b0958025a2da284b53a583f313f8140636cb5)
2025-01-12 13:11:06 +02:00
Lasse Collin
5e77f8a9ef
Windows: Update build.bash and its README-Windows.txt to UCRT
While MSVCRT builds are possible, UCRT works better with UTF-8.
A 32-bit build is still included but hopefully it's not actually
needed anymore.

(cherry picked from commit 0dfc67d37ebb038be8a9b17b536d1b561d52e81a)
2025-01-12 13:11:06 +02:00
Lasse Collin
2133ff9839
Translations: Update Serbian translation
I rewrapped a few overlong lines. Those edits aren't in the
Translation Project. Automatic wrapping in the master branch
means that these strings need to be updated soon anyway.
2025-01-10 13:18:26 +02:00
Lasse Collin
8eaf2cb58e
liblzma: Always validate the first digit of a preset string
lzma_str_to_filters() may call parse_lzma12_preset() in two ways. The
call from str_to_filters() detects the string type from the first
character(s) and as a side-effect it validates the first digit of
the preset string. So this change makes no difference there.

However, the call from parse_options() doesn't pre-validate the string.
parse_lzma12_preset() will return an invalid value which is passed to
lzma_lzma_preset() which safely rejects it. The bug still affects
the error message:

    $ xz --filters=lzma2:preset=X
    xz: Error in --filters=FILTERS option:
    xz: lzma2:preset=X
    xz:               ^
    xz: Unsupported preset

After the fix:

    $ xz --filters=lzma2:preset=X
    xz: Error in --filters=FILTERS option:
    xz: lzma2:preset=X
    xz:              ^
    xz: Unsupported preset

The ^ now correctly points to the X and not past it because the X itself
is the problematic character.
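
The caret placement can be illustrated with a small C sketch. It is not the liblzma parser: preset_error_pos() and make_caret_line() are hypothetical helpers, and the sketch assumes the reported error offset is simply the index of the rejected character. Validating the first character before moving past it keeps that offset on the X.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical validator: checks only the first character of the
 * preset value that starts at str[value_pos]. Returns the offset of
 * the offending character, or (size_t)-1 if it is a digit. */
static size_t preset_error_pos(const char *str, size_t value_pos)
{
	const char c = str[value_pos];
	if (c < '0' || c > '9')
		return value_pos; /* point at the bad character itself */

	return (size_t)-1;
}

/* Writes error_pos spaces followed by '^' into out.
 * Returns 0 on success, -1 if out is too small. */
static int make_caret_line(char *out, size_t out_size, size_t error_pos)
{
	if (out_size < 2 || error_pos > out_size - 2)
		return -1;

	memset(out, ' ', error_pos);
	out[error_pos] = '^';
	out[error_pos + 1] = '\0';
	return 0;
}
```

For "lzma2:preset=X" the value starts at offset 13, so the caret line consists of 13 spaces followed by ^, which lines up with the X when printed under the filter string.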

Fixes: cedeeca2ea6ada5b0411b2ae10d7a859e837f203
(cherry picked from commit 75107217670a97b7b772833669d88c3c2f188e37)
2025-01-05 12:58:46 +02:00
Lasse Collin
9b5a5c39f4
xz: Fix getopt_long argument type in --filters*
Forgetting the argument (or not using = to separate the option from
the argument) resulted in lzma_str_to_filters() being called with NULL
as input string argument. The function handles it fine but xz passes
the NULL to printf() too:

    $ xz --filters
    xz: Error in --filters=FILTERS option:
    xz: (null)
    xz: ^
    xz: Unexpected NULL pointer argument(s) to lzma_str_to_filters()

Now it's correct:

    $ xz --filters
    xz: option '--filters' requires an argument

The --filters-help option doesn't take any arguments.
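
The difference between the getopt_long() argument types can be shown with a short C sketch. It is not the xz option table: the enum values and the parse_first() helper are made up for illustration, and the assumption that the old entry behaved like optional_argument is inferred from the symptoms described above rather than stated in the commit. Resetting optind to 1 to rescan is glibc-style behavior; other platforms may need a different reset.

```c
#include <getopt.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical option codes, chosen above the range of char values. */
enum { OPT_FILTERS = 256, OPT_FILTERS_HELP };

/* Parses the first option in argv with "--filters" declared using the
 * given has_arg type (required_argument or optional_argument).
 * Returns what getopt_long() returned and stores the argument, if any,
 * in *arg. */
static int parse_first(int has_arg, int argc, char **argv, const char **arg)
{
	const struct option long_opts[] = {
		{ "filters",      has_arg,     NULL, OPT_FILTERS },
		{ "filters-help", no_argument, NULL, OPT_FILTERS_HELP },
		{ NULL, 0, NULL, 0 }
	};

	optind = 1;    /* restart scanning from the first argument */
	opterr = 0;    /* keep getopt_long() from printing its own message */
	optarg = NULL;

	const int c = getopt_long(argc, argv, "", long_opts, NULL);
	*arg = (c == OPT_FILTERS) ? optarg : NULL;
	return c;
}
```

With optional_argument, a bare --filters is accepted and the argument is NULL, which the caller must then handle. With required_argument, getopt_long() returns '?' for a bare --filters, and it also accepts the argument as a separate word.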

Fixes: 9ded880a0221f4d1256845fc4ab957ffd377c760
Fixes: d6af7f347077b22403133239592e478931307759
Fixes: a165d7df1964121eb9df715e6f836a31c865beef
(cherry picked from commit 52ff32433734d03befd85a5bf00fba77d6501455)
2025-01-05 11:42:02 +02:00
Lasse Collin
4ddf32c92b
xzdec: Don't leave Landlock file descriptor open for no reason
This fix is similar to 48ff3f06521ca326996ab9a04d1b342098960427.

Fixes: d74fb5f060b76db709b50f5fd37490394e52f975
(cherry picked from commit 2655c81b5e92278b0fd51f6537c1116f8349b02a)
2025-01-04 20:09:42 +02:00
Lasse Collin
1a5b93ed57
liblzma: Silence warnings from "clang -Wimplicit-fallthrough"
(cherry picked from commit 672da29bb3a209a727ae46c0df948d7eea69f2e2)
2025-01-02 15:48:57 +02:00
Lasse Collin
33899ee86d
xzdec: Fix language in a comment
(cherry picked from commit e34dbd6a0ae7a560a5508d51fc0bd142c5a320dc)
2025-01-02 15:48:31 +02:00
Lasse Collin
5a208b0c92
Windows: Make NLS require UCRT and gettext-runtime >= 0.23.1
Also remove the recently-added workaround from tuklib_gettext.h.
Requiring a new enough gettext-runtime is cleaner. I guess it's
mostly MSYS2 where xz is built with translation support, so once
MSYS2 has Gettext >= 0.23.1, this requirement shouldn't be a problem
in practice.

(cherry picked from commit 16821252c504071f5c2012e415e59cbf5fb79820)
2025-01-02 15:42:13 +02:00
Lasse Collin
b8081fdbc5
Build: Use git log --pretty=medium when creating ChangeLog
It's the default in git-log. Specifying it explicitly is good in case
a user has set format.pretty to a different value.

(cherry picked from commit ea21c76aa2406ba06ac154fe57741734c04f260f)
2024-12-30 11:22:47 +02:00
Lasse Collin
27c63200ee
Windows: Update MinGW-w64 + CMake instructions to recommend UCRT
(cherry picked from commit 08050c0788ce5bac0ffd572e9784a2749c4a13df)
2024-12-30 10:52:44 +02:00
Lasse Collin
89db6aacbf
xz man page: Describe the source file deletion in -z and -d options
The DESCRIPTION section always explained it, and the OPTIONS section
only described the differences to the default behavior. However, new
users in a hurry may skip reading DESCRIPTION. The default behavior
is a bit dangerous, thus it's good to repeat in the --compress and
--decompress docs that the source file is removed after successful operation.

Fixes: https://github.com/tukaani-project/xz/issues/150
(cherry picked from commit 653732bd6f06d8f465bf353bf6e1c16f1405b906)
2024-12-30 10:52:44 +02:00
Lasse Collin
3324ea3576
xz: Fix comments
(cherry picked from commit 260d5d36203955a7148ae1ab05d0931c942028d5)
2024-12-27 09:15:33 +02:00
Dexter Castor Döpping
50b8d61030
CMake: Disable unity builds project-wide
liblzma and xz can't be compiled as a unity/jumbo build because of
redeclarations and type name reuse. The CMake documentation recommends
setting UNITY_BUILD to false in this case.

This is especially important if we're compiled as a subproject and the
consumer wants to use CMAKE_UNITY_BUILD=ON for the rest of their code
base.

Closes: https://github.com/tukaani-project/xz/pull/158
(cherry picked from commit bf6da9a573a780cd1a7fb1728ef55d09e58dad11)
2024-12-22 20:10:41 +02:00
Lasse Collin
8a7d922fb8
Windows: Workaround a UTF-8 issue in Gettext's libintl_setlocale()
See the comment. In this package, locale is set at program startup and
not changed later, so the point (2) in the comment isn't a problem.

Fixes: 46ee0061629fb075d61d83839e14dd193337af59
(cherry picked from commit b40e3321a7fb9dfdf8ffb30e7e0788c2f0abc941)
2024-12-20 16:35:13 +02:00
Lasse Collin
dcdd40cacc
Revert "Windows: Use UTF-8 locale when active code page is UTF-8"
This reverts commit 0d0b574cc45045d6150d397776340c068df59e2a.

(cherry picked from commit bc4165da92b56668ddd1b7014b3488a0fad1733a)
2024-12-20 16:35:13 +02:00
Lasse Collin
f8e42ed44d
xzdec: Use setlocale() instead of tuklib_gettext_setlocale()
xzdec isn't translated and doesn't need libintl on Windows even
when NLS is enabled, thus libintl_setlocale() cannot interfere
with the locale settings. Thus, standard setlocale() works perfectly.

In the commit 78868b6e, the explanation in the commit message is wrong.

Fixes: 78868b6ed63fa4c89f73e3dfed27abfb8b0d46db
(cherry picked from commit d6796f9ce5359faaaed82926c1735aee3694430f)
2024-12-20 16:35:13 +02:00
Lasse Collin
3ed40b9f87
Windows: Revert the setlocale(LC_ALL, ".UTF8") documentation
Only leave the FindFirstFileA() notes from 20dfca81, reverting
the incorrect setlocale() notes. On Windows, Gettext's <libintl.h>
overrides setlocale() with libintl_setlocale() wrapper. I hadn't
noticed this, and thus my conclusions were wrong.

Fixes: 20dfca8171dad4c64785ac61d5b68972c444877b
(cherry picked from commit e607329a615759f1519016595dd38df7c89208f2)
2024-12-20 16:35:12 +02:00
Lasse Collin
4e0ebbabe4
tuklib_mbstr_width: Change the behavior when wcwidth() is not available
If wcwidth() isn't available (Windows), previously it was assumed
that one byte == one column in the terminal. Now it is assumed that
one multibyte character == one column. This works better with UTF-8.
Languages that only use single-width characters without any combining
characters should work correctly with this.
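
The two fallback assumptions can be compared with a minimal C sketch. It is not the tuklib_mbstr_width() implementation: width_bytes() and width_chars_utf8() are hypothetical names, and the second function assumes valid UTF-8 input and counts characters by skipping continuation bytes instead of using locale-aware functions.

```c
#include <stddef.h>
#include <string.h>

/* Old fallback: one byte == one column. */
static size_t width_bytes(const char *s)
{
	return strlen(s);
}

/* New fallback: one multibyte character == one column.
 * Assumes valid UTF-8: every byte that is not a continuation byte
 * (bit pattern 10xxxxxx) starts a new character. */
static size_t width_chars_utf8(const char *s)
{
	size_t columns = 0;

	for (; *s != '\0'; ++s) {
		if (((unsigned char)*s & 0xC0) != 0x80)
			++columns;
	}

	return columns;
}
```

For the UTF-8 encoding of "Käyttö" (8 bytes) the byte count overestimates the width while the character count gives 6. A fullwidth character such as U+6587 (3 bytes) still counts as one column here although a terminal typically draws it two columns wide, which matches the misalignment described for ko.po, zh_CN.po, and zh_TW.po.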

In xz, none of po/*.po contain combining characters and only ko.po,
zh_CN.po, and zh_TW.po contain fullwidth characters. Thus, "only"
those three translations in xz are broken on Windows with the
UTF-8 code page. Broken means that column headings in xz -lvv and
(only in the master branch) strings in --long-help are misaligned,
so it's not a huge problem. I don't know if those three languages
displayed perfectly before the UTF-8 change because I hadn't tested
translations with native Windows builds before.

Fixes: 46ee0061629fb075d61d83839e14dd193337af59
(cherry picked from commit b797c44c42ea54fe1c52722a2fca0c9618575598)
2024-12-18 19:22:01 +02:00
Lasse Collin
4ff609adb0
xzdec: Use setlocale() via tuklib_gettext_setlocale()
xzdec isn't translated and didn't have locale-specific behavior
in the past. On Windows with UTF-8 in the application manifest,
setting the locale makes a difference though:

  - Without any setlocale() call, non-ASCII filenames don't display
    properly in Command Prompt unless one first uses "chcp 65001"
    to set the console code page to UTF-8.

  - setlocale(LC_ALL, "") is enough to make non-ASCII filenames
    print correctly in Command Prompt without using "chcp 65001",
    assuming that the non-UTF-8 code page (like 850) supports
    those non-ASCII characters.

  - setlocale(LC_ALL, ".UTF8") is even better because then mbrtowc() and
    such functions use a UTF-8 locale instead of a legacy code page.
    The tuklib_gettext_setlocale() macro takes care of this (without
    enabling any translations).

Fixes: 46ee0061629fb075d61d83839e14dd193337af59
(cherry picked from commit 78868b6ed63fa4c89f73e3dfed27abfb8b0d46db)
2024-12-18 19:22:00 +02:00
Lasse Collin
4e7a48bf15
Windows: Use UTF-8 locale when active code page is UTF-8
XZ Utils 5.6.3 set the active code page to UTF-8 to fix CVE-2024-47611.
This wasn't paired with UCRT-specific setlocale(LC_ALL, ".UTF8"), thus
non-ASCII characters from translations became mojibake.

Fixes: 46ee0061629fb075d61d83839e14dd193337af59
(cherry picked from commit 0d0b574cc45045d6150d397776340c068df59e2a)
2024-12-18 19:22:00 +02:00
Lasse Collin
d20e4115e1
Windows: Document the need for setlocale(LC_ALL, ".UTF8")
Also warn about unpaired surrogates and (somewhat UTF-8-specific)
MAX_PATH issue in FindFirstFileA().

Fixes: 46ee0061629fb075d61d83839e14dd193337af59
(cherry picked from commit 20dfca8171dad4c64785ac61d5b68972c444877b)
2024-12-18 19:22:00 +02:00
Lasse Collin
f9f0cdae8a
xzdec: Call tuklib_progname_init() early enough
If the early pledge() call on OpenBSD fails, it calls my_errorf()
which requires the "progname" variable.

Fixes: d74fb5f060b76db709b50f5fd37490394e52f975
(cherry picked from commit 4e936f234056e5831013ed922145b666b04bb1e3)
2024-12-18 19:22:00 +02:00
Lasse Collin
3e0bc4e91f
CMake: Bump maximum policy version to 3.31
With CMake 3.31, there were a few warnings from
CMP0177 "install() DESTINATION paths are normalized".
These occurred because the install(FILES) command in
my_install_man_lang() is called with a DESTINATION path
that contains two consecutive slashes, for example,
"share/man//man1". Such a path is for the English man pages.
With translated man pages, the language code goes between
the slashes. The warning was probably triggered because the
extra slash gets removed by the normalization.

(cherry picked from commit 61feaf681bd793dc5c919732b44bca7dcf2ed1b8)
2024-12-18 19:22:00 +02:00
Lasse Collin
55127b25f2
Update THANKS
(cherry picked from commit b0bb84dd7bbdcc85243386a0051c7b2cb5fc6a18)
2024-12-18 19:22:00 +02:00
Dexter Castor Döpping
d86fa15b72
liblzma: Fix incorrect macro name in a comment
Fixes: 33b8a24b6646a9dbfd8358405aec466b13078559
Closes: https://github.com/tukaani-project/xz/pull/155
(cherry picked from commit bee0c044d30a6ad3b3d94901c27e7519f6f46e27)
2024-12-18 19:22:00 +02:00
Lasse Collin
86e8b03d20
Translations: Update the Chinese (traditional) translation
(cherry picked from commit b36177273602ebc83e9cc58517f63a7b6af33f70)
2024-12-18 19:21:59 +02:00
Lasse Collin
9c5bab8bd1
Update THANKS
(cherry picked from commit 9f69e71e78621fd056f5eaaad7cdcd9279310fb5)
2024-12-18 19:21:59 +02:00
Mark Wielaard
d9c2e7572b
xz: Landlock: Fix a file descriptor leak
(cherry picked from commit 48ff3f06521ca326996ab9a04d1b342098960427)
2024-12-18 19:21:59 +02:00
Sam James
77cab41f32
CI: update FreeBSD, NetBSD, OpenBSD, Solaris actions
Checked the changes and they're all innocuous. This should hopefully
fix the "externally managed" pip error in these jobs that started
recently.

(cherry picked from commit dbca3d078ec581600600abebbb18769d3d713914)
2024-12-18 19:21:59 +02:00
Lasse Collin
6084b25c29
cmake/tuklib_large_file_support.cmake: Add a missing include
v5.2 didn't build with CMake. Other branches had
include(CMakePushCheckState) in top-level CMakeLists.txt
which made the build work.

Fixes: 597f49b61475438a43a417236989b2acc968a686
(cherry picked from commit be4bf94446b6286a5dffdde85fc1d21448f4edff)
2024-10-01 19:14:30 +03:00
Lasse Collin
9331ce4009
Bump version and soname for 5.6.3 (tag v5.6.3)
2024-10-01 12:50:28 +03:00
Lasse Collin
f52857ffde
Add NEWS for 5.6.3
2024-10-01 12:50:22 +03:00