root/xz - xz - Root on GIT

mirror of https://git.tukaani.org/xz.git synced 2025-11-04 15:32:55 +00:00

Author	SHA1	Message	Date
Lasse Collin	0c80045ab8	liblzma: mt dec: Fix lack of parallelization in single-shot decoding Single-shot decoding means calling lzma_code() by giving it the whole input at once and enough output buffer space to store the uncompressed data, and combining this with LZMA_FINISH and no timeout (lzma_mt.timeout = 0). This way the file is decoded with a single lzma_code() call if possible. The bug prevented the decoder from starting more than one worker thread in single-shot mode. The issue was noticed when reviewing the code; there are no bug reports. Thus maybe few have tried this mode. Fixes: 64b6d496dc81 ("liblzma: Threaded decoder: Always wait for output if LZMA_FINISH is used.")	2025-04-03 14:34:42 +03:00
Lasse Collin	8188048854	liblzma: mt dec: Don't modify thr->in_size in the worker thread Don't set thr->in_size = 0 when returning the thread to the stack of available threads. Not only is it useless, but the main thread may read the value in SEQ_BLOCK_THR_RUN. With valid inputs, it made no difference if the main thread saw the original value or 0. With invalid inputs (when worker thread stops early), thr->in_size was no longer modified after the previous commit with the security fix ("Don't free the input buffer too early"). So while the bug appears harmless now, it's important to fix it because the variable was being modified without proper locking. It's trivial to fix because there is no need to change the value. Only the main thread needs to set the value (in SEQ_BLOCK_THR_INIT) when starting a new Block before the worker thread is activated. Fixes: 4cce3e27f529 ("liblzma: Add threaded .xz decompressor.") Reviewed-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Thanks-to: Sam James <sam@gentoo.org>	2025-04-03 14:34:42 +03:00
Lasse Collin	d5a2ffe41b	liblzma: mt dec: Don't free the input buffer too early (CVE-2025-31115) The input buffer must be valid as long as the main thread is writing to the worker-specific input buffer. Fix it by making the worker thread not free the buffer on errors and not return the worker thread to the pool. The input buffer will be freed when threads_end() is called. With invalid input, the bug could at least result in a crash. The effects include heap use after free and writing to an address based on the null pointer plus an offset. The bug has been there since the first committed version of the threaded decoder and thus affects versions from 5.3.3alpha to 5.8.0. As the commit message in 4cce3e27f529 says, I had made significant changes on top of Sebastian's patch. This bug was indeed introduced by my changes; it wasn't in Sebastian's version. Thanks to Harri K. Koskinen for discovering and reporting this issue. Fixes: 4cce3e27f529 ("liblzma: Add threaded .xz decompressor.") Reported-by: Harri K. Koskinen <x64nop@nannu.org> Reviewed-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Thanks-to: Sam James <sam@gentoo.org>	2025-04-03 14:34:42 +03:00
Lasse Collin	c0c835964d	liblzma: mt dec: Simplify by removing the THR_STOP state The main thread can directly set THR_IDLE in threads_stop() which is called when errors are detected. threads_stop() won't return the stopped threads to the pool or free the memory pointed by thr->in anymore, but it doesn't matter because the existing workers won't be reused after an error. The resources will be cleaned up when threads_end() is called (reinitializing the decoder always calls threads_end()). Reviewed-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Thanks-to: Sam James <sam@gentoo.org>	2025-04-03 14:34:42 +03:00
Lasse Collin	831b55b971	liblzma: mt dec: Fix a comment Reviewed-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Thanks-to: Sam James <sam@gentoo.org>	2025-04-03 14:34:42 +03:00
Lasse Collin	b9d168eee4	liblzma: Add assertions to lzma_bufcpy()	2025-04-03 14:34:30 +03:00
Lasse Collin	db9258e828	Bump version and soname for 5.8.0 Also remove the LZMA_UNSTABLE macro.	2025-03-25 15:18:32 +02:00
Lasse Collin	a831bc185b	liblzma: Add raw ARM64, RISC-V, and x86 BCJ filter APIs Put them behind the LZMA_UNSTABLE macro for now. These low-level special APIs might become useful in erofs-utils.	2025-01-20 16:44:27 +02:00
Lasse Collin	f2e2b267ca	liblzma: Mark string conversion messages as translatable	2025-01-20 16:31:49 +02:00
Lasse Collin	f49d7413d9	liblzma: Tweak a few error messages in lzma_str_to_filters()	2025-01-20 16:31:35 +02:00
Lasse Collin	51f038f8cb	liblzma: memcmplen.h: Use 8-byte method on 64-bit unaligned archs Previously it was enabled only on x86-64 and ARM64 when also support for unaligned access was detected or manually enabled at build time. In the default build configuration, the 8-byte method is now enabled also on 64-bit RISC-V and 64-bit PowerPC (both endiannesses). It was reported that on big endian POWER9, encoding time may be reduced by 12-13 %. This change only affects builds with GCC and Clang because the code uses __builtin_ctzll or __builtin_clzll. Thanks to Marcus Comstedt for testing on POWER9.	2025-01-13 08:44:58 +02:00
Lasse Collin	150356207c	liblzma: Fix the encoder breakage on big endian ARM64 When the 8-byte method was enabled for ARM64, a check for endianness wasn't added. This broke the LZMA/LZMA2 encoder. Test suite caught it. Fixes: cd64dd70d5665b6048829c45772d08606f44672e Co-authored-by: Marcus Comstedt <marcus@mc.pp.se>	2025-01-12 13:08:55 +02:00
Lasse Collin	7510721767	liblzma: Always validate the first digit of a preset string lzma_str_to_filters() may call parse_lzma12_preset() in two ways. The call from str_to_filters() detects the string type from the first character(s) and as a side-effect it validates the first digit of the preset string. So this change makes no difference there. However, the call from parse_options() doesn't pre-validate the string. parse_lzma12_preset() will return an invalid value which is passed to lzma_lzma_preset() which safely rejects it. The bug still affects the error message: $ xz --filters=lzma2:preset=X xz: Error in --filters=FILTERS option: xz: lzma2:preset=X xz: ^ xz: Unsupported preset After the fix: $ xz --filters=lzma2:preset=X xz: Error in --filters=FILTERS option: xz: lzma2:preset=X xz: ^ xz: Unsupported preset The ^ now correctly points to the X and not past it because the X itself is the problematic character. Fixes: cedeeca2ea6ada5b0411b2ae10d7a859e837f203	2025-01-05 12:58:22 +02:00
Lasse Collin	94adc996e4	Replace "Fall through" comments with FALLTHROUGH	2025-01-02 15:43:37 +02:00
Lasse Collin	4e9023857d	Fix typos Thanks to xx on #tukaani.	2024-05-18 00:34:07 +03:00
Lasse Collin	278563ef8f	liblzma: Fix incorrect function type error from sanitizer Clang 17 with -fsanitize=address,undefined: src/liblzma/common/filter_common.c:366:8: runtime error: call to function encoder_find through pointer to incorrect function type 'const lzma_filter_coder *(*)(unsigned long)' src/liblzma/common/filter_encoder.c:187: note: encoder_find defined here Use a wrapper function to get the correct type neatly. This reduces the number of casts needed too. This issue could be a problem with control flow integrity (CFI) methods that check the function type on indirect function calls. Fixes: 3b34851de1eaf358cf9268922fa0eeed8278d680	2024-04-30 22:22:45 +03:00
Lasse Collin	71eed2520e	liblzma: index_decoder: Fix missing initializations on LZMA_PROG_ERROR If the arguments to lzma_index_decoder() or lzma_index_buffer_decode() were such that LZMA_PROG_ERROR was returned, the lzma_index **i argument wasn't touched even though the API docs say that *i = NULL is done if an error occurs. This obviously won't be done even now if i == NULL but otherwise it is best to do it due to the wording in the API docs. In practice this matters very little: The problem can occur only if the functions are called with invalid arguments, that is, the calling application must already have a bug.	2024-04-27 14:33:38 +03:00
Sam James	c7ef767c49	liblzma: outqueue: add header guard Reported by github's codeql.	2024-04-25 14:04:24 +03:00
Sam James	55dcae3056	liblzma: easy_preset: add header guard Reported by github's codeql.	2024-04-25 14:04:24 +03:00
Lasse Collin	70d12dd069	liblzma: lzma_str_to_filters: Set error_pos on all errors The API docs clearly say that if error_pos isn't NULL then *error_pos is always set on any error. However, it wasn't touched if str == NULL or filters == NULL or unsupported flags were specified. Fixes: cedeeca2ea6ada5b0411b2ae10d7a859e837f203	2024-04-22 22:03:04 +03:00
Lasse Collin	0b99783d63	liblzma: memcmplen.h: Add a comment why subtraction is used.	2024-03-22 17:46:30 +02:00
Lasse Collin	3217b82b3e	liblzma: Minor comment edits.	2024-03-15 18:03:47 +02:00
Sergey Kosukhin	096bc0e3f8	liblzma: Fix building with NVHPC (NVIDIA HPC SDK). NVHPC compiler has several issues that make it impossible to build liblzma: - the compiler cannot handle unions that contain pointers that are not the first members; - the compiler cannot handle the assembler code in range_decoder.h (LZMA_RANGE_DECODER_CONFIG has to be set to zero); - the compiler fails to produce valid code for delta_decode if the vectorization is enabled, which results in failed tests. This introduces NVHPC-specific workarounds that address the issues.	2024-03-15 17:30:50 +02:00
Lasse Collin	22af94128b	Add SPDX license identifier into 0BSD source code files.	2024-02-14 18:31:16 +02:00
Lasse Collin	689e0228ba	Change most public domain parts to 0BSD. Translations and doc/xz-file-format.txt and doc/lzma-file-format.txt were not touched. COPYING.0BSD was added.	2024-02-14 18:31:12 +02:00
Jia Tan	45663443eb	liblzma: Fix build error if only RISC-V BCJ filter is enabled. If any other BCJ filter was enabled for encoding or decoding, then this was not a problem.	2024-02-13 23:33:21 +08:00
Jia Tan	2959dbc735	liblzma: Update string_conversion.c to support RISC-V Filter.	2024-01-23 23:05:47 +08:00
Jia Tan	440a2eccb0	liblzma: Add RISC-V BCJ filter. The new Filter ID is 0x0B. Thanks to Chien Wong <m@xv97.com> for the initial version of the Filter, the xz CLI updates, and the Autotools build system modifications. Thanks to Igor Pavlov for his many contributions to the design of the filter.	2024-01-23 23:05:41 +08:00
Lasse Collin	cd64dd70d5	liblzma: Use 8-byte method in memcmplen.h on ARM64. It requires fast unaligned access to 64-bit integers and a fast instruction to count trailing zeros in a 64-bit integer (__builtin_ctzll()). This perhaps should be enabled on some other archs too. Thanks to Chenxi Mao for the original patch: https://github.com/tukaani-project/xz/pull/75 (the first commit) According to the numbers there, this may improve encoding speed by about 3-5 %. This enables the 8-byte method on MSVC ARM64 too which should work but wasn't tested.	2023-12-28 17:17:39 +02:00
Lasse Collin	12c90c00f0	liblzma: Check also for __clang__ in memcmplen.h. This change hopefully makes no practical difference as Clang likely was detected via __GNUC__ or _MSC_VER already.	2023-12-28 17:17:39 +02:00
Jia Tan	55810780e0	liblzma: Make parameter names in function definition match declaration. lzma_raw_encoder() and lzma_raw_encoder_init() used "options" as the parameter name instead of "filters" (used by the declaration). "filters" is more clear since the parameter represents the list of filters passed to the raw encoder, each of which contains filter options.	2023-12-16 20:28:21 +08:00
Lasse Collin	e7a86b94cd	liblzma: Use lzma_always_inline in memcmplen.h.	2023-10-30 18:44:32 +02:00
Lasse Collin	dcfe563299	liblzma: #define lzma_always_inline in common.h.	2023-10-30 18:44:32 +02:00
Lasse Collin	41113fe30a	liblzma: Use lzma_attr_visibility_hidden on private extern declarations. These variables are internal to liblzma and not exposed in the API.	2023-10-30 18:06:25 +02:00
Lasse Collin	a2f5ca706a	liblzma: #define lzma_attr_visibility_hidden in common.h. In ELF shared libs: -fvisibility=hidden affects definitions of symbols but not declarations.[*] This doesn't affect direct calls to functions inside liblzma as a linker can replace a call to lzma_foo@plt with a call directly to lzma_foo when -fvisibility=hidden is used. [*] It has to be like this because otherwise every installed header file would need to explicitly set the symbol visibility to default. When accessing extern variables that aren't defined in the same translation unit, compiler assumes that the variable has the default visibility and thus indirection is needed. Unlike function calls, linker cannot optimize this. Using __attribute__((__visibility__("hidden"))) with the extern variable declarations tells the compiler that indirection isn't needed because the definition is in the same shared library. About 15+ years ago, someone told me that it would be good if the CRC tables would be defined in the same translation unit as the C code of the CRC functions. While I understood that it could help a tiny amount, I didn't want to change the code because a separate translation unit for the CRC tables was needed for the x86 assembly code anyway. But when visibility attributes are supported, simply marking the extern declaration with the hidden attribute will get identical result. When there are only a few affected variables, this is trivial to do. I wish I had understood this back then already.	2023-10-30 18:03:39 +02:00
Lasse Collin	ee7709bae5	liblzma: Move a few __attribute__ uses in function declarations. The API headers have many attributes but these were left as is for now.	2023-09-22 20:06:27 +03:00
Lasse Collin	18a66fbac0	Remove incorrect uses of __attribute__((__malloc__)). xrealloc() is obviously incorrect, modern GCC docs even mention realloc() as an example where this attribute cannot be used. liblzma's lzma_alloc() and lzma_alloc_zero() would be correct uses most of the time but custom allocators may use a memory pool or otherwise hold the pointer so aliasing issues could happen in theory. The xstrdup() case likely was correct but I removed it anyway. Now there are no __malloc__ attributes left in the code. The allocations aren't in hot paths so this should make no practical difference.	2023-09-22 20:06:27 +03:00
Jia Tan	721e3d9f7a	liblzma: Update assert in vli_ceil4(). The argument to vli_ceil4() should always guarantee the return value is also a valid lzma_vli. Thus the highest three valid lzma_vli values are invalid arguments. All uses of the function ensure this so the assert is updated to match this.	2023-08-28 23:05:34 +08:00
Jia Tan	ae5c07b22a	liblzma: Add overflow check for Unpadded size in lzma_index_append(). This was not a security bug since there was no path to overflow UINT64_MAX in lzma_index_append() or when it calls index_file_size(). The bug was discovered by a failing assert() in vli_ceil4() when called from index_file_size() when unpadded_sum (the sum of the compressed size of current Stream and the unpadded_size parameter) exceeds LZMA_VLI_MAX. Previously, the unpadded_size parameter was checked to be not greater than UNPADDED_SIZE_MAX, but no check was done once compressed_base was added. This could not have caused an integer overflow in index_file_size() when called by lzma_index_append(). The calculation for file_size breaks down into the sum of: - Compressed base from all previous Streams - 2 * LZMA_STREAM_HEADER_SIZE (size of the current Stream's header and footer) - stream_padding (can be set by lzma_index_stream_padding()) - Compressed base from the current Stream - Unpadded size (parameter to lzma_index_append()) The sum of everything except for Unpadded size must be less than LZMA_VLI_MAX. This is guaranteed by overflow checks in the functions that can set these values including lzma_index_stream_padding(), lzma_index_append(), and lzma_index_cat(). The maximum value for Unpadded size is enforced by lzma_index_append() to be less than or equal to UNPADDED_SIZE_MAX. Thus, the sum cannot exceed UINT64_MAX since LZMA_VLI_MAX is half of UINT64_MAX. Thanks to Joona Kannisto for reporting this.	2023-08-28 23:04:56 +08:00
Dimitri Papadopoulos Orfanos	42df7c7aa1	Docs: Fix typos found by codespell	2023-07-31 20:02:21 +08:00
Jia Tan	818701ba1c	liblzma: Improve comment in string_conversion.c. The comment used "flag" when referring to decoder options. Just referring to them as options is more clear and consistent.	2023-07-18 22:56:47 +08:00
Lasse Collin	97fd5cb669	liblzma: Tweak #if condition in memcmplen.h. Maybe ICC always #defines _MSC_VER on Windows but now it's very clear which code will get used.	2023-07-18 13:57:54 +03:00
Lasse Collin	40392c19f7	liblzma: Omit unnecessary parenthesis in a preprocessor directive.	2023-07-18 13:49:43 +03:00
Jia Tan	17f8844e6f	liblzma: Remove non-portable empty initializer. Commit 78704f36e74205857c898a351c757719a6c8b666 added an empty initializer {} to prevent a warning. The empty initializer is a GNU extension and results in a build failure on MSVC. The -Wpedantic flag warns about empty initializers.	2023-07-08 21:24:19 +08:00
Jia Tan	78704f36e7	liblzma: Prevent uninitialized warning in mt stream encoder. This change only impacts the compiler warning since it was impossible for the wait_abs struct in stream_encode_mt() to be used before it was initialized since mythread_condtime_set() will always be called before mythread_cond_timedwait(). Since the mythread.h code is different between the POSIX and Windows versions, this warning was only present on Windows builds. Thanks to Arthur S for reporting the warning and providing an initial patch.	2023-06-29 00:06:16 +08:00
Jia Tan	e3356a204c	liblzma: Prevent warning for MSYS2 Windows build. In lzma_memcmplen(), the <intrin.h> header file is only included if _MSC_VER and _M_X64 are both defined but _BitScanForward64() was previously used if _M_X64 was defined. GCC for MSYS2 defines _M_X64 but not _MSC_VER so _BitScanForward64() was used without including <intrin.h>. Now, lzma_memcmplen() will use __builtin_ctzll() for MSYS2 GCC builds as expected.	2023-06-28 23:59:51 +08:00
Jia Tan	8f23657498	liblzma: Exports lzma_mt_block_size() as an API function. The lzma_mt_block_size() was previously just an internal function for the multithreaded .xz encoder. It is used to provide a recommended Block size for a given filter chain. This function is helpful to determine the maximum Block size for the multithreaded .xz encoder when one wants to change the filters between blocks. Then, this determined Block size can be provided to lzma_stream_encoder_mt() in the lzma_mt options parameter when initializing the coder. This requires one to know all the filter chains they are using before starting to encode (or at least the filter chain that will need the largest Block size), but that isn't a bad limitation.	2023-05-11 23:54:44 +08:00
Jia Tan	f41df2ac2f	Windows: Include <intrin.h> when needed. Legacy Windows did not need to #include <intrin.h> to use the MSVC intrinsics. Newer versions likely just issue a warning, but the MSVC documentation says to include the header file for the intrinsics we use. GCC and Clang can "pretend" to be MSVC on Windows, so extra checks are needed in tuklib_integer.h to only include <intrin.h> when it is actually needed.	2023-04-19 22:22:16 +08:00
Jia Tan	2a89670ab2	liblzma: Cleans up old commented out code.	2023-04-13 20:45:19 +08:00
Lasse Collin	dfe1710784	liblzma: Silence -Wsign-conversion in SSE2 code in memcmplen.h. Thanks to Christian Hesse for reporting the issue. Fixes: https://github.com/tukaani-project/xz/issues/44	2023-03-19 22:45:59 +02:00
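
Several of the memcmplen.h commits above (51f038f8cb, 7510721767's neighbor 150356207c, cd64dd70d5) concern the "8-byte method" and its dependence on endianness. The following is only a minimal sketch of that general idea, not the code in memcmplen.h: the function name match_len8 and its signature are hypothetical, and it assumes GCC or Clang (for __builtin_ctzll, __builtin_clzll, and the __BYTE_ORDER__ macro). Eight bytes from each buffer are XORed; the first differing byte is found by counting trailing zero bits on little endian and leading zero bits on big endian. Leaving out that endianness distinction is the kind of mistake the big endian ARM64 fix describes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not liblzma's lzma_memcmplen()): returns how many
 * bytes at the start of a and b are equal, looking at no more than
 * `limit` bytes. Assumes GCC or Clang.
 */
static size_t match_len8(const uint8_t *a, const uint8_t *b, size_t limit)
{
	size_t len = 0;

	/* Compare 8 bytes at a time while a full 8-byte chunk remains. */
	while (limit - len >= 8) {
		uint64_t x, y;

		/* memcpy() avoids undefined behavior from unaligned loads. */
		memcpy(&x, a + len, 8);
		memcpy(&y, b + len, 8);

		const uint64_t diff = x ^ y;
		if (diff != 0) {
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
			/* Big endian: the first byte in memory is the most
			 * significant one, so count leading zero bits. */
			return len + ((size_t)__builtin_clzll(diff) >> 3);
#else
			/* Little endian (assumed otherwise): the first byte
			 * in memory is the least significant one. */
			return len + ((size_t)__builtin_ctzll(diff) >> 3);
#endif
		}

		len += 8;
	}

	/* Fewer than 8 bytes left: finish one byte at a time. */
	while (len < limit && a[len] == b[len])
		++len;

	return len;
}
```

The sketch never reads past `limit` bytes, which costs a byte-wise tail loop. Whether the real implementation makes the same trade-off is not stated in the commit messages above.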

306 Commits