Now the information in the "Details" tab in the file properties
dialog matches the naming convention of Cygwin and MSYS2. This
is only a cosmetic change.
The MB output can overflow with huge numbers. Most likely these are
invalid .lzma files anyway, but let's avoid garbage output.
lzmadec was adapted from LZMA Utils. The original code with this bug
was written in 2005, over 19 years ago.
Co-authored-by: Lasse Collin <lasse.collin@tukaani.org>
Closes: https://github.com/tukaani-project/xz/pull/144
"xzdec -M123" exited with exit status 1 without printing
any messages. The "M:" entry should have been removed when
the memory usage limiter support was removed from xzdec.
Fixes: 792331bdee
Closes: https://github.com/tukaani-project/xz/pull/143
[ Lasse: Commit message edits ]
Support for instruction "movzw" without suffix in "GNU as" was
added in commit [1] and stabilized in binutils 2.27, released
in August 2016. Earlier systems don't accept this instruction
without a suffix, making range_decoder.h's inline assembly
unable to build on old systems such as Ubuntu 16.04, creating
error messages like:
lzma_decoder.c: Assembler messages:
lzma_decoder.c:371: Error: no such instruction: `movzw 2(%r11),%esi'
lzma_decoder.c:373: Error: no such instruction: `movzw 4(%r11),%edi'
lzma_decoder.c:388: Error: no such instruction: `movzw 6(%r11),%edx'
lzma_decoder.c:398: Error: no such instruction: `movzw (%r11,%r14,4),%esi'
Change "movzw" to "movzwl" for compatibility.
[1] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=c07315e0c610e0e3317b4c02266f81793df253d2
Suggested-by: Lasse Collin <lasse.collin@tukaani.org>
Tested-by: Yifeng Li <tomli@tomli.me>
Signed-off-by: Yifeng Li <tomli@tomli.me>
Fixes: 3182a330c1
Fixes: https://github.com/tukaani-project/xz/issues/121
Closes: https://github.com/tukaani-project/xz/pull/136
This reverts commit dc03f6290f.
OpenBSD 7.6 will support elf_aux_info(3), and the detection code used
on FreeBSD will work on OpenBSD 7.6 too. Keep things simpler and drop
the OpenBSD-specific sysctl() method.
Thanks to Christian Weisgerber.
It won't be implemented. find + xargs is more flexible, for example,
it allows compressing small files in parallel. An example for that
has been included in the xz man page since 2010.
The crc.w.{b/h/w/d}.w instructions in LoongArch can calculate the CRC32
result for 1/2/4/8 bytes in a single operation. Using these is much
faster compared to the generic method.
Optimized CRC32 is enabled unconditionally on 64-bit LoongArch because
the LoongArch specification says that CRC32 instructions shall be
implemented for 64-bit processors. Optimized CRC32 isn't enabled for
32-bit LoongArch processors because not enough information is available
about them.
Co-authored-by: Lasse Collin <lasse.collin@tukaani.org>
Closes: https://github.com/tukaani-project/xz/pull/86
Prefix ARM64_RUNTIME_DETECTION with CRC_ and reorder it to be with
the other ARM64-specific lines. That macro isn't used outside this
file.
ARM64 CLMUL implementation doesn't exist yet and thus CRC64_ARM64_CLMUL
isn't used anywhere yet.
It's not ideal that the single-letter CRC utility macros are here
as they pollute the namespace of the LZ encoder files. Those could
be moved their own crc_macros.h like they were in 5.2.x but in practice
this is fine enough already.
LZ encoder needs lzma_crc32_table[0] but otherwise those tables
are private to the CRC code. In contrast, the other things in
check.h are needed in several places.
Now runtime detection of CLMUL support can pick between the CLMUL and
the generic assembly implementations. Whatever overhead this has for
builds that omit CLMUL completely isn't important because builds for
any non-ancient system is likely to include the CLMUL code too.
Handle the CRC tables in crcXX_fast.c files because now these files
are built even when assembly code is used.
If 32-bit x86 assembly is enabled then it will always be built even
if compiler flags were such that CLMUL would be allowed unconditionally.
That is, runtime detection will be used anyway. This keeps the build
rules simpler.
In LZ encoder, build and use lzma_lz_hash_table[256] if CLMUL CRC
is used without runtime detection. Previously this wasn't needed
because crc32_table.c included the lzma_crc32_table[][] in the build
unless encoder support had been disabled. Including an 8 KiB table
was silly when only 1 KiB is actually used. So now liblzma is 7 KiB
smaller if CLMUL is enabled without runtime detection.
This simplifies things a little. Building liblzma with VS2013 probably
still worked but building the command line tools was not supported.
Microsoft ended support for VS2013 on 2024-04.
On E2K the function compiles only due to compiler emulation but the
function is never used. It's cleaner to omit the function when it's
not needed even though it's a "static inline" function.
Thanks to Ilya Kurdyukov.
It's faster with both tiny and large buffers and doesn't require
disabling any sanitizers. With large buffers the extra speed is
from folding four 16-byte chunks in parallel.
The 32-bit x86 with MSVC reportedly still needs a workaround.
Now the simpler "__asm mov ebx, ebx" trick is enough but it
needs to be in lzma_crc64() instead of crc64_arch_optimized().
Thanks to Iouri Kharon for testing and the fix.
Thanks to Ilya Kurdyukov for testing the speed with aligned and
unaligned buffers on a few x86 processors and on E2K v6.
Thanks to Sam James for general feedback.
Fixes: https://github.com/tukaani-project/xz/issues/112
Fixes: https://github.com/tukaani-project/xz/issues/122
Now it refers to crc_clmul_consts_gen.c. vfold8 was renamed to mu_p
and the p no longer has the lowest bit set (it makes no difference
as the output bits it affects are ignored).
It's not enough to silence the address sanitizer. Also memory and
thread sanitizers would need to be silenced. They, at least currently,
aren't smart enough to see that the extra bytes are discarded from
the xmm registers by later instructions.
Valgrind is smarter, possibly because this kind of code isn't weird
to write in assembly. Agner Fog's optimizing_assembly.pdf even mentions
this idea of doing an aligned read and then discarding the extra
bytes. The sanitizers don't instrument assembly code but Valgrind
checks all code.
It's better to change the implementation to avoid the sanitization
attributes which also look scary in the code. (Somehow they can look
more scary than __asm__ which is implictly unsanitized.)
See also:
https://github.com/tukaani-project/xz/issues/112https://github.com/tukaani-project/xz/issues/122
GCC 4.2 doesn't have __builtin_bswap16() and friends so tuklib_integer.h
tries to use OS-specific byte swap methods instead. On OpenBSD those
macros are swap16/32/64 instead of bswap16/32/64 like on other *BSDs
and Darwin.
An alternative to "#ifdef __OpenBSD__" could be "#ifdef swap16" as it
is a macro. But since OpenBSD seems to be a special case under this
special case of "*BSDs and Darwin", checking for __OpenBSD__ seems
the more conservative choice now.
Thanks to Christian Weisgerber and Brad Smith who both submitted
the same patch a few hours apart.
Co-authored-by: Christian Weisgerber <naddy@mips.inka.de>
Co-authored-by: Brad Smith <brad@comstyle.com>
Closes: https://github.com/tukaani-project/xz/pull/126
The C code is from Christian Weisgerber, I merely reordered the OSes.
Then I added the build system checks without testing them.
Also thanks to Brad Smith who submitted a similar patch on GitHub
a few hours after Christian had sent his via email.
Co-authored-by: Christian Weisgerber <naddy@mips.inka.de>
Closes: https://github.com/tukaani-project/xz/pull/125
Solaris' GCC can't understand that our use is fine, unlike modern compilers:
```
list.c: In function 'print_totals_basic':
list.c:1191:4: error: format not a string literal, argument types not checked [-Werror=format-nonliteral]
uint64_to_str(totals.files, 0));
^~~~~~~~~~~~~
cc1: all warnings being treated as errors
```
It's presumably because of older gettext missing format attributes.
This is with `gcc (GCC) 7.3.0`.
This is closer to what it was before the --filtersX support was added,
just extended to support for scaling all filter chains. The method
before this commit was an extended version of the original too but
it was done in a more complex way for no clear reason. In case of
an error, the complex version printed fewer informative messages
(a good thing) but it's not a sigificant benefit.
In the limit is too low even for single-threaded mode, the required
amount of memory is now reported like in 5.4.x instead of like in
5.5.1alpha - 5.6.1 which showed the original non-scaled usage. It
had been a FIXME in the old code but it's not clear what message
makes the most sense.
Fixes: 5f0c5a0438
The convention is that
lzma_filter filters[LZMA_FILTERS_MAX + 1];
contains the filters of a single filter chain.
It was so here as well before the commit
d6af7f3470.
It changes "filters" to a ten-element array of filter chains.
It's clearer to call this array-of-arrays "chains".
This also renames "filter_idx" to "chain_idx" which is used
as an index as in chains[chain_idx].
opt_mode == MODE_COMPRESS isn't possible when HAVE_ENCODERS isn't
defined. Thus, when *encoding*, the message about *decoder* memory
usage is possible to show only when both encoder and decoder have
been built.
Since the message is shown only at V_DEBUG, skip the memusage
calculation if verbosity level isn't high enough.
Fixes: 5f0c5a0438
This was added to xz in 02e3505991
but I forgot to do the same in xzdec.
The Landlock sandbox in xzdec could be stricter as now it's
active only for the last file being decompressed. In xz,
read-only sandbox is used for multi-file case. On the other hand,
xz doesn't go to the strictest mode when processing the last file
when more than one file was specified; xzdec does.
Clang 17 with -fsanitize=address,undefined:
src/liblzma/common/filter_common.c:366:8: runtime error:
call to function encoder_find through pointer to incorrect
function type 'const lzma_filter_coder *(*)(unsigned long)'
src/liblzma/common/filter_encoder.c:187: note:
encoder_find defined here
Use a wrapper function to get the correct type neatly.
This reduces the number of casts needed too.
This issue could be a problem with control flow integrity (CFI)
methods that check the function type on indirect function calls.
Fixes: 3b34851de1
It's undefined behavior. The result wasn't ever used as it occurred
in the last iteration of a loop.
Clang 17 with -fsanitize=address,undefined:
$ src/xz/xz --block-list=123
src/xz/args.c:164:12: runtime error: applying non-zero offset 1
to null pointer
Fixes: 88ccf47205
Co-authored-by: Sam James <sam@gentoo.org>
If the arguments to lzma_index_decoder() or lzma_index_buffer_decode()
were such that LZMA_PROG_ERROR was returned, the lzma_index **i
argument wasn't touched even though the API docs say that *i = NULL
is done if an error occurs. This obviously won't be done even now
if i == NULL but otherwise it is best to do it due to the wording
in the API docs.
In practice this matters very little: The problem can occur only
if the functions are called with invalid arguments, that is,
the calling application must already have a bug.
The __builtin_bswapXX from GCC and Clang are preferred when
they are available. This can allow compilers to emit the x86 MOVBE
instruction instead of doing a load + byteswap as two instructions
(which would happen if the byteswapping is done in inline asm).
bswap16, bswap32, and bswap64 exist in system headers on *BSDs
and Darwin. #defining bswap16 on NetBSD results in a warning about
macro redefinition. It's safest to avoid this namespace conflict
completely.
No OS supported by tuklib_integer.h uses byteswapXX names and
a web search doesn't immediately find any obvious danger of
namespace conflicts. So let's try these still-pretty-short names
for the macros.
Thanks to Sam James for pointing out the compiler warning on
NetBSD 10.0.
The API docs clearly say that if error_pos isn't NULL then *error
is always set on any error. However, it wasn't touched if str == NULL
or filters == NULL or unsupported flags were specified.
Fixes: cedeeca2ea
It is logical why it cannot know for sure that the value has
to be at most 4 if it is less than 16.
The x86 filter is based on a very old LZMA SDK version. Newer
ones have quite a different implementation for the same filter.
Thanks to Sam James.
On macOS, we get:
```
signals.c: In function 'signals_init':
signals.c:76:17: error: conversion to 'sigset_t' {aka 'unsigned int'} from 'int' may change the sign of the result [-Werror=sign-conversion]
76 | sigaddset(&hooked_signals, sigs[i]);
| ^~~~~~~~~
signals.c:81:17: error: conversion to 'sigset_t' {aka 'unsigned int'} from 'int' may change the sign of the result [-Werror=sign-conversion]
81 | sigaddset(&hooked_signals, message_progress_sigs[i]);
| ^~~~~~~~~
signals.c:86:9: error: conversion to 'sigset_t' {aka 'unsigned int'} from 'int' may change the sign of the result [-Werror=sign-conversion]
86 | sigaddset(&hooked_signals, SIGTSTP);
| ^~~~~~~~~
```
We use `int` for `hooked_signals` but we can't just cast to whatever
`sigset_t` is because `sigset_t` is an opaque type. It's an unsigned int
on macOS. On macOS, `sigaddset` is implemented as a macro.
Just suppress -Wsign-conversion for `signals_init` for macOS given
there's no real nice way of fixing this.
This is *NOT* done for security reasons even though the backdoor
relied on the ifunc code. Instead, the reason is that in this
project ifunc provides little benefits but it's quite a bit of
extra code to support it. The only case where ifunc *might* matter
for performance is if the CRC functions are used directly by an
application. In normal compression use it's completely irrelevant.
While the backdoor was inactive (and thus harmless) without inserting
a small trigger code into the build system when the source package was
created, it's good to remove this anyway:
- The executable payloads were embedded as binary blobs in
the test files. This was a blatant violation of the
Debian Free Software Guidelines.
- On machines that see lots bots poking at the SSH port, the backdoor
noticeably increased CPU load, resulting in degraded user experience
and thus overwhelmingly negative user feedback.
- The maintainer who added the backdoor has disappeared.
- Backdoors are bad for security.
This reverts the following without making any other changes:
6e636819 Tests: Update two test files.
a3a29bbd Tests: Test --single-stream can decompress bad-3-corrupt_lzma2.xz.
0b4ccc91 Tests: Update RISC-V test files.
8c9b8b20 liblzma: Fix typos in crc32_fast.c and crc64_fast.c.
82ecc538 liblzma: Fix false Valgrind error report with GCC.
cf44e4b7 Tests: Add a few test files.
3060e107 Tests: Use smaller dictionary size in RISC-V test files.
e2870db5 Tests: Add two RISC-V Filter test files.
The RISC-V test files also have real content that tests the filter
but the real content would fit into much smaller files. A generator
program would need to be available as well.
Thanks to Andres Freund for finding and reporting it and making
it public quickly so others could act without a delay.
See: https://www.openwall.com/lists/oss-security/2024/03/29/4
NVHPC compiler has several issues that make it impossible to
build liblzma:
- the compiler cannot handle unions that contain pointers that
are not the first members;
- the compiler cannot handle the assembler code in range_decoder.h
(LZMA_RANGE_DECODER_CONFIG has to be set to zero);
- the compiler fails to produce valid code for delta_decode if the
vectorization is enabled, which results in failed tests.
This introduces NVHPC-specific workarounds that address the issues.
With GCC and a certain combination of flags, Valgrind will falsely
trigger an invalid write. This appears to be due to the omission of
instructions to properly save, set up, and restore the frame pointer.
The IFUNC resolver is a leaf function since it only calls a function
that is inlined. So sometimes GCC omits the frame pointer instructions
in the resolver unless this optimization is explictly disabled.
This fixes https://bugzilla.redhat.com/show_bug.cgi?id=2267598.