Now the information in the "Details" tab in the file properties
dialog matches the naming convention of Cygwin and MSYS2. This
is only a cosmetic change.
Support for instruction "movzw" without suffix in "GNU as" was
added in commit [1] and stabilized in binutils 2.27, released
in August 2016. Earlier systems don't accept this instruction
without a suffix, making range_decoder.h's inline assembly
unable to build on old systems such as Ubuntu 16.04, creating
error messages like:
lzma_decoder.c: Assembler messages:
lzma_decoder.c:371: Error: no such instruction: `movzw 2(%r11),%esi'
lzma_decoder.c:373: Error: no such instruction: `movzw 4(%r11),%edi'
lzma_decoder.c:388: Error: no such instruction: `movzw 6(%r11),%edx'
lzma_decoder.c:398: Error: no such instruction: `movzw (%r11,%r14,4),%esi'
Change "movzw" to "movzwl" for compatibility.
[1] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=c07315e0c610e0e3317b4c02266f81793df253d2
Suggested-by: Lasse Collin <lasse.collin@tukaani.org>
Tested-by: Yifeng Li <tomli@tomli.me>
Signed-off-by: Yifeng Li <tomli@tomli.me>
Fixes: 3182a330c1
Fixes: https://github.com/tukaani-project/xz/issues/121
Closes: https://github.com/tukaani-project/xz/pull/136
This reverts commit dc03f6290f.
OpenBSD 7.6 will support elf_aux_info(3), and the detection code used
on FreeBSD will work on OpenBSD 7.6 too. Keep things simpler and drop
the OpenBSD-specific sysctl() method.
Thanks to Christian Weisgerber.
The crc.w.{b/h/w/d}.w instructions in LoongArch can calculate the CRC32
result for 1/2/4/8 bytes in a single operation. Using these is much
faster compared to the generic method.
Optimized CRC32 is enabled unconditionally on 64-bit LoongArch because
the LoongArch specification says that CRC32 instructions shall be
implemented for 64-bit processors. Optimized CRC32 isn't enabled for
32-bit LoongArch processors because not enough information is available
about them.
Co-authored-by: Lasse Collin <lasse.collin@tukaani.org>
Closes: https://github.com/tukaani-project/xz/pull/86
Prefix ARM64_RUNTIME_DETECTION with CRC_ and reorder it to be with
the other ARM64-specific lines. That macro isn't used outside this
file.
ARM64 CLMUL implementation doesn't exist yet and thus CRC64_ARM64_CLMUL
isn't used anywhere yet.
It's not ideal that the single-letter CRC utility macros are here
as they pollute the namespace of the LZ encoder files. Those could
be moved their own crc_macros.h like they were in 5.2.x but in practice
this is fine enough already.
LZ encoder needs lzma_crc32_table[0] but otherwise those tables
are private to the CRC code. In contrast, the other things in
check.h are needed in several places.
Now runtime detection of CLMUL support can pick between the CLMUL and
the generic assembly implementations. Whatever overhead this has for
builds that omit CLMUL completely isn't important because builds for
any non-ancient system is likely to include the CLMUL code too.
Handle the CRC tables in crcXX_fast.c files because now these files
are built even when assembly code is used.
If 32-bit x86 assembly is enabled then it will always be built even
if compiler flags were such that CLMUL would be allowed unconditionally.
That is, runtime detection will be used anyway. This keeps the build
rules simpler.
In LZ encoder, build and use lzma_lz_hash_table[256] if CLMUL CRC
is used without runtime detection. Previously this wasn't needed
because crc32_table.c included the lzma_crc32_table[][] in the build
unless encoder support had been disabled. Including an 8 KiB table
was silly when only 1 KiB is actually used. So now liblzma is 7 KiB
smaller if CLMUL is enabled without runtime detection.
On E2K the function compiles only due to compiler emulation but the
function is never used. It's cleaner to omit the function when it's
not needed even though it's a "static inline" function.
Thanks to Ilya Kurdyukov.
It's faster with both tiny and large buffers and doesn't require
disabling any sanitizers. With large buffers the extra speed is
from folding four 16-byte chunks in parallel.
The 32-bit x86 with MSVC reportedly still needs a workaround.
Now the simpler "__asm mov ebx, ebx" trick is enough but it
needs to be in lzma_crc64() instead of crc64_arch_optimized().
Thanks to Iouri Kharon for testing and the fix.
Thanks to Ilya Kurdyukov for testing the speed with aligned and
unaligned buffers on a few x86 processors and on E2K v6.
Thanks to Sam James for general feedback.
Fixes: https://github.com/tukaani-project/xz/issues/112
Fixes: https://github.com/tukaani-project/xz/issues/122
Now it refers to crc_clmul_consts_gen.c. vfold8 was renamed to mu_p
and the p no longer has the lowest bit set (it makes no difference
as the output bits it affects are ignored).
It's not enough to silence the address sanitizer. Also memory and
thread sanitizers would need to be silenced. They, at least currently,
aren't smart enough to see that the extra bytes are discarded from
the xmm registers by later instructions.
Valgrind is smarter, possibly because this kind of code isn't weird
to write in assembly. Agner Fog's optimizing_assembly.pdf even mentions
this idea of doing an aligned read and then discarding the extra
bytes. The sanitizers don't instrument assembly code but Valgrind
checks all code.
It's better to change the implementation to avoid the sanitization
attributes which also look scary in the code. (Somehow they can look
more scary than __asm__ which is implictly unsanitized.)
See also:
https://github.com/tukaani-project/xz/issues/112https://github.com/tukaani-project/xz/issues/122
The C code is from Christian Weisgerber, I merely reordered the OSes.
Then I added the build system checks without testing them.
Also thanks to Brad Smith who submitted a similar patch on GitHub
a few hours after Christian had sent his via email.
Co-authored-by: Christian Weisgerber <naddy@mips.inka.de>
Closes: https://github.com/tukaani-project/xz/pull/125
Clang 17 with -fsanitize=address,undefined:
src/liblzma/common/filter_common.c:366:8: runtime error:
call to function encoder_find through pointer to incorrect
function type 'const lzma_filter_coder *(*)(unsigned long)'
src/liblzma/common/filter_encoder.c:187: note:
encoder_find defined here
Use a wrapper function to get the correct type neatly.
This reduces the number of casts needed too.
This issue could be a problem with control flow integrity (CFI)
methods that check the function type on indirect function calls.
Fixes: 3b34851de1
If the arguments to lzma_index_decoder() or lzma_index_buffer_decode()
were such that LZMA_PROG_ERROR was returned, the lzma_index **i
argument wasn't touched even though the API docs say that *i = NULL
is done if an error occurs. This obviously won't be done even now
if i == NULL but otherwise it is best to do it due to the wording
in the API docs.
In practice this matters very little: The problem can occur only
if the functions are called with invalid arguments, that is,
the calling application must already have a bug.
The __builtin_bswapXX from GCC and Clang are preferred when
they are available. This can allow compilers to emit the x86 MOVBE
instruction instead of doing a load + byteswap as two instructions
(which would happen if the byteswapping is done in inline asm).
bswap16, bswap32, and bswap64 exist in system headers on *BSDs
and Darwin. #defining bswap16 on NetBSD results in a warning about
macro redefinition. It's safest to avoid this namespace conflict
completely.
No OS supported by tuklib_integer.h uses byteswapXX names and
a web search doesn't immediately find any obvious danger of
namespace conflicts. So let's try these still-pretty-short names
for the macros.
Thanks to Sam James for pointing out the compiler warning on
NetBSD 10.0.
The API docs clearly say that if error_pos isn't NULL then *error
is always set on any error. However, it wasn't touched if str == NULL
or filters == NULL or unsupported flags were specified.
Fixes: cedeeca2ea
It is logical why it cannot know for sure that the value has
to be at most 4 if it is less than 16.
The x86 filter is based on a very old LZMA SDK version. Newer
ones have quite a different implementation for the same filter.
Thanks to Sam James.
This is *NOT* done for security reasons even though the backdoor
relied on the ifunc code. Instead, the reason is that in this
project ifunc provides little benefits but it's quite a bit of
extra code to support it. The only case where ifunc *might* matter
for performance is if the CRC functions are used directly by an
application. In normal compression use it's completely irrelevant.
While the backdoor was inactive (and thus harmless) without inserting
a small trigger code into the build system when the source package was
created, it's good to remove this anyway:
- The executable payloads were embedded as binary blobs in
the test files. This was a blatant violation of the
Debian Free Software Guidelines.
- On machines that see lots bots poking at the SSH port, the backdoor
noticeably increased CPU load, resulting in degraded user experience
and thus overwhelmingly negative user feedback.
- The maintainer who added the backdoor has disappeared.
- Backdoors are bad for security.
This reverts the following without making any other changes:
6e636819 Tests: Update two test files.
a3a29bbd Tests: Test --single-stream can decompress bad-3-corrupt_lzma2.xz.
0b4ccc91 Tests: Update RISC-V test files.
8c9b8b20 liblzma: Fix typos in crc32_fast.c and crc64_fast.c.
82ecc538 liblzma: Fix false Valgrind error report with GCC.
cf44e4b7 Tests: Add a few test files.
3060e107 Tests: Use smaller dictionary size in RISC-V test files.
e2870db5 Tests: Add two RISC-V Filter test files.
The RISC-V test files also have real content that tests the filter
but the real content would fit into much smaller files. A generator
program would need to be available as well.
Thanks to Andres Freund for finding and reporting it and making
it public quickly so others could act without a delay.
See: https://www.openwall.com/lists/oss-security/2024/03/29/4
NVHPC compiler has several issues that make it impossible to
build liblzma:
- the compiler cannot handle unions that contain pointers that
are not the first members;
- the compiler cannot handle the assembler code in range_decoder.h
(LZMA_RANGE_DECODER_CONFIG has to be set to zero);
- the compiler fails to produce valid code for delta_decode if the
vectorization is enabled, which results in failed tests.
This introduces NVHPC-specific workarounds that address the issues.