1
0
mirror of https://git.tukaani.org/xz.git synced 2025-04-03 22:30:58 +00:00

15 Commits

Author SHA1 Message Date
Lasse Collin
54eaea5ea4 liblzma: x86 CLMUL CRC: Rewrite
It's faster with both tiny and large buffers and doesn't require
disabling any sanitizers. With large buffers the extra speed is
from folding four 16-byte chunks in parallel.

The 32-bit x86 with MSVC reportedly still needs a workaround.
Now the simpler "__asm mov ebx, ebx" trick is enough but it
needs to be in lzma_crc64() instead of crc64_arch_optimized().
Thanks to Iouri Kharon for testing and the fix.

Thanks to Ilya Kurdyukov for testing the speed with aligned and
unaligned buffers on a few x86 processors and on E2K v6.

Thanks to Sam James for general feedback.

Fixes: https://github.com/tukaani-project/xz/issues/112
Fixes: https://github.com/tukaani-project/xz/issues/122
2024-06-17 15:00:49 +03:00
Lasse Collin
20014c2614 liblzma: Use a single macro to select CLMUL CRC to build
This way it's clearer that two things cannot be selected
at the same time.
2024-06-16 12:59:17 +03:00
Lasse Collin
d8fb098617 liblzma: CRC32 CLMUL: Refactor the constants and simplify
By using modulus scaled constants, the final reduction can
be simplified.
2024-06-16 12:56:54 +03:00
Lasse Collin
ef652ac391 liblzma: CRC64 CLMUL: Refactor the constants
Now it refers to crc_clmul_consts_gen.c. vfold8 was renamed to mu_p
and the p no longer has the lowest bit set (it makes no difference
as the output bits it affects are ignored).
2024-06-16 12:56:54 +03:00
Lasse Collin
71b147aab7 liblzma: Remove CRC_USE_GENERIC_FOR_SMALL_INPUTS
It was already commented out.
2024-06-16 12:56:54 +03:00
Lasse Collin
f99a7be406 liblzma: Remove crc_attr_no_sanitize_address
It's not enough to silence the address sanitizer. Also memory and
thread sanitizers would need to be silenced. They, at least currently,
aren't smart enough to see that the extra bytes are discarded from
the xmm registers by later instructions.

Valgrind is smarter, possibly because this kind of code isn't weird
to write in assembly. Agner Fog's optimizing_assembly.pdf even mentions
this idea of doing an aligned read and then discarding the extra
bytes. The sanitizers don't instrument assembly code but Valgrind
checks all code.

It's better to change the implementation to avoid the sanitization
attributes which also look scary in the code. (Somehow they can look
more scary than __asm__ which is implictly unsanitized.)

See also:
https://github.com/tukaani-project/xz/issues/112
https://github.com/tukaani-project/xz/issues/122
2024-06-16 12:56:54 +03:00
Lasse Collin
0a32d2072c liblzma: Fix a typo in a comment
Thanks to Sam James for spotting it.

Fixes: f644473a211394447824ea00518d0a214ff3f7f2
2024-06-11 22:42:04 +03:00
Lasse Collin
50e6bff274 liblzma: Fix white space 2024-06-10 23:19:27 +03:00
Lasse Collin
689ae24273 liblzma: Remove ifunc support.
This is *NOT* done for security reasons even though the backdoor
relied on the ifunc code. Instead, the reason is that in this
project ifunc provides little benefits but it's quite a bit of
extra code to support it. The only case where ifunc *might* matter
for performance is if the CRC functions are used directly by an
application. In normal compression use it's completely irrelevant.
2024-04-09 18:22:27 +03:00
Lasse Collin
22af94128b Add SPDX license identifier into 0BSD source code files. 2024-02-14 18:31:16 +02:00
Lasse Collin
689e0228ba Change most public domain parts to 0BSD.
Translations and doc/xz-file-format.txt and doc/lzma-file-format.txt
were not touched.

COPYING.0BSD was added.
2024-02-14 18:31:12 +02:00
Lasse Collin
fbb3ce541e liblzma: CRC: Add a comment to crc_x86_clmul.h about BUILDING_ macros. 2024-01-11 15:25:00 +02:00
Lasse Collin
4f518c1b6b liblzma: CRC: Remove crc_always_inline, use lzma_always_inline instead.
Now crc_simd_body() in crc_x86_clmul.h is only called once
in a translation unit, we no longer need to be so cautious
about ensuring the always-inline behavior.
2024-01-11 15:24:35 +02:00
Lasse Collin
66f080e801 liblzma: Rename arch-specific CRC functions and macros.
CRC_CLMUL was split to CRC_ARCH_OPTIMIZED and CRC_X86_CLMUL.
CRC_ARCH_OPTIMIZED is defined when an arch-optimized version is used.
Currently the x86 CLMUL implementations are the only arch-optimized
versions, and these also use the CRC_x86_CLMUL macro to tell when
crc_x86_clmul.h needs to be included.

is_clmul_supported() was renamed to is_arch_extension_supported().
crc32_clmul() and crc64_clmul() were renamed to
crc32_arch_optimized() and crc64_arch_optimized().
This way the names make sense with arch-specific non-CLMUL
implementations as well.
2024-01-11 14:29:42 +02:00
Lasse Collin
419f55f9df liblzma: Avoid extern lzma_crc32_clmul() and lzma_crc64_clmul().
A CLMUL-only build will have the crcxx_clmul() inlined into
lzma_crcxx(). Previously a jump to the extern lzma_crcxx_clmul()
was needed. Notes about shared liblzma on ELF platforms:

  - On platforms that support ifunc and -fvisibility=hidden, this
    was silly because CLMUL-only build would have that single extra
    jump instruction of extra overhead.

  - On platforms that support neither -fvisibility=hidden nor linker
    version script (liblzma*.map), jumping to lzma_crcxx_clmul()
    would go via PLT so a few more instructions of overhead (still
    not a big issue but silly nevertheless).

There was a downside with static liblzma too: if an application only
needs lzma_crc64(), static linking would make the linker include the
CLMUL code for both CRC32 and CRC64 from crc_x86_clmul.o even though
the CRC32 code wouldn't be needed, thus increasing code size of the
executable (assuming that -ffunction-sections isn't used).

Also, now compilers are likely to inline crc_simd_body()
even if they don't support the always_inline attribute
(or MSVC's __forceinline). Quite possibly all compilers
that build the code do support such an attribute. But now
it likely isn't a problem even if the attribute wasn't supported.

Now all x86-specific stuff is in crc_x86_clmul.h. If other archs
The other archs can then have their own headers with their own
is_clmul_supported() and crcxx_clmul().

Another bonus is that the build system doesn't need to care if
crc_clmul.c is needed.

is_clmul_supported() stays as inline function as it's not needed
when doing a CLMUL-only build (avoids a warning about unused function).
2024-01-11 14:29:42 +02:00