Commit Graph

96 Commits

Author SHA1 Message Date
Jia Tan 455a08609c liblzma: Refactor crc_common.h.
The CRC_GENERIC is now split into CRC32_GENERIC and CRC64_GENERIC, since
the ARM64 optimizations will be different between CRC32 and CRC64.

For the same reason, CRC_ARCH_OPTIMIZED is split into
CRC32_ARCH_OPTIMIZED and CRC64_ARCH_OPTIMIZED.

ifunc will only be used with x86-64 CLMUL because the runtime detection
methods needed with ARM64 are not compatible with ifunc.
2024-02-01 20:09:11 +08:00
Chenxi Mao 849d0f282a Speed up CRC32 calculation on ARM64
The CRC32 instructions in ARM64 can calculate the CRC32 result
for 8 bytes in a single operation, making the use of ARM64
instructions much faster compared to the general CRC32 algorithm.

Optimized CRC32 will be enabled if ARM64 has CRC extension
running on Linux.

Signed-off-by: Chenxi Mao <chenxi.mao2013@gmail.com>
2024-01-27 21:49:26 +08:00
Lasse Collin fbb3ce541e liblzma: CRC: Add a comment to crc_x86_clmul.h about BUILDING_ macros. 2024-01-11 15:25:00 +02:00
Lasse Collin 4f518c1b6b liblzma: CRC: Remove crc_always_inline, use lzma_always_inline instead.
Now crc_simd_body() in crc_x86_clmul.h is only called once
in a translation unit, we no longer need to be so cautious
about ensuring the always-inline behavior.
2024-01-11 15:24:35 +02:00
Lasse Collin 35c03ec6bf liblzma: CRC: Update CLMUL comments to more generic wording. 2024-01-11 14:39:46 +02:00
Lasse Collin 66f080e801 liblzma: Rename arch-specific CRC functions and macros.
CRC_CLMUL was split to CRC_ARCH_OPTIMIZED and CRC_X86_CLMUL.
CRC_ARCH_OPTIMIZED is defined when an arch-optimized version is used.
Currently the x86 CLMUL implementations are the only arch-optimized
versions, and these also use the CRC_x86_CLMUL macro to tell when
crc_x86_clmul.h needs to be included.

is_clmul_supported() was renamed to is_arch_extension_supported().
crc32_clmul() and crc64_clmul() were renamed to
crc32_arch_optimized() and crc64_arch_optimized().
This way the names make sense with arch-specific non-CLMUL
implementations as well.
2024-01-11 14:29:42 +02:00
Lasse Collin 3dbed75b0b liblzma: Fix a comment in crc_common.h. 2024-01-11 14:29:42 +02:00
Lasse Collin 419f55f9df liblzma: Avoid extern lzma_crc32_clmul() and lzma_crc64_clmul().
A CLMUL-only build will have the crcxx_clmul() inlined into
lzma_crcxx(). Previously a jump to the extern lzma_crcxx_clmul()
was needed. Notes about shared liblzma on ELF platforms:

  - On platforms that support ifunc and -fvisibility=hidden, this
    was silly because CLMUL-only build would have that single extra
    jump instruction of extra overhead.

  - On platforms that support neither -fvisibility=hidden nor linker
    version script (liblzma*.map), jumping to lzma_crcxx_clmul()
    would go via PLT so a few more instructions of overhead (still
    not a big issue but silly nevertheless).

There was a downside with static liblzma too: if an application only
needs lzma_crc64(), static linking would make the linker include the
CLMUL code for both CRC32 and CRC64 from crc_x86_clmul.o even though
the CRC32 code wouldn't be needed, thus increasing code size of the
executable (assuming that -ffunction-sections isn't used).

Also, now compilers are likely to inline crc_simd_body()
even if they don't support the always_inline attribute
(or MSVC's __forceinline). Quite possibly all compilers
that build the code do support such an attribute. But now
it likely isn't a problem even if the attribute wasn't supported.

Now all x86-specific stuff is in crc_x86_clmul.h. If other archs
The other archs can then have their own headers with their own
is_clmul_supported() and crcxx_clmul().

Another bonus is that the build system doesn't need to care if
crc_clmul.c is needed.

is_clmul_supported() stays as inline function as it's not needed
when doing a CLMUL-only build (avoids a warning about unused function).
2024-01-11 14:29:42 +02:00
Lasse Collin e3833e297d liblzma: crc_clmul.c: Add crc_attr_target macro.
This reduces the number of the complex #if directives.
2024-01-11 14:29:42 +02:00
Lasse Collin d164ac0e62 liblzma: Simplify existing cases with lzma_attr_no_sanitize_address. 2024-01-11 14:29:42 +02:00
Lasse Collin 9523c1300d liblzma: #define crc_attr_no_sanitize_address in crc_common.h. 2024-01-11 14:29:38 +02:00
Lasse Collin 93d144f093 liblzma: CRC: Add empty lines.
And remove one too.
2024-01-10 17:19:03 +02:00
Lasse Collin 0c7e854ffd liblzma: crc_clmul.c: Tidy up the location of MSVC pragma.
It makes no difference in practice.
2024-01-10 17:19:03 +02:00
Lasse Collin 8c36ab79cb liblzma: Add a note why crc_always_inline exists for now.
Solaris Studio is a possible example (not tested) which
supports the always_inline attribute but might not get
detected by the common.h #ifdefs.
2023-10-30 18:44:32 +02:00
Lasse Collin 41113fe30a liblzma: Use lzma_attr_visibility_hidden on private extern declarations.
These variables are internal to liblzma and not exposed in the API.
2023-10-30 18:06:25 +02:00
Jia Tan 988e09f27b liblzma: Move is_clmul_supported() back to crc_common.h.
This partially reverts creating crc_clmul.c
(8c0f9376f5) where is_clmul_supported()
was moved, extern'ed, and renamed to lzma_is_clmul_supported(). This
caused a problem when the function call to lzma_is_clmul_supported()
results in a call through the PLT. ifunc resolvers run very early in
the dynamic loading sequence, so the PLT may not be setup properly at
this point. Whether the PLT is used or not for
lzma_is_clmul_supported() depened upon the compiler-toolchain used and
flags.

In liblzma compiled with GCC, for instance, GCC will go through the PLT
for function calls internal to liblzma if the version scripts and
symbol visibility hiding are not used. If lazy-binding is disabled,
then it would have made any program linked with liblzma fail during
dynamic loading in the ifunc resolver.
2023-10-21 00:01:29 +08:00
Jia Tan 105c7ca90d Build: Remove check for COND_CHECK_CRC32 in check/Makefile.inc.
Currently crc32 is always enabled, so COND_CHECK_CRC32 must always be
set. Because of this, it makes the recent change to conditionally
compile check/crc_clmul.c appear wrong since that file has CLMUL
implementations for both CRC32 and CRC64.
2023-10-19 16:23:32 +08:00
Jia Tan c60b25569d liblzma: Fix -fsanitize=address failure with crc_clmul functions.
After forcing crc_simd_body() to always be inlined it caused
-fsanitize=address to fail for lzma_crc32_clmul() and
lzma_crc64_clmul(). The __no_sanitize_address__ attribute was added
to lzma_crc32_clmul() and lzma_crc64_clmul(), but not removed from
crc_simd_body(). ASAN and inline functions behavior has changed over
the years for GCC specifically, so while strictly required we will
keep __attribute__((__no_sanitize_address__)) on crc_simd_body() in
case this becomes a requirement in the future.

Older GCC versions refuse to inline a function with ASAN if the
caller and callee do not agree on sanitization flags
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89124#c3). If the
function was forced to be inlined, it will not compile if the callee
function has __no_sanitize_address__ but the caller doesn't.
2023-10-19 01:15:20 +08:00
Jia Tan 1c8884f0af liblzma: Set the MSVC optimization fix to only cover lzma_crc64_clmul().
After testing a 32-bit Release build on MSVC, only lzma_crc64_clmul()
has the bug. crc_simd_body() and lzma_crc32_clmul() do not need the
optimizations disabled.
2023-10-18 23:54:41 +08:00
Lasse Collin 5ce0f7a48b liblzma: CRC_USE_GENERIC_FOR_SMALL_INPUTS cannot be used with ifunc. 2023-10-18 23:54:41 +08:00
Lasse Collin 2773538049 liblzma: Include common.h in crc_common.h.
crc_common.h depends on common.h. The headers include common.h except
when there is a reason to not do so.
2023-10-18 23:54:41 +08:00
Jia Tan e13b7947b9 liblzma: Add include guards to crc_common.h. 2023-10-18 23:54:41 +08:00
Jia Tan 40abd88afc liblzma: Add the crc_always_inline macro to crc_simd_body().
Forcing this to be inline has a significant speed improvement at the
cost of a few repeated instructions. The compilers tested on did not
inline this function since it is large and is used twice in the same
translation unit.
2023-10-18 23:54:41 +08:00
Jia Tan a5966c276b liblzma: Create crc_always_inline macro.
This macro must be used instead of the inline keyword. On MSVC, it is
a replacement for __forceinline which is an MSVC specific keyword that
should not be used with inline (it will issue a warning if it is).

It does not use a build system check to determine if
__attribute__((__always_inline__)) since all compilers that can use
CLMUL extensions (except the special case for MSVC) should support this
attribute. If this assumption is incorrect then it will result in a bug
report instead of silently producing slow code.
2023-10-18 23:54:41 +08:00
Jia Tan 96b663f67c liblzma: Refactor CRC comments.
A detailed description of the three dispatch methods was added. Also,
duplicated comments now only appear in crc32_fast.c or were removed from
both crc32_fast.c and crc64_fast.c if they appeared in crc_clmul.c.
2023-10-18 23:54:41 +08:00
Jia Tan 8c0f9376f5 liblzma: Create crc_clmul.c.
Both crc32_clmul() and crc64_clmul() are now exported from
crc32_clmul.c as lzma_crc32_clmul() and lzma_crc64_clmul(). This
ensures that is_clmul_supported() (now lzma_is_clmul_supported()) is
not duplicated between crc32_fast.c and crc64_fast.c.

Also, it encapsulates the complexity of the CLMUL implementations into a
single file and reduces the complexity of crc32_fast.c and crc64_fast.c.
Before, CLMUL code was present in crc32_fast.c, crc64_fast.c, and
crc_common.h.

During the conversion, various cleanups were applied to code (thanks to
Lasse Collin) including:

- Require using semicolons with MASK_/L/H/LH macros.
- Variable typing and const handling improvements.
- Improvements to comments.
- Fixes to the pragmas used.
- Removed unneeded variables.
- Whitespace improvements.
- Fixed CRC_USE_GENERIC_FOR_SMALL_INPUTS handling.
- Silenced warnings and removed the need for some #pragmas
2023-10-18 23:54:36 +08:00
Jia Tan a3ebc2c516 liblzma: Define CRC_USE_IFUNC in crc_common.h.
When ifunc is supported, we can define a simpler macro instead of
repeating the more complex check in both crc32_fast.c and crc64_fast.c.
2023-10-18 20:41:11 +08:00
Hans Jansen f1cd9d7194 liblzma: Added crc32_clmul to crc32_fast.c. 2023-10-13 20:54:05 +08:00
Hans Jansen 93e6fb08b2 liblzma: Moved CLMUL CRC logic to crc_common.h.
crc64_fast.c was updated to use the code from crc_common.h instead.
2023-10-13 20:54:05 +08:00
Hans Jansen 233885a437 liblzma: Rename crc_macros.h to crc_common.h. 2023-10-13 20:54:05 +08:00
Lasse Collin 5a9af95f85 liblzma: Update a comment.
The C standards don't allow an empty translation unit which can be
avoided by declaring something, without exporting any symbols.

When I committed f644473a21 I had
a feeling that some specific toolchain somewhere didn't like
empty object files (assembler or maybe "ar" complained) but
I cannot find anything to confirm this now. Quite likely I
remembered nonsense. I leave this here as a note to my future self. :-)
2023-09-26 21:47:13 +03:00
Jia Tan 8ebaf3f665 liblzma: Avoid compiler warning without creating extra symbol.
When the generic fast crc64 method is used, then we omit
lzma_crc64_table[][]. Similar to
d9166b52cf, we can avoid compiler warnings
with -Wempty-translation-unit (Clang) or -pedantic (GCC) by creating a
never used typedef instead of an extra symbol.
2023-09-27 00:04:40 +08:00
Lasse Collin 4f44ef8675 liblzma: Mark crc64_clmul() with __attribute__((__no_sanitize_address__)).
Thanks to Agostino Sarubbo.
Fixes: https://github.com/tukaani-project/xz/issues/62
2023-09-14 16:34:07 +03:00
Jia Tan 0184d344fa liblzma: Suppress -Wunused-function warning.
Clang 16.0.0 and earlier have a bug that the ifunc resolver function
triggers the -Wunused-function warning. The resolver function is static
and only "used" by the __attribute__((__ifunc()__)).

At this time, the bug is still unresolved, but has been reported:
https://github.com/llvm/llvm-project/issues/63957

This is not a problem in GCC.
2023-07-19 23:36:00 +08:00
Lasse Collin ee44863ae8 liblzma: Add ifunc implementation to crc64_fast.c.
The ifunc method avoids indirection via the function pointer
crc64_func. This works on GNU/Linux and probably on FreeBSD too.
The previous __attribute((__constructor__)) method is kept for
compatibility with ELF platforms which do support ifunc.

The ifunc method has some limitations, for example, building
liblzma with -fsanitize=address will result in segfaults.
The configure option --disable-ifunc must be used for such builds.

Thanks to Hans Jansen for the original patch.
Closes: https://github.com/tukaani-project/xz/pull/53
2023-06-27 23:55:59 +08:00
Lasse Collin b473a92891 Change a few HTTP URLs to HTTPS.
The xz man page timestamp was intentionally left unchanged.
2023-03-18 15:56:07 +02:00
Lasse Collin 718b22a6c5 liblzma: Silence a warning from MSVC.
It gives C4146 here since unary minus with unsigned integer
is still unsigned (which is the intention here). Doing it
with substraction makes it clearer and avoids the warning.

Thanks to Nathan Moinvaziri for reporting this.
2023-02-16 17:59:50 +02:00
Lasse Collin 6c886cc5b3 Fix warnings from clang -Wdocumentation. 2023-01-12 03:11:40 +02:00
Lasse Collin 0b8fa310cf liblzma: CLMUL CRC64: Work around a bug in MSVC, second attempt.
This affects only 32-bit x86 builds. x86-64 is OK as is.

I still cannot easily test this myself. The reporter has tested
this and it passes the tests included in the CMake build and
performance is good: raw CRC64 is 2-3 times faster than the
C version of the slice-by-four method. (Note that liblzma doesn't
include a MSVC-compatible version of the 32-bit x86 assembly code
for the slice-by-four method.)

Thanks to Iouri Kharon for figuring out a fix, testing, and
benchmarking.
2023-01-10 22:15:55 +02:00
Lasse Collin cfabb62a48 Revert "liblzma: CLMUL CRC64: Workaround a bug in MSVC (VS2015-2022)."
This reverts commit 36edc65ab4.

It was reported that it wasn't a good enough fix and MSVC
still produced (different kind of) bad code when building
for 32-bit x86 if optimizations are enabled.

Thanks to Iouri Kharon.
2023-01-10 12:47:16 +02:00
Lasse Collin 36edc65ab4 liblzma: CLMUL CRC64: Workaround a bug in MSVC (VS2015-2022).
I haven't tested with MSVC myself and there doesn't seem to be
information about the problem online, so I'm relying on the bug report.

Thanks to Iouri Kharon for the bug report and the patch.
2023-01-09 12:22:05 +02:00
Lasse Collin f644473a21 liblzma: Add fast CRC64 for 32/64-bit x86 using SSSE3 + SSE4.1 + CLMUL.
It also works on E2K as it supports these intrinsics.

On x86-64 runtime detection is used so the code keeps working on
older processors too. A CLMUL-only build can be done by using
-msse4.1 -mpclmul in CFLAGS and this will reduce the library
size since the generic implementation and its 8 KiB lookup table
will be omitted.

On 32-bit x86 this isn't used by default for now because by default
on 32-bit x86 the separate assembly file crc64_x86.S is used.
If --disable-assembler is used then this new CLMUL code is used
the same way as on 64-bit x86. However, a CLMUL-only build
(-msse4.1 -mpclmul) won't omit the 8 KiB lookup table on
32-bit x86 due to a currently-missing check for disabled
assembler usage.

The configure.ac check should be such that the code won't be
built if something in the toolchain doesn't support it but
--disable-clmul-crc option can be used to unconditionally
disable this feature.

CLMUL speeds up decompression of files that have compressed very
well (assuming CRC64 is used as a check type). It is know that
the CLMUL code is significantly slower than the generic code for
tiny inputs (especially 1-8 bytes but up to 16 bytes). If that
is a real-world problem then there is already a commented-out
variant that uses the generic version for small inputs.

Thanks to Ilya Kurdyukov for the original patch which was
derived from a white paper from Intel [1] (published in 2009)
and public domain code from [2] (released in 2016).

[1] https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-crc-computation-generic-polynomials-pclmulqdq-paper.pdf
[2] https://github.com/rawrunprotected/crc
2022-11-14 23:05:46 +02:00
Lasse Collin eb0f1450ad liblzma: Use __attribute__((__constructor__)) if available.
This uses it for CRC table initializations when using --disable-small.
It avoids mythread_once() overhead. It also means that then
--disable-small --disable-threads is thread-safe if this attribute
is supported.
2022-11-14 16:00:52 +02:00
Lasse Collin 48dde3bab9 liblzma: Silence -Wconversion warning from crc64_fast.c. 2022-10-31 11:54:44 +02:00
Ed Maste 865e0a3689 liblzma: Use non-executable stack on FreeBSD as on Linux 2022-02-22 01:23:34 +02:00
H.J. Lu 4fd79b90c5 liblzma: Enable Intel CET in x86 CRC assembly codes
When Intel CET is enabled, we need to include <cet.h> in assembly codes
to mark Intel CET support and add _CET_ENDBR to indirect jump targets.

Tested on Intel Tiger Lake under CET enabled Linux.
2020-12-23 17:13:33 +02:00
Lasse Collin b8e12f5ab4 Typo fixes from fossies.org.
https://fossies.org/linux/misc/xz-5.2.5.tar.xz/codespell.html
2020-03-23 18:07:50 +02:00
Lasse Collin 5e78fcbf2e Rename read32ne to aligned_read32ne, and similarly for the others.
Using the aligned methods requires more care to ensure that
the address really is aligned, so it's nicer if the aligned
methods are prefixed. The next commit will remove the unaligned_
prefix from the unaligned methods which in liblzma are used in
more places than the aligned ones.
2019-12-31 00:29:48 +02:00
Lasse Collin a12b13c5f0 liblzma: Silence clang -Wmissing-variable-declarations. 2019-06-24 23:45:21 +03:00
Lasse Collin ac398c3baf liblzma: Disable external SHA-256 by default.
This is the sane thing to do. The conflict with OpenSSL
on some OSes and especially that the OS-provided versions
can be significantly slower makes it clear that it was
a mistake to have the external SHA-256 support enabled by
default.

Those who want it can now pass --enable-external-sha256 to
configure. INSTALL was updated with notes about OSes where
this can be a bad idea.

The SHA-256 detection code in configure.ac had some bugs that
could lead to a build failure in some situations. These were
fixed, although it doesn't matter that much now that the
external SHA-256 is disabled by default.

MINIX >= 3.2.0 uses NetBSD's libc and thus has SHA256_Init
in libc instead of libutil. Support for the libutil version
was removed.
2016-03-13 20:21:49 +02:00