1
0
mirror of https://git.tukaani.org/xz.git synced 2025-10-24 01:52:54 +00:00

685 Commits

Author SHA1 Message Date
Jia Tan
45663443eb liblzma: Fix build error if only RISC-V BCJ filter is enabled.
If any other BCJ filter was enabled for encoding or decoding, then this
was not a problem.
2024-02-13 23:33:21 +08:00
Jia Tan
adb073da76 liblzma: Fix typo discovered by codespell. 2024-02-09 23:59:54 +08:00
Jia Tan
7f68a68c19 liblzma: Update Authors list in crc32_arm64.h. 2024-02-02 01:38:51 +08:00
Jia Tan
97f9ba50b8 liblzma: Check HAVE_USABLE_CLMUL before omitting CRC32 table.
This was split from the prior commit so it could be easily applied to
the 5.4 branch.

Closes: https://github.com/tukaani-project/xz/pull/77
2024-02-01 20:09:11 +08:00
Jia Tan
ca9015f4de liblzma: Check HAVE_USABLE_CLMUL before omitting CRC64 table.
If liblzma is configured with --disable-clmul-crc
CFLAGS="-msse4.1 -mpclmul", then it will fail to compile because the
generic version must be used but the CRC tables were not included.
2024-02-01 20:09:11 +08:00
Jia Tan
2f1552a91c liblzma: Only use ifunc in crcXX_fast.c if its needed.
The code was using HAVE_FUNC_ATTRIBUTE_IFUNC instead of CRC_USE_IFUNC.
With ARM64, ifunc is incompatible because it requires non-inline
function calls for runtime detection.
2024-02-01 20:09:11 +08:00
Jia Tan
1940f0ec28 liblzma: Omit CRC tables when not needed with ARM64 optimizations.
This is similar to the existing x86-64 CLMUL conditions to omit the
tables. They were slightly refactored to improve readability.
2024-02-01 20:09:11 +08:00
Jia Tan
761f5b69a4 liblzma: Rename crc32_aarch64.h to crc32_arm64.h.
Even though the proper name for the architecture is aarch64, this
project uses ARM64 throughout. So the rename is for consistency.

Additionally, crc32_arm64.h was slightly refactored for the following
changes:

   * Added MSVC, FreeBSD, and macOS support in
     is_arch_extension_supported().

   * crc32_arch_optimized() now checks the size when aligning the
     buffer.

   * crc32_arch_optimized() loop conditions were slightly modified to
     avoid both decrementing the size and incrementing the buffer
     pointer.

   * Use the intrinsic wrappers defined in <arm_acle.h> because GCC and
     Clang name them differently.

   * Minor spacing and comment changes.
2024-02-01 20:09:11 +08:00
Jia Tan
455a08609c liblzma: Refactor crc_common.h.
The CRC_GENERIC is now split into CRC32_GENERIC and CRC64_GENERIC, since
the ARM64 optimizations will be different between CRC32 and CRC64.

For the same reason, CRC_ARCH_OPTIMIZED is split into
CRC32_ARCH_OPTIMIZED and CRC64_ARCH_OPTIMIZED.

ifunc will only be used with x86-64 CLMUL because the runtime detection
methods needed with ARM64 are not compatible with ifunc.
2024-02-01 20:09:11 +08:00
Chenxi Mao
849d0f282a Speed up CRC32 calculation on ARM64
The CRC32 instructions in ARM64 can calculate the CRC32 result
for 8 bytes in a single operation, making the use of ARM64
instructions much faster compared to the general CRC32 algorithm.

Optimized CRC32 will be enabled if ARM64 has CRC extension
running on Linux.

Signed-off-by: Chenxi Mao <chenxi.mao2013@gmail.com>
2024-01-27 21:49:26 +08:00
Jia Tan
b43c3e48bf Bump version number for 5.5.1alpha. 2024-01-26 19:05:51 +08:00
Lasse Collin
50255feeaa liblzma: RISC-V filter: Use byte-by-byte access.
Not all RISC-V processors support fast unaligned access so
it's better to read only one byte in the main loop. This can
be faster even on x86-64 when compared to reading 32 bits at
a time as half the time the address is only 16-bit aligned.

The downside is larger code size on archs that do support
fast unaligned access.
2024-01-23 23:05:47 +08:00
Jia Tan
2959dbc735 liblzma: Update string_conversion.c to support RISC-V Filter. 2024-01-23 23:05:47 +08:00
Jia Tan
440a2eccb0 liblzma: Add RISC-V BCJ filter.
The new Filter ID is 0x0B.

Thanks to Chien Wong <m@xv97.com> for the initial version of the Filter,
the xz CLI updates, and the Autotools build system modifications.

Thanks to Igor Pavlov for his many contributions to the design of
the filter.
2024-01-23 23:05:41 +08:00
Jia Tan
6b63c4c613 liblzma: Update website URL. 2024-01-19 23:08:14 +08:00
Lasse Collin
fbb3ce541e liblzma: CRC: Add a comment to crc_x86_clmul.h about BUILDING_ macros. 2024-01-11 15:25:00 +02:00
Lasse Collin
4f518c1b6b liblzma: CRC: Remove crc_always_inline, use lzma_always_inline instead.
Now crc_simd_body() in crc_x86_clmul.h is only called once
in a translation unit, we no longer need to be so cautious
about ensuring the always-inline behavior.
2024-01-11 15:24:35 +02:00
Lasse Collin
35c03ec6bf liblzma: CRC: Update CLMUL comments to more generic wording. 2024-01-11 14:39:46 +02:00
Lasse Collin
66f080e801 liblzma: Rename arch-specific CRC functions and macros.
CRC_CLMUL was split to CRC_ARCH_OPTIMIZED and CRC_X86_CLMUL.
CRC_ARCH_OPTIMIZED is defined when an arch-optimized version is used.
Currently the x86 CLMUL implementations are the only arch-optimized
versions, and these also use the CRC_x86_CLMUL macro to tell when
crc_x86_clmul.h needs to be included.

is_clmul_supported() was renamed to is_arch_extension_supported().
crc32_clmul() and crc64_clmul() were renamed to
crc32_arch_optimized() and crc64_arch_optimized().
This way the names make sense with arch-specific non-CLMUL
implementations as well.
2024-01-11 14:29:42 +02:00
Lasse Collin
3dbed75b0b liblzma: Fix a comment in crc_common.h. 2024-01-11 14:29:42 +02:00
Lasse Collin
419f55f9df liblzma: Avoid extern lzma_crc32_clmul() and lzma_crc64_clmul().
A CLMUL-only build will have the crcxx_clmul() inlined into
lzma_crcxx(). Previously a jump to the extern lzma_crcxx_clmul()
was needed. Notes about shared liblzma on ELF platforms:

  - On platforms that support ifunc and -fvisibility=hidden, this
    was silly because CLMUL-only build would have that single extra
    jump instruction of extra overhead.

  - On platforms that support neither -fvisibility=hidden nor linker
    version script (liblzma*.map), jumping to lzma_crcxx_clmul()
    would go via PLT so a few more instructions of overhead (still
    not a big issue but silly nevertheless).

There was a downside with static liblzma too: if an application only
needs lzma_crc64(), static linking would make the linker include the
CLMUL code for both CRC32 and CRC64 from crc_x86_clmul.o even though
the CRC32 code wouldn't be needed, thus increasing code size of the
executable (assuming that -ffunction-sections isn't used).

Also, now compilers are likely to inline crc_simd_body()
even if they don't support the always_inline attribute
(or MSVC's __forceinline). Quite possibly all compilers
that build the code do support such an attribute. But now
it likely isn't a problem even if the attribute wasn't supported.

Now all x86-specific stuff is in crc_x86_clmul.h. If other archs
The other archs can then have their own headers with their own
is_clmul_supported() and crcxx_clmul().

Another bonus is that the build system doesn't need to care if
crc_clmul.c is needed.

is_clmul_supported() stays as inline function as it's not needed
when doing a CLMUL-only build (avoids a warning about unused function).
2024-01-11 14:29:42 +02:00
Lasse Collin
e3833e297d liblzma: crc_clmul.c: Add crc_attr_target macro.
This reduces the number of the complex #if directives.
2024-01-11 14:29:42 +02:00
Lasse Collin
d164ac0e62 liblzma: Simplify existing cases with lzma_attr_no_sanitize_address. 2024-01-11 14:29:42 +02:00
Lasse Collin
9523c1300d liblzma: #define crc_attr_no_sanitize_address in crc_common.h. 2024-01-11 14:29:38 +02:00
Lasse Collin
93d144f093 liblzma: CRC: Add empty lines.
And remove one too.
2024-01-10 17:19:03 +02:00
Lasse Collin
0c7e854ffd liblzma: crc_clmul.c: Tidy up the location of MSVC pragma.
It makes no difference in practice.
2024-01-10 17:19:03 +02:00
Lasse Collin
cd64dd70d5 liblzma: Use 8-byte method in memcmplen.h on ARM64.
It requires fast unaligned access to 64-bit integers
and a fast instruction to count leading zeros in
a 64-bit integer (__builtin_ctzll()). This perhaps
should be enabled on some other archs too.

Thanks to Chenxi Mao for the original patch:
https://github.com/tukaani-project/xz/pull/75 (the first commit)
According to the numbers there, this may improve encoding
speed by about 3-5 %.

This enables the 8-byte method on MSVC ARM64 too which
should work but wasn't tested.
2023-12-28 17:17:39 +02:00
Lasse Collin
12c90c00f0 liblzma: Check also for __clang__ in memcmplen.h.
This change hopefully makes no practical difference as Clang
likely was detected via __GNUC__ or _MSC_VER already.
2023-12-28 17:17:39 +02:00
Jia Tan
b34b6a9912 liblzma: Initialize lzma_lz_encoder pointers with NULL.
This fixes the recent change to lzma_lz_encoder that used memzero
instead of the NULL constant. On some compilers the NULL constant
(always 0) may not equal the NULL pointer (this only needs to guarentee
to not point to valid memory address).

Later code compares the pointers to the NULL pointer so we must
initialize them with the NULL pointer instead of 0 to guarentee
code correctness.
2023-12-20 21:38:39 +08:00
Jia Tan
183a62f0b5 liblzma: Set all values in lzma_lz_encoder to NULL after allocation.
The first member of lzma_lz_encoder doesn't necessarily need to be set
to NULL since it will always be set before anything tries to use it.
However the function pointer members must be set to NULL since other
functions rely on this NULL value to determine if this behavior is
supported or not.

This fixes a somewhat serious bug, where the options_update() and
set_out_limit() function pointers are not set to NULL. This seems to
have been forgotten since these function pointers were added many years
after the original two (code() and end()).

The problem is that by not setting this to NULL we are relying on the
memory allocation to zero things out if lzma_filters_update() is called
on a LZMA1 encoder. The function pointer for set_out_limit() is less
serious because there is not an API function that could call this in an
incorrect way. set_out_limit() is only called by the MicroLZMA encoder,
which must use LZMA1 where set_out_limit() is always set. Its currently
not possible to call set_out_limit() on an LZMA2 encoder at this time.

So calling lzma_filters_update() on an LZMA1 encoder had undefined
behavior since its possible that memory could be manipulated so the
options_update member pointed to a different instruction sequence.

This is unlikely to be a bug in an existing application since it relies
on calling lzma_filters_update() on an LZMA1 encoder in the first place.
For instance, it does not affect xz because lzma_filters_update() can
only be used when encoding to the .xz format.

This is fixed by using memzero() to set all members of lzma_lz_encoder
to NULL after it is allocated. This ensures this mistake will not occur
here in the future if any additional function pointers are added.
2023-12-16 20:51:38 +08:00
Jia Tan
1a1bb381db liblzma: Tweak a comment. 2023-12-16 20:30:55 +08:00
Jia Tan
55810780e0 liblzma: Make parameter names in function definition match declaration.
lzma_raw_encoder() and lzma_raw_encoder_init() used "options" as the
parameter name instead of "filters" (used by the declaration). "filters"
is more clear since the parameter represents the list of filters passed
to the raw encoder, each of which contains filter options.
2023-12-16 20:28:21 +08:00
Jia Tan
5dad6f628a liblzma: Improve lzma encoder init function consistency.
lzma_encoder_init() did not check for NULL options, but
lzma2_encoder_init() did. This is more of a code style improvement than
anything else to help make lzma_encoder_init() and lzma2_encoder_init()
more similar.
2023-12-16 20:18:47 +08:00
Jia Tan
2ade7246e7 liblzma: Add missing comments to lz_encoder.h. 2023-11-09 01:21:53 +08:00
Lasse Collin
46007049cd liblzma: Fix compilation of fastpos_tablegen.c.
The macro lzma_attr_visibility_hidden has to be defined to make
fastpos.h usable. The visibility attribute is irrelevant to
fastpos_tablegen.c so simply #define the macro to an empty value.

fastpos_tablegen.c is never built by the included build systems
and so the problem wasn't noticed earlier. It's just a standalone
program for generating fastpos_table.c.

Fixes: https://github.com/tukaani-project/xz/pull/69
Thanks to GitHub user Jamaika1.
2023-10-31 21:41:09 +02:00
Lasse Collin
8c36ab79cb liblzma: Add a note why crc_always_inline exists for now.
Solaris Studio is a possible example (not tested) which
supports the always_inline attribute but might not get
detected by the common.h #ifdefs.
2023-10-30 18:44:32 +02:00
Lasse Collin
e7a86b94cd liblzma: Use lzma_always_inline in memcmplen.h. 2023-10-30 18:44:32 +02:00
Lasse Collin
dcfe563299 liblzma: #define lzma_always_inline in common.h. 2023-10-30 18:44:32 +02:00
Lasse Collin
41113fe30a liblzma: Use lzma_attr_visibility_hidden on private extern declarations.
These variables are internal to liblzma and not exposed in the API.
2023-10-30 18:06:25 +02:00
Lasse Collin
a2f5ca706a liblzma: #define lzma_attr_visibility_hidden in common.h.
In ELF shared libs:

-fvisibility=hidden affects definitions of symbols but not
declarations.[*] This doesn't affect direct calls to functions
inside liblzma as a linker can replace a call to lzma_foo@plt
with a call directly to lzma_foo when -fvisibility=hidden is used.

[*] It has to be like this because otherwise every installed
    header file would need to explictly set the symbol visibility
    to default.

When accessing extern variables that aren't defined in the
same translation unit, compiler assumes that the variable has
the default visibility and thus indirection is needed. Unlike
function calls, linker cannot optimize this.

Using __attribute__((__visibility__("hidden"))) with the extern
variable declarations tells the compiler that indirection isn't
needed because the definition is in the same shared library.

About 15+ years ago, someone told me that it would be good if
the CRC tables would be defined in the same translation unit
as the C code of the CRC functions. While I understood that it
could help a tiny amount, I didn't want to change the code because
a separate translation unit for the CRC tables was needed for the
x86 assembly code anyway. But when visibility attributes are
supported, simply marking the extern declaration with the
hidden attribute will get identical result. When there are only
a few affected variables, this is trivial to do. I wish I had
understood this back then already.
2023-10-30 18:03:39 +02:00
Lasse Collin
2c7ee92e44 liblzma: Refer to MinGW-w64 instead of MinGW in the API headers.
MinGW (formely a MinGW.org Project, later the MinGW.OSDN Project
at <https://osdn.net/projects/mingw/>) has GCC 9.2.0 as the
most recent GCC package (released 2021-02-02). The project might
still be alive but majority of people have switched to MinGW-w64.
Thus it seems clearer to refer to MinGW-w64 in our API headers too.
Building with MinGW is likely to still work but I haven't tested it
in the recent years.
2023-10-26 21:46:06 +03:00
Lasse Collin
a7d1b2825c liblzma: Add Cflags.private to liblzma.pc.in for MSYS2.
It properly adds -DLZMA_API_STATIC when compiling code that
will be linked against static liblzma. Having it there on
systems other than Windows does no harm.

See: https://www.msys2.org/docs/pkgconfig/
2023-10-26 21:46:06 +03:00
Jia Tan
988e09f27b liblzma: Move is_clmul_supported() back to crc_common.h.
This partially reverts creating crc_clmul.c
(8c0f9376f58c0696d5d6719705164d35542dd891) where is_clmul_supported()
was moved, extern'ed, and renamed to lzma_is_clmul_supported(). This
caused a problem when the function call to lzma_is_clmul_supported()
results in a call through the PLT. ifunc resolvers run very early in
the dynamic loading sequence, so the PLT may not be setup properly at
this point. Whether the PLT is used or not for
lzma_is_clmul_supported() depened upon the compiler-toolchain used and
flags.

In liblzma compiled with GCC, for instance, GCC will go through the PLT
for function calls internal to liblzma if the version scripts and
symbol visibility hiding are not used. If lazy-binding is disabled,
then it would have made any program linked with liblzma fail during
dynamic loading in the ifunc resolver.
2023-10-21 00:01:29 +08:00
Jia Tan
105c7ca90d Build: Remove check for COND_CHECK_CRC32 in check/Makefile.inc.
Currently crc32 is always enabled, so COND_CHECK_CRC32 must always be
set. Because of this, it makes the recent change to conditionally
compile check/crc_clmul.c appear wrong since that file has CLMUL
implementations for both CRC32 and CRC64.
2023-10-19 16:23:32 +08:00
Jia Tan
c60b25569d liblzma: Fix -fsanitize=address failure with crc_clmul functions.
After forcing crc_simd_body() to always be inlined it caused
-fsanitize=address to fail for lzma_crc32_clmul() and
lzma_crc64_clmul(). The __no_sanitize_address__ attribute was added
to lzma_crc32_clmul() and lzma_crc64_clmul(), but not removed from
crc_simd_body(). ASAN and inline functions behavior has changed over
the years for GCC specifically, so while strictly required we will
keep __attribute__((__no_sanitize_address__)) on crc_simd_body() in
case this becomes a requirement in the future.

Older GCC versions refuse to inline a function with ASAN if the
caller and callee do not agree on sanitization flags
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89124#c3). If the
function was forced to be inlined, it will not compile if the callee
function has __no_sanitize_address__ but the caller doesn't.
2023-10-19 01:15:20 +08:00
Jia Tan
1c8884f0af liblzma: Set the MSVC optimization fix to only cover lzma_crc64_clmul().
After testing a 32-bit Release build on MSVC, only lzma_crc64_clmul()
has the bug. crc_simd_body() and lzma_crc32_clmul() do not need the
optimizations disabled.
2023-10-18 23:54:41 +08:00
Lasse Collin
5ce0f7a48b liblzma: CRC_USE_GENERIC_FOR_SMALL_INPUTS cannot be used with ifunc. 2023-10-18 23:54:41 +08:00
Lasse Collin
2773538049 liblzma: Include common.h in crc_common.h.
crc_common.h depends on common.h. The headers include common.h except
when there is a reason to not do so.
2023-10-18 23:54:41 +08:00
Jia Tan
e13b7947b9 liblzma: Add include guards to crc_common.h. 2023-10-18 23:54:41 +08:00
Jia Tan
40abd88afc liblzma: Add the crc_always_inline macro to crc_simd_body().
Forcing this to be inline has a significant speed improvement at the
cost of a few repeated instructions. The compilers tested on did not
inline this function since it is large and is used twice in the same
translation unit.
2023-10-18 23:54:41 +08:00