Commit Graph

1360 Commits

Author SHA1 Message Date
Lasse Collin df3efc058a xz: Clarify a comment
(cherry picked from commit 605094329b)
2024-05-23 11:28:20 +03:00
Lasse Collin 4ebfe11cd3 xz: Use the info collected in parse_block_list()
This is slightly simpler and it avoids looping through
the opt_block_list array.

(cherry picked from commit 8fac2577f2)
2024-05-23 11:28:20 +03:00
Lasse Collin bfea691361 xz: Remember the filter chains and the largest Block in parse_block_list()
(cherry picked from commit 81d350dab8)
2024-05-23 11:28:20 +03:00
Lasse Collin d4e33e7392 xz: Update a comment and initialization of filters_used_mask
(cherry picked from commit 46ab56968f)
2024-05-23 11:28:20 +03:00
Lasse Collin 3c130737c9 xz: parse_block_list: Edit integer type casting
(cherry picked from commit e89293a0ba)
2024-05-23 11:28:20 +03:00
Lasse Collin 40c8513b4e xz: Make filter_memusages a local variable
(cherry picked from commit 87011e40c1)
2024-05-23 11:28:20 +03:00
Lasse Collin cacaf25aa7 xz: Remove unused code and simplify
opt_mode == MODE_COMPRESS isn't possible when HAVE_ENCODERS isn't
defined. Thus, when *encoding*, the message about *decoder* memory
usage is possible to show only when both encoder and decoder have
been built.

Since the message is shown only at V_DEBUG, skip the memusage
calculation if verbosity level isn't high enough.

Fixes: 5f0c5a0438
(cherry picked from commit 347b412a93)
2024-05-23 11:28:20 +03:00
Lasse Collin 3495a6b291 xz: Fix integer type from uint64_t to uint32_t
lzma_options_lzma.dict_size is uint32_t so use it here too.

Fixes: 5f0c5a0438
(cherry picked from commit 31358c057c)
2024-05-23 11:28:20 +03:00
Lasse Collin 1b4e7dca24 xz: Edit comments and coding style
(cherry picked from commit e4780244a1)
2024-05-23 11:28:20 +03:00
Lasse Collin 18683525a7 xz: Omit an incorrect comment
It likely was a leftover from a development version of the code.

Fixes: 183819bfd9
(cherry picked from commit fe4d8b0c80)
2024-05-23 11:28:20 +03:00
Lasse Collin 005f039864 xz: Add braces to a for-statement and to an if-statement
No functional changes.

Fixes: 5f0c5a0438
Fixes: 479fd58d60
(cherry picked from commit 9bef5b8d17)
2024-05-23 11:28:20 +03:00
Lasse Collin 34be4e6aa6 liblzma: Omit an unneeded array from the x86 filter
Fixes: 6aa2a6deeb
(cherry picked from commit de06b9f0c0)
2024-05-23 11:28:20 +03:00
Lasse Collin f99e7c69ad xzdec: Support Landlock ABI version 4
This was added to xz in 02e3505991
but I forgot to do the same in xzdec.

The Landlock sandbox in xzdec could be stricter as now it's
active only for the last file being decompressed. In xz,
read-only sandbox is used for multi-file case. On the other hand,
xz doesn't go to the strictest mode when processing the last file
when more than one file was specified; xzdec does.

(cherry picked from commit 3334c71d3d)
2024-05-23 00:13:43 +03:00
Lasse Collin bfe9be7a46 liblzma: Fix incorrect function type error from sanitizer
Clang 17 with -fsanitize=address,undefined:

    src/liblzma/common/filter_common.c:366:8: runtime error:
        call to function encoder_find through pointer to incorrect
        function type 'const lzma_filter_coder *(*)(unsigned long)'
    src/liblzma/common/filter_encoder.c:187: note:
        encoder_find defined here

Use a wrapper function to get the correct type neatly.
This reduces the number of casts needed too.

This issue could be a problem with control flow integrity (CFI)
methods that check the function type on indirect function calls.

Fixes: 3b34851de1
(cherry picked from commit 278563ef8f)
2024-05-23 00:13:43 +03:00
Lasse Collin 882eadc5b8 xz: Avoid arithmetic on a null pointer
It's undefined behavior. The result wasn't ever used as it occurred
in the last iteration of a loop.

Clang 17 with -fsanitize=address,undefined:

    $ src/xz/xz --block-list=123
    src/xz/args.c:164:12: runtime error: applying non-zero offset 1
        to null pointer

Fixes: 88ccf47205
Co-authored-by: Sam James <sam@gentoo.org>
(cherry picked from commit 77c8f60547)
2024-05-23 00:13:43 +03:00
Lasse Collin 28e7d130cb Build: Add --enable-doxygen to generate and install API docs
It requires Doxygen. This option is disabled by default.

(cherry picked from commit e21efdf96f)
2024-05-23 00:13:43 +03:00
Lasse Collin bae288ea6f liblzma: index_decoder: Fix missing initializations on LZMA_PROG_ERROR
If the arguments to lzma_index_decoder() or lzma_index_buffer_decode()
were such that LZMA_PROG_ERROR was returned, the lzma_index **i
argument wasn't touched even though the API docs say that *i = NULL
is done if an error occurs. This obviously won't be done even now
if i == NULL but otherwise it is best to do it due to the wording
in the API docs.

In practice this matters very little: The problem can occur only
if the functions are called with invalid arguments, that is,
the calling application must already have a bug.

(cherry picked from commit 71eed2520e)
2024-05-22 14:32:36 +03:00
Sam James 493bc57c33 liblzma: outqueue: add header guard
Reported by github's codeql.

(cherry picked from commit c7ef767c49)
2024-05-22 14:32:36 +03:00
Sam James cede418d4f liblzma: easy_preset: add header guard
Reported by github's codeql.

(cherry picked from commit 55dcae3056)
2024-05-22 14:32:36 +03:00
Lasse Collin 6e76a25df2 tuklib_integer: Rename bswapXX to byteswapXX
The __builtin_bswapXX from GCC and Clang are preferred when
they are available. This can allow compilers to emit the x86 MOVBE
instruction instead of doing a load + byteswap as two instructions
(which would happen if the byteswapping is done in inline asm).

bswap16, bswap32, and bswap64 exist in system headers on *BSDs
and Darwin. #defining bswap16 on NetBSD results in a warning about
macro redefinition. It's safest to avoid this namespace conflict
completely.

No OS supported by tuklib_integer.h uses byteswapXX names and
a web search doesn't immediately find any obvious danger of
namespace conflicts. So let's try these still-pretty-short names
for the macros.

Thanks to Sam James for pointing out the compiler warning on
NetBSD 10.0.

(cherry picked from commit 4ffc60f323)
2024-05-22 14:32:36 +03:00
Lasse Collin 0ca14871f3 liblzma: API doc cleanups
(cherry picked from commit 08ab0966a7)
2024-05-22 14:32:36 +03:00
Lasse Collin 3ba3ef57f9 liblzma: lzma_str_to_filters: Set *error_pos on all errors
The API docs clearly say that if error_pos isn't NULL then *error
is always set on any error. However, it wasn't touched if str == NULL
or filters == NULL or unsupported flags were specified.

Fixes: cedeeca2ea
(cherry picked from commit 70d12dd069)
2024-05-22 14:32:36 +03:00
Lasse Collin 57ad820e15 liblzma: Clean up white space
(cherry picked from commit ed8e552395)
2024-05-22 14:32:36 +03:00
Lasse Collin 6e210d5766 liblzma: Silence a warning from Coverity static analysis
It is logical why it cannot know for sure that the value has
to be at most 4 if it is less than 16.

The x86 filter is based on a very old LZMA SDK version. Newer
ones have quite a different implementation for the same filter.

Thanks to Sam James.

(cherry picked from commit 6aa2a6deeb)
2024-05-22 14:32:36 +03:00
Lasse Collin 7413383e42 xz: Fix white space error.
Thanks to xx on #tukaani.

(cherry picked from commit eeca8f7c5b)
2024-05-22 14:32:36 +03:00
Sam James eed2f26c0e xz: add missing noreturn for message_filters_help
Fixes: a165d7df19
(cherry picked from commit 462ca94099)
2024-05-22 14:32:36 +03:00
Sam James 2633d8df61 xz: signals: suppress -Wsign-conversion on macOS
On macOS, we get:
```
signals.c: In function 'signals_init':
signals.c:76:17: error: conversion to 'sigset_t' {aka 'unsigned int'} from 'int' may change the sign of the result [-Werror=sign-conversion]
   76 |                 sigaddset(&hooked_signals, sigs[i]);
      |                 ^~~~~~~~~
signals.c:81:17: error: conversion to 'sigset_t' {aka 'unsigned int'} from 'int' may change the sign of the result [-Werror=sign-conversion]
   81 |                 sigaddset(&hooked_signals, message_progress_sigs[i]);
      |                 ^~~~~~~~~
signals.c:86:9: error: conversion to 'sigset_t' {aka 'unsigned int'} from 'int' may change the sign of the result [-Werror=sign-conversion]
   86 |         sigaddset(&hooked_signals, SIGTSTP);
      |         ^~~~~~~~~
```

We use `int` for `hooked_signals` but we can't just cast to whatever
`sigset_t` is because `sigset_t` is an opaque type. It's an unsigned int
on macOS. On macOS, `sigaddset` is implemented as a macro.

Just suppress -Wsign-conversion for `signals_init` for macOS given
there's no real nice way of fixing this.

(cherry picked from commit 863f13d282)
2024-05-22 14:32:36 +03:00
Lasse Collin 5d20a61205 liblzma: CRC: Simplify table omission macros
A macro is useful to prevent a single #if directive from
getting too ugly but only one macro is needed for all archs.

(cherry picked from commit 6286c1900c)
2024-05-22 14:26:03 +03:00
Lasse Collin 2a80827e23 liblzma: ARM64 CRC: Fix omission of CRC32 table
The macro name had an odd typo so the table wasn't omitted
when it should have.

Fixes: 1940f0ec28
(cherry picked from commit 45da936c87)
2024-05-22 14:26:03 +03:00
Lasse Collin 9223ad6e78 liblzma: ARM64 CRC32: Change style of the macOS code to match FreeBSD
I didn't test this but it shouldn't change any functionality.

Fixes: 761f5b69a4
(cherry picked from commit fc43cecd32)
2024-05-22 14:26:03 +03:00
Lasse Collin 32ceb2c36a liblzma: ARM64 CRC32: Add error checking to FreeBSD-specific code
Also add parenthesis to the return statement.

I didn't test this.

Fixes: 761f5b69a4
(cherry picked from commit 1024cd4cd9)
2024-05-22 14:26:03 +03:00
Lasse Collin 42915101e9 liblzma: ARM64 CRC32: Use negation instead of subtracting from 8
Subtracting from 0 is negation, this just keeps warnings away.

Fixes: 761f5b69a4
(cherry picked from commit 2337f7021c)
2024-05-22 14:26:03 +03:00
Lasse Collin 42a9482b48 liblzma: ARM64 CRC32: Tweak coding style and comments
(cherry picked from commit d8fffd01aa)
2024-05-22 14:26:03 +03:00
Lasse Collin 34d1252f09 liblzma: Remove ifunc support.
This is *NOT* done for security reasons even though the backdoor
relied on the ifunc code. Instead, the reason is that in this
project ifunc provides little benefits but it's quite a bit of
extra code to support it. The only case where ifunc *might* matter
for performance is if the CRC functions are used directly by an
application. In normal compression use it's completely irrelevant.

(cherry picked from commit 689ae24273)
2024-05-22 14:12:43 +03:00
Lasse Collin 1a1f3d0323 xz man page: Use .ft CR instead of CW to silence warnings from groff.
(cherry picked from commit 31ef676567)
2024-05-22 14:12:43 +03:00
Lasse Collin 879295d91f Update maintainer and author info.
The other maintainer suddenly disappeared.

(cherry picked from commit 77a294d98a)
2024-05-22 14:12:43 +03:00
Lasse Collin eeb74fba1f Update website URLs back to tukaani.org.
The XZ projects were moved back to their original URLs.

(cherry picked from commit 17aa2e1a79)
2024-05-22 14:12:39 +03:00
Lasse Collin a7b9cd7000 xzdec: Tweak coding style and comments.
(cherry picked from commit 2739db9810)
2024-05-22 14:12:13 +03:00
Lasse Collin b3a7561880 liblzma: memcmplen.h: Add a comment why subtraction is used.
(cherry picked from commit 0b99783d63)
2024-05-22 14:07:37 +03:00
Lasse Collin 760f622f0d liblzma: Minor comment edits.
(cherry picked from commit 3217b82b3e)
2024-05-22 14:07:37 +03:00
Sergey Kosukhin 403b4c78b8 liblzma: Fix building with NVHPC (NVIDIA HPC SDK).
NVHPC compiler has several issues that make it impossible to
build liblzma:
  - the compiler cannot handle unions that contain pointers that
    are not the first members;
  - the compiler cannot handle the assembler code in range_decoder.h
    (LZMA_RANGE_DECODER_CONFIG has to be set to zero);
  - the compiler fails to produce valid code for delta_decode if the
    vectorization is enabled, which results in failed tests.

This introduces NVHPC-specific workarounds that address the issues.

(cherry picked from commit 096bc0e3f8)
2024-05-22 14:07:37 +03:00
Lasse Collin 1107712e37 Remove the backdoor found in 5.6.0 and 5.6.1 (CVE-2024-3094).
While the backdoor was inactive (and thus harmless) without inserting
a small trigger code into the build system when the source package was
created, it's good to remove this anyway:

  - The executable payloads were embedded as binary blobs in
    the test files. This was a blatant violation of the
    Debian Free Software Guidelines.

  - On machines that see lots bots poking at the SSH port, the backdoor
    noticeably increased CPU load, resulting in degraded user experience
    and thus overwhelmingly negative user feedback.

  - The maintainer who added the backdoor has disappeared.

  - Backdoors are bad for security.

This reverts the following without making any other changes:

6e636819 Tests: Update two test files.
a3a29bbd Tests: Test --single-stream can decompress bad-3-corrupt_lzma2.xz.
0b4ccc91 Tests: Update RISC-V test files.
8c9b8b20 liblzma: Fix typos in crc32_fast.c and crc64_fast.c.
82ecc538 liblzma: Fix false Valgrind error report with GCC.
cf44e4b7 Tests: Add a few test files.
3060e107 Tests: Use smaller dictionary size in RISC-V test files.
e2870db5 Tests: Add two RISC-V Filter test files.

The RISC-V test files also have real content that tests the filter
but the real content would fit into much smaller files. A generator
program would need to be available as well.

Thanks to Andres Freund for finding and reporting it and making
it public quickly so others could act without a delay.
See: https://www.openwall.com/lists/oss-security/2024/03/29/4
2024-04-09 18:38:37 +03:00
Jia Tan fd1b975b78 Bump version and soname for 5.6.1. 2024-03-09 11:42:50 +08:00
Jia Tan 058337b0f1 liblzma: Fix typos in crc32_fast.c and crc64_fast.c. 2024-03-09 09:52:32 +08:00
Jia Tan 651a1545c8 liblzma: Fix false Valgrind error report with GCC.
With GCC and a certain combination of flags, Valgrind will falsely
trigger an invalid write. This appears to be due to the omission of
instructions to properly save, set up, and restore the frame pointer.

The IFUNC resolver is a leaf function since it only calls a function
that is inlined. So sometimes GCC omits the frame pointer instructions
in the resolver unless this optimization is explictly disabled.

This fixes https://bugzilla.redhat.com/show_bug.cgi?id=2267598.
2024-03-09 09:20:57 +08:00
Lasse Collin 6e97b299f1 liblzma: Fix a typo in a comment in the RISC-V filter. 2024-03-05 23:21:26 +02:00
Jia Tan 4e1c97052b liblzma: Use attribute no_profile_instrument_function with ifunc.
Thanks to Sam James for determining this was the attribute needed to
workaround the GCC bug and for his version of the patch in Gentoo.
2024-03-05 00:34:46 +08:00
Lasse Collin e98ddaf85a liblzma: Fix a comment in the RISC-V filter. 2024-03-04 19:23:18 +02:00
Lasse Collin 86bec8334b xz: Add comments. 2024-02-28 18:33:34 +02:00
Jia Tan 5c91b454c2 xz: Change logging level for thread reduction to highest verbosity only.
Now that multi threaded encoding is the default, users do not need to
see a warning message everytime the number of threads is reduced. On
some machines, this could happen very often. It is not unreasonable for
users to need to set double verbose mode to see this kind of
information.

To see these warning messages -vv or --verbose --verbose must be passed
to set xz into the highest possible verbosity mode.

These warnings had caused automated testing frameworks to fail when they
expected no output to stderr.

Thanks to Sebastian Andrzej Siewior for reporting this and for the
initial version of the patch.
2024-02-28 18:31:04 +02:00
Chien Wong f06b33edd2 xz: Add missing RISC-V on the filter list in the man page
Signed-off-by: Chien Wong <m@xv97.com>
2024-02-28 18:31:04 +02:00
Jia Tan a100f9111c Build: Fix Linux Landlock feature test in Autotools and CMake builds.
The previous Linux Landlock feature test assumed that having the
linux/landlock.h header file was enough. The new feature tests also
requires that prctl() and the required Landlock system calls are
supported.
2024-02-28 18:31:04 +02:00
Jia Tan 2d7d862e3f Bump version and soname for 5.6.0. 2024-02-24 15:55:08 +08:00
Jia Tan 898aad9fc7 xzmore: Fix typo in xzmore.1.
Thanks to Yuri Chornoivan.
2024-02-21 00:30:43 +08:00
Jia Tan eea78216d2 xz: Fix Capsicum sandbox compile error.
user_abort_pipe[] was still being used instead of the parameters.
2024-02-23 20:27:15 +08:00
Lasse Collin de4337fd89 xz: Landlock: Fix error message if input file is a directory.
If xz is given a directory, it should look like this:

    $ xz /usr/bin
    xz: /usr/bin: Is a directory, skipping

The Landlock rules didn't allow opening directories for reading:

    $ xz /usr/bin
    xz: /usr/bin: Permission denied

The simplest fix was to allow opening directories for reading.
While it's a bit silly to allow it solely for the error message,
it shouldn't make the sandbox significantly weaker.

The single-file use case (like when called from GNU tar) is
still as strict as possible: all Landlock restrictions are
enabled before (de)compression starts.
2024-02-22 15:18:25 +02:00
Lasse Collin 120da10ae1 liblzma: Disable branchless C version in range decoder.
Thanks to Sebastian Andrzej Siewior and Sam James for
benchmarking on various systems.
2024-02-22 14:41:29 +02:00
Lasse Collin 3462362ebd Scripts: Use @PACKAGE_VERSION@ instead of @VERSION@.
PACKAGE_VERSION was already used in liblzma.pc.in.
This way only one version @foo@ is used.
2024-02-19 12:21:37 +02:00
Lasse Collin 746c471643 liblzma: Remove commented-out code. 2024-02-19 11:58:33 +02:00
Lasse Collin 4ce300ce08 xz: Delete old commented-out code. 2024-02-17 23:07:35 +02:00
Lasse Collin cae9a5e0bf xz: Use stricter pledge(2) and Landlock sandbox.
This makes these sandboxing methods stricter when no files are
created or deleted. That is, it's a middle ground between the
initial sandbox and the strictest single-file-to-stdout sandbox:
this allows opening files for reading but output has to go to stdout.
2024-02-17 23:07:35 +02:00
Lasse Collin 02e3505991 xz: Support Landlock ABI version 4.
Linux 6.7 added support for ABI version 4 which restricts
TCP connections which xz won't need and thus those can be
forbidden now. Since the ABI version is handled at runtime,
supporting version 4 won't cause any compatibility issues.

Note that new enough kernel headers are required to get
version 4 support enabled at build time.
2024-02-17 23:07:35 +02:00
Lasse Collin 374868d81d xz: Move sandboxing code to sandbox.c and improve Landlock sandbox.
Landlock is now always used just like pledge(2) is: first in more
permissive mode and later (under certain common conditions) in
a strict mode that doesn't allow opening more files.

I put pledge(2) first in sandbox.c because it's the simplest API
to use and still somewhat fine-grained for basic applications.
So it's the simplest thing to understand for anyone reading sandbox.c.
2024-02-17 23:07:35 +02:00
Lasse Collin 7312dfbb02 xz: Tweak comments. 2024-02-17 23:07:35 +02:00
Lasse Collin c701a5909a xz: Fix message_init() description.
Also explicitly initialize progress_automatic to make it clear
that it can be read before message_init() sets it. Static variable
was initialized to false by default already so this is only for
clarity.
2024-02-17 23:07:35 +02:00
Lasse Collin 56246607df Build: Install translated lzmainfo man pages.
All other translated man pages were being installed but
lzmainfo had been forgotten.
2024-02-17 16:23:14 +02:00
Lasse Collin f1d6b88aef liblzma: Avoid implementation-defined behavior in the RISC-V filter.
GCC docs promise that it works and a few other compilers do
too. Clang/LLVM is documented source code only but unsurprisingly
it behaves the same as others on x86-64 at least. But the
certainly-portable way is good enough here so use that.
2024-02-17 16:01:32 +02:00
Lasse Collin 843ddc5f61 liblzma: Wrap a line exceeding 80 chars. 2024-02-17 15:50:21 +02:00
Sebastian Andrzej Siewior e9053c9072 liblzma/rangecoder: Exclude x32 from the x86-64 optimisation.
The x32 port has a x86-64 ABI in term of all registers but uses only
32bit pointer like x86-32. The assembly optimisation fails to compile on
x32. Given the state of x32 I suggest to exclude it from the
optimisation rather than trying to fix it.

Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
2024-02-17 15:50:21 +02:00
Jia Tan fb5f6aaf18 Fix typos discovered by codespell. 2024-02-16 22:54:59 +08:00
Jia Tan 6f1790254a Bump version for 5.5.2beta. 2024-02-15 01:53:40 +08:00
Lasse Collin 924fdeedf4 liblzma: Fix validate_map.sh.
Adding the SPDX license identifier changed the line numbers.
2024-02-14 19:46:11 +02:00
Lasse Collin a4557bad96 liblzma: Silence warnings in --enable-small build. 2024-02-14 19:21:45 +02:00
Lasse Collin 160b686264 liblzma: Silence a warning. 2024-02-14 19:05:58 +02:00
Lasse Collin 8af7db854f xz: Mention lzmainfo if trying to use 'lzma --list'.
This kind of fixes the problem reported here:
https://bugs.launchpad.net/ubuntu/+source/xz-utils/+bug/1291020
2024-02-14 18:31:16 +02:00
Lasse Collin 0668907ff7 liblzma: Add comments. 2024-02-14 18:31:16 +02:00
Lasse Collin 109f1913d4 Scripts: Add lz4 support to xzgrep and xzdiff. 2024-02-14 18:31:16 +02:00
Lasse Collin de55485cb2 liblzma: Choose the range decoder variants using a bitmask macro. 2024-02-14 18:31:16 +02:00
Lasse Collin 0709c2b2d7 xz: Fix outdated threading related info on the man page. 2024-02-14 18:31:16 +02:00
Lasse Collin 3182a330c1 liblzma: Range decoder: Add x86-64 inline assembly.
It's compatible with GCC and Clang.
2024-02-14 18:31:16 +02:00
Lasse Collin cba2edc991 liblzma: Range decoder: Add branchless C code.
It's used only for basic bittrees and fixed-size reverse bittree
because those showed a clear benefit on x86-64 with GCC and Clang.
The other methods were more mixed and thus are commented out but
they should be tested on other archs.
2024-02-14 18:31:16 +02:00
Lasse Collin e290a72d6d liblzma: Clarify a comment. 2024-02-14 18:31:16 +02:00
Lasse Collin 5e04706b91 liblzma: LZMA decoder: Optimize loop comparison.
But now it needs one more local variable.
2024-02-14 18:31:16 +02:00
Lasse Collin 88276f9f2c liblzma: Optimize literal_subcoder() macro slightly. 2024-02-14 18:31:16 +02:00
Lasse Collin 5938f6de4d liblzma: LZ decoder: Add unlikely(). 2024-02-14 18:31:16 +02:00
Lasse Collin 9c252e3ed0 liblzma: LZ decoder: Remove a useless unlikely(). 2024-02-14 18:31:16 +02:00
Lasse Collin f3872a5947 liblzma: Optimize LZ decoder slightly.
Now extra buffer space is reserved so that repeating bytes for
any single match will never need to copy from two places (both
the beginning and the end of the buffer). This simplifies
dict_repeat() and helps a little with speed.

This seems to reduce .lzma decompression time about 2 %, so
with .xz and CRC it could be slightly less. The small things
add up still.
2024-02-14 18:31:16 +02:00
Lasse Collin eb518446e5 liblzma: LZMA decoder: Get rid of next_state[].
It's not completely obvious if this is better in the decoder.
It should be good if compiler can avoid creating a branch
(like using CMOV on x86).

This also makes lzma_encoder.c use the new macros.
2024-02-14 18:31:16 +02:00
Lasse Collin e0c0ee475c liblzma: LZMA decoder improvements.
This adds macros for bittree decoding which prepares the code
for alternative C versions and inline assembly.
2024-02-14 18:31:16 +02:00
Jia Tan de5c5e4176 liblzma: Creates Non-resumable and Resumable modes for lzma_decoder.
The new decoder resumes the first decoder loop in the Resumable mode.
Then, the code executes in Non-resumable mode until it detects that it
cannot guarantee to have enough input/output to decode another symbol.

The Resumable mode is how the decoder has always worked. Before decoding
every input bit, it checks if there is enough space and will save its
location to be resumed later. When the decoder has more input/output,
it jumps back to the correct sequence in the Resumable mode code.

When the input/output buffers are large, the Resumable mode is much
slower than the Non-resumable because it has more branches and is harder
for the compiler to optimize since it is in a large switch block.

Early benchmarking shows significant time improvement (8-10% on gcc and
clang x86) by using the Non-resumable code as much as possible.
2024-02-14 18:31:16 +02:00
Jia Tan e446ab7a18 liblzma: Creates separate "safe" range decoder mode.
The new "safe" range decoder mode is the same as old range decoder, but
now the default behavior of the range decoder will not check if there is
enough input or output to complete the operation. When the buffers are
close to fully consumed, the "safe" operations must be used instead. This
will improve speed because it will reduce the number of branches needed
for most of the range decoder operations.
2024-02-14 18:31:16 +02:00
Lasse Collin e48287bf51 xzdiff, xzgrep, and xzmore: Rewrite the man pages.
The main reason is a kind of silly one:

xz-man.pot contains strings from all man pages in XZ Utils.
The man pages of xzdiff, xzgrep, and xzmore were under GPLv2
and the rest under 0BSD. Thus xz-man.pot contained strings
under two licences. po4a creates the translated man pages
from the combined 0BSD+GPLv2 xz-man.pot.

I haven't liked this mixing in xz-man.pot but the
Translation Project requires that all man pages must be
in the same .pot file. So a separate xz-man-gpl.pot
wasn't an option.

Since these man pages are short, rewriting them was quick enough.
Now xz-man.pot is entirely under 0BSD and marking the per-file
licenses is simpler.

As a bonus, some wording hopefully is now slightly better
although it's perhaps a matter of taste.

NOTE: In xzgrep.1, the EXIT STATUS section was written by me
in the commit d796b6d7fd so that's
why that section could be taken as is from the old xzgrep.1.
2024-02-14 18:31:16 +02:00
Lasse Collin 3e551b111b xzless: Update man page slightly.
The xz tool can decompress three file formats and xzless
has always supported uncompressed files too.
2024-02-14 18:31:16 +02:00
Lasse Collin b941549573 liblzma: Include the SPDX license identifier 0BSD to generated files.
Perhaps the generated files aren't even copyrightable but
using the same license for them as for the rest of the liblzma
keeps things more consistent for tools that look for license info.
2024-02-14 18:31:16 +02:00
Lasse Collin 8e4ec79483 liblzma: Fix compilation of price_tablegen.c.
It is built and run only manually so this didn't matter
unless one wanted to regenerate the price_table.c.
2024-02-14 18:31:16 +02:00
Lasse Collin e99bff3ffb Add SPDX license identifiers to GPL, LGPL, and FSFULLR files. 2024-02-14 18:31:16 +02:00
Lasse Collin 22af94128b Add SPDX license identifier into 0BSD source code files. 2024-02-14 18:31:16 +02:00
Lasse Collin 23de53421e liblzma: Sync the AUTHORS fix about SHA-256 to lzma.h. 2024-02-14 18:31:16 +02:00
Lasse Collin 689e0228ba Change most public domain parts to 0BSD.
Translations and doc/xz-file-format.txt and doc/lzma-file-format.txt
were not touched.

COPYING.0BSD was added.
2024-02-14 18:31:12 +02:00
Lasse Collin 76946dc433 Fix SHA-256 authors.
The initial commit 5d018dc035
in 2007 had a comment in sha256.c that the code is based on
Crypto++ Library 5.5.1. In 2009 the Authors list in sha256.c
and the AUTHORS file was updated with information that the
code had come from Crypto++ but via 7-Zip. I know I had viewed
7-Zip's SHA-256 code but back then the C code has been identical
enough with Crypto++, so I don't why I thought the author info
would need that extra step via 7-Zip for this single file.

Another error is that I had mixed sha.* and shacal2.* files
when checking for author info in Crypto++. The shacal2.* files
aren't related to liblzma's sha256.c and thus Kevin Springle's
code in Crypto++ isn't either.
2024-02-14 15:23:00 +02:00