Commit Graph

2477 Commits

Author SHA1 Message Date
Lasse Collin 9f1a6d6f9a Build: Temporarily disable CRC CLMUL to silence OSS Fuzz
The code makes aligned 16-byte reads which may read up to 15 bytes
before the beginning or past the end of the buffer if the buffer
is misaligned. The unneeded bytes are then ignored. It cannot cross
page boundaries and thus cannot cause access violations.

This inherently trips address sanitizer which was already disabled
with __attribute__((__no_sanitize_address__)). However, it also
trips memory sanitizer if the extra bytes are uninitialized because
memory sanitizer doesn't see that those bytes then get ignored by
byte shuffling in the xmm registers.

The plan is to change the code so that all sanitizers pass but it's
not finished yet (performance shouldn't get worse) so as a temporary
measure to keep OSS Fuzz happy, the CLMUL CRC is now disabled even
though I think think the code is fine to use (and easy enough to review
the memory accesses in it too).
2024-05-15 23:14:17 +03:00
Lasse Collin 142e670a41 xz: Document the static function get_chains_memusage() 2024-05-13 18:00:41 +03:00
Lasse Collin 78e984399a xz: Rename filters_memusage_max() to get_chains_memusage() 2024-05-13 18:00:41 +03:00
Lasse Collin 54c3db0a83 xz: Rename filter_memusages to chains_memusages 2024-05-13 18:00:41 +03:00
Lasse Collin d9e1ae79ec xz: Simplify the memory usage scaling code
This is closer to what it was before the --filtersX support was added,
just extended to support for scaling all filter chains. The method
before this commit was an extended version of the original too but
it was done in a more complex way for no clear reason. In case of
an error, the complex version printed fewer informative messages
(a good thing) but it's not a sigificant benefit.

In the limit is too low even for single-threaded mode, the required
amount of memory is now reported like in 5.4.x instead of like in
5.5.1alpha - 5.6.1 which showed the original non-scaled usage. It
had been a FIXME in the old code but it's not clear what message
makes the most sense.

Fixes: 5f0c5a0438
2024-05-13 18:00:41 +03:00
Lasse Collin 0ee56983d1 xz: Edit comments 2024-05-13 18:00:41 +03:00
Lasse Collin ec82a49c35 xz: Rename chain_idx to chain_num 2024-05-13 18:00:41 +03:00
Lasse Collin a731a6993c xz: Edit coding style 2024-05-13 18:00:41 +03:00
Lasse Collin 32eb176b89 xz: Edit comments
Fixes: 5f0c5a0438
2024-05-13 15:41:48 +03:00
Lasse Collin b90339f4da xz: Fix grammar in a comment
Fixes: cb3111e3ed
2024-05-13 15:41:48 +03:00
Lasse Collin 4c0bdaf13d xz: Rename filter_memusages to encoder_memusages 2024-05-13 15:41:46 +03:00
Lasse Collin b54aa023e0 xz: Edit coding style 2024-05-13 15:41:05 +03:00
Lasse Collin 49f67d3d3f xz: Rename filters_index to chain_num
The reason is the same as in bd0782c1f13e52cd0fd8415208e30e47004a4c68.
2024-05-13 15:41:05 +03:00
Lasse Collin ff9e8b3d06 xz: Replace a few uint32_t with "unsigned" to reduce the number of casts
These hold only tiny values.
2024-05-13 15:41:05 +03:00
Lasse Collin b5e6c1113b xz: Rename filters_used_mask to chains_used_mask
The reason is the same as in bd0782c1f13e52cd0fd8415208e30e47004a4c68.
2024-05-13 15:41:05 +03:00
Lasse Collin 32500dfaad xz: Move the setting of "check" in coder_set_compression_settings()
It's more logical to do it in the beginning instead of in the middle
of the filter chain handling.

Fixes: d6af7f3470
2024-05-13 15:41:05 +03:00
Lasse Collin ad146b1f42 xz: Rename "filters" to "chains"
The convention is that

    lzma_filter filters[LZMA_FILTERS_MAX + 1];

contains the filters of a single filter chain.
It was so here as well before the commit
d6af7f3470.
It changes "filters" to a ten-element array of filter chains.
It's clearer to call this array-of-arrays "chains".

This also renames "filter_idx" to "chain_idx" which is used
as an index as in chains[chain_idx].
2024-05-13 15:40:58 +03:00
Lasse Collin 5a4ae4e4d0 xz: Clean up a comment 2024-05-13 15:39:39 +03:00
Lasse Collin 2de80494ed xz: Add clarifying assertions 2024-05-13 15:39:39 +03:00
Lasse Collin 1eaad004bf xz: Add a clarifying assertion
Fixes: 5f0c5a0438
2024-05-13 15:39:39 +03:00
Lasse Collin 605094329b xz: Clarify a comment 2024-05-13 15:39:39 +03:00
Lasse Collin 8fac2577f2 xz: Use the info collected in parse_block_list()
This is slightly simpler and it avoids looping through
the opt_block_list array.
2024-05-13 15:39:39 +03:00
Lasse Collin 81d350dab8 xz: Remember the filter chains and the largest Block in parse_block_list() 2024-05-13 15:39:39 +03:00
Lasse Collin 46ab56968f xz: Update a comment and initialization of filters_used_mask 2024-05-13 15:39:39 +03:00
Lasse Collin e89293a0ba xz: parse_block_list: Edit integer type casting 2024-05-13 15:39:39 +03:00
Lasse Collin 87011e40c1 xz: Make filter_memusages a local variable 2024-05-13 15:39:12 +03:00
Lasse Collin 347b412a93 xz: Remove unused code and simplify
opt_mode == MODE_COMPRESS isn't possible when HAVE_ENCODERS isn't
defined. Thus, when *encoding*, the message about *decoder* memory
usage is possible to show only when both encoder and decoder have
been built.

Since the message is shown only at V_DEBUG, skip the memusage
calculation if verbosity level isn't high enough.

Fixes: 5f0c5a0438
2024-05-13 15:31:15 +03:00
Lasse Collin 31358c057c xz: Fix integer type from uint64_t to uint32_t
lzma_options_lzma.dict_size is uint32_t so use it here too.

Fixes: 5f0c5a0438
2024-05-11 00:29:24 +03:00
Lasse Collin 3f71e0f3a1 debug/translation.bash: Remove an outdated test command
Since 5.3.5beta, "xz --lzma2=mf=bt4,nice=2" works even though bt4 needs
at least nice=4. It is rounded up internally by liblzma when needed.

Fixes: 5cd9f0df78
2024-05-08 21:44:48 +03:00
Lasse Collin b05a516830 Fix the date of NEWS for 5.4.5 2024-05-07 20:41:28 +03:00
Lasse Collin 6d336aeb97 Build: Update visibility.m4 from Gnulib
This fixes the syntax of the "serial" line and renames
a temporary variable.
2024-05-07 16:21:15 +03:00
Lasse Collin ab51e8ee61 po4a/update-po: Delete the *.po.authors files
These are temporary files that are needed only when running po4a.
The top-level Makefile.am puts the whole po4a directory into
distribution tarball (it's simpler) so deleting these temporary
files is needed to prevent them from getting into tarballs.
2024-05-07 15:05:21 +03:00
Lasse Collin e4780244a1 xz: Edit comments and coding style 2024-05-07 13:12:17 +03:00
Lasse Collin fe4d8b0c80 xz: Omit an incorrect comment
It likely was a leftover from a development version of the code.

Fixes: 183819bfd9
2024-05-06 23:09:13 +03:00
Lasse Collin 9bef5b8d17 xz: Add braces to a for-statement and to an if-statement
No functional changes.

Fixes: 5f0c5a0438
Fixes: 479fd58d60
2024-05-06 23:04:31 +03:00
Lasse Collin de06b9f0c0 liblzma: Omit an unneeded array from the x86 filter
Fixes: 6aa2a6deeb
2024-05-06 23:00:09 +03:00
Lasse Collin 7da488cb93 CMake: Add test_suffix.sh to the tests 2024-05-06 22:56:31 +03:00
Lasse Collin a805594ed0 Test: Add CMake support to test_suffix.sh
It needs to find the xz executable from a different directory
and work without config.h.
2024-05-06 22:55:54 +03:00
Lasse Collin 50e1948938 Update INSTALL about MINIX 3
The latest stable is 3.3.0 and it's from 2014.
Don't mention the older versions in INSTALL.
3.3.0 ships with Clang already.

Testing with 3.4.0beta6 shows that tuklib_physmem
works too so omit comments about that from INSTALL.
Visibility warnigns weren't a problem either.

Thus it's enough to mention the need for --disable-threads
as configure doesn't autodetect the lack of pthreads.
2024-05-06 20:45:34 +03:00
Lasse Collin 68d18aea14 Windows: Remove the "doc/api" line from README-Windows.txt
Fixes: 252aa1d67b
2024-05-02 23:00:16 +03:00
Lasse Collin 8ede961374 Build: Don't copy doc/api from source tree to distribution tarball
It was copied if it existed. This was intentional when autogen.sh
still built liblzma API docs with Doxygen.

Fixes: d3a77ebc04
2024-05-02 22:59:04 +03:00
Sam James 9a6761aa35 ci: add SPDX headers
I've checked over each of these and they're straightforward applications
of the relevant Github Actions.
2024-05-02 20:29:59 +03:00
Yaroslav Halchenko 81efe6119f codespell: Ignore the THANKS file and debbugs.gnu.org URL
This way "codespell -i 0" is silent.

This is the first commit from
https://github.com/tukaani-project/xz/pull/93
with trivial edits by Lasse Collin.
2024-05-01 13:51:17 +03:00
Lasse Collin 905bfc74fe Add .gitattributes to clean up git-archive output 2024-04-30 22:26:11 +03:00
Lasse Collin 3334c71d3d xzdec: Support Landlock ABI version 4
This was added to xz in 02e3505991
but I forgot to do the same in xzdec.

The Landlock sandbox in xzdec could be stricter as now it's
active only for the last file being decompressed. In xz,
read-only sandbox is used for multi-file case. On the other hand,
xz doesn't go to the strictest mode when processing the last file
when more than one file was specified; xzdec does.
2024-04-30 22:24:13 +03:00
Lasse Collin 278563ef8f liblzma: Fix incorrect function type error from sanitizer
Clang 17 with -fsanitize=address,undefined:

    src/liblzma/common/filter_common.c:366:8: runtime error:
        call to function encoder_find through pointer to incorrect
        function type 'const lzma_filter_coder *(*)(unsigned long)'
    src/liblzma/common/filter_encoder.c:187: note:
        encoder_find defined here

Use a wrapper function to get the correct type neatly.
This reduces the number of casts needed too.

This issue could be a problem with control flow integrity (CFI)
methods that check the function type on indirect function calls.

Fixes: 3b34851de1
2024-04-30 22:22:45 +03:00
Lasse Collin 77c8f60547 xz: Avoid arithmetic on a null pointer
It's undefined behavior. The result wasn't ever used as it occurred
in the last iteration of a loop.

Clang 17 with -fsanitize=address,undefined:

    $ src/xz/xz --block-list=123
    src/xz/args.c:164:12: runtime error: applying non-zero offset 1
        to null pointer

Fixes: 88ccf47205
Co-authored-by: Sam James <sam@gentoo.org>
2024-04-30 21:41:11 +03:00
Lasse Collin 64503cc2b7 CMake: Support building liblzma API docs using Doxygen
This is disabled by default to match the default in Autotools.
Use -DUSE_DOXYGEN=ON to enable Doxygen usage.

This uses the update-doxygen script, thus this is under if(UNIX)
although Doxygen itself can run on Windows too.
2024-04-30 17:09:08 +03:00
Lasse Collin 0a7f5a80d8 CMake: List API headers in LIBLZMA_API_HEADERS variable
This way the same list will be usable in more than one location.
2024-04-30 17:09:08 +03:00
Lasse Collin 541406bee3 PACKAGERS: Document the optional Doxygen usage
Also add a note that packagers should check the licensing
of the Doxygen output.
2024-04-30 17:09:08 +03:00