Commit Graph

1178 Commits

Author SHA1 Message Date
Jia Tan 47a63cad2a xz: Update --long-help for the new --filtersX option. 2023-07-17 23:34:55 +08:00
Jia Tan 8b9913a13d xz: Ignore filter chains that are set but never used in --block-list.
If a filter chain is set but not used in --block-list, it introduced
unexpected behavior such as requiring an unneeded amount of memory to
compress, reducing the number of threads in multi-threaded encoding, and
printing an incorrect amount of memory needed to decompress.

This also renames filters_init_mask => filters_used_mask. A filter is
assumed to be used if it is specified in --filtersX until
coder_set_compression_settings() determines which filters are referenced
in --block-list.
2023-07-17 23:34:55 +08:00
Jia Tan 183819bfd9 xz: Set the Block size for mt encoding correctly.
When opt_block_size is not used, the Block size for mt encoder is
derived from the minimum of the largest Block specified by
--block-list and the recommended Block size on all filter chains
calculated by lzma_mt_block_size(). This avoids using unnecessary
memory and ensures that all Blocks are large enough for the most memory
needy filter chain.
2023-07-17 23:34:55 +08:00
Jia Tan afb2dbec3d xz: Validate --flush-timeout for all specified filter chains. 2023-07-17 23:34:55 +08:00
Jia Tan 5f0c5a0438 xz: Allows --block-list filters to scale down memory usage.
Previously, only the default filter chain could have its memory usage
adjusted. The filter chains specified with --filtersX were not checked
for memory usage. Now, all used filter chains will be adjusted if
necessary.
2023-07-17 23:34:55 +08:00
Jia Tan 479fd58d60 xz: Do not include block splitting if encoders are disabled.
The block splitting logic and split_block() function are not needed if
encoders are disabled. This will help slightly reduce the binary size
when built without encoders and allow split_block() to use functions
that require encoders being enabled.
2023-07-17 23:34:55 +08:00
Jia Tan f86ede2250 xz: Free filters[] in debug mode.
This will only free filter chains created with --filters1-9 since the
default filter chain may be set from a static function variable. The
complexity to free the default filter chain is not worth the burden on
code maintenance.
2023-07-17 23:34:55 +08:00
Jia Tan f281cd0d69 xz: Add a message if --block-list is used outside of xz compresssion.
--block-list is only supported with compression in xz format. This avoids
silently ignoring when --block-list is unused.
2023-07-17 23:34:55 +08:00
Jia Tan d6af7f3470 xz: Create command line options for filters[1-9].
The new command line options are meant to be combined with --block-list.
They work as an optional extension to --block-list to specify a custom
filter chain for each block listed. The new options allow the creation
of up to 9 reusable filter chains. For instance:

xz --block-list=1:10MiB,3:5MiB,,2:5MiB,1:0 --filters1=delta--lzma2 \
--filters2=x86--lzma2 --filters3=arm64--lzma2

Will create the following blocks:
1. A block of size 10 MiB with filter chain delta, lzma2.
2. A block of size 5 MiB with filter chain arm64, lzma2.
3. A block of size 5 MiB with filter chain arm64, lzma2.
4. A block of size 5 MiB with filter chain x86, lzma2.
5. A block containing the rest of the file contents with filter chain
   delta, lzma2.
2023-07-17 23:34:55 +08:00
Jia Tan 072d292501 xz: Use lzma_filters_free() in forget_filter_chain().
This is a little cleaner than the previous implementation of
forget_filter_chain(). It is also more consistent since
lzma_str_to_filters() will always terminate the filter chain so there
is no need to terminate it later in coder_set_compression_settings().
2023-07-17 23:34:55 +08:00
Jia Tan 3d21da5cff xz: Separate string to filter conversion into a helper function.
Converting from string to filter will also need to be done for block
specific filter chains.
2023-07-17 23:34:55 +08:00
Jia Tan 5f3b898d07 xz: Update --long-help and man page for new --filters option. 2023-07-17 23:34:55 +08:00
Jia Tan 9ded880a02 xz: Add --filters option to CLI.
The --filters option uses the new lzma_str_to_filters() function
to convert a string into a full filter chain. Using this option
will reset all previous filters set by --preset, --[filter], or
--filters.
2023-07-17 23:34:55 +08:00
Jia Tan 17f8844e6f liblzma: Remove non-portable empty initializer.
Commit 78704f36e7 added an empty
initializer {} to prevent a warning. The empty initializer is a GNU
extension and results in a build failure on MSVC. The -wpedantic flag
warns about empty initializers.
2023-07-08 21:24:19 +08:00
Jia Tan 78704f36e7 liblzma: Prevent uninitialzed warning in mt stream encoder.
This change only impacts the compiler warning since it was impossible
for the wait_abs struct in stream_encode_mt() to be used before it was
initialized since mythread_condtime_set() will always be called before
mythread_cond_timedwait().

Since the mythread.h code is different between the POSIX and
Windows versions, this warning was only present on Windows builds.

Thanks to Arthur S for reporting the warning and providing an initial
patch.
2023-06-29 00:06:16 +08:00
Jia Tan e3356a204c liblzma: Prevent warning for MSYS2 Windows build.
In lzma_memcmplen(), the <intrin.h> header file is only included if
_MSC_VER and _M_X64 are both defined but _BitScanForward64() was
previously used if _M_X64 was defined. GCC for MSYS2 defines _M_X64 but
not _MSC_VER so _BitScanForward64() was used without including
<intrin.h>.

Now, lzma_memcmplen() will use __builtin_ctzll() for MSYS2 GCC builds as
expected.
2023-06-28 23:59:51 +08:00
Lasse Collin ee44863ae8 liblzma: Add ifunc implementation to crc64_fast.c.
The ifunc method avoids indirection via the function pointer
crc64_func. This works on GNU/Linux and probably on FreeBSD too.
The previous __attribute((__constructor__)) method is kept for
compatibility with ELF platforms which do support ifunc.

The ifunc method has some limitations, for example, building
liblzma with -fsanitize=address will result in segfaults.
The configure option --disable-ifunc must be used for such builds.

Thanks to Hans Jansen for the original patch.
Closes: https://github.com/tukaani-project/xz/pull/53
2023-06-27 23:55:59 +08:00
Jia Tan f36ca7982f liblzma: Slightly rewords lzma_str_list_filters() documentation.
Reword "options required" to "supported options". The previous may have
suggested that the options listed were all required anytime a filter is
used for encoding or decoding. The reword makes this more clear that
adjusting the options is optional.
2023-05-13 21:21:54 +08:00
Jia Tan 3374a5359e liblzma: Adds lzma_nothrow to MicroLZMA API functions.
None of the liblzma functions may throw an exception, so this
attribute should be applied to all liblzma API functions.
2023-05-12 00:00:47 +08:00
Jia Tan 8f23657498 liblzma: Exports lzma_mt_block_size() as an API function.
The lzma_mt_block_size() was previously just an internal function for
the multithreaded .xz encoder. It is used to provide a recommended Block
size for a given filter chain.

This function is helpful to determine the maximum Block size for the
multithreaded .xz encoder when one wants to change the filters between
blocks. Then, this determined Block size can be provided to
lzma_stream_encoder_mt() in the lzma_mt options parameter when
intializing the coder. This requires one to know all the filter chains
they are using before starting to encode (or at least the filter chain
that will need the largest Block size), but that isn't a bad limitation.
2023-05-11 23:54:44 +08:00
Jia Tan d0f33d672a liblzma: Creates IS_ENC_DICT_SIZE_VALID() macro.
This creates an internal liblzma macro to test if the dictionary size
is valid for encoding.
2023-05-11 22:28:45 +08:00
Jia Tan 9ad64bdf30 tuklib_integer.h: Reverts previous commit.
Previous commit 6be460dde0 would cause an
error if the integer size was 32 bit.
2023-05-04 20:30:25 +08:00
Jia Tan 6be460dde0 tuklib_integer.h: Changes two other UINT_MAX == UINT32_MAX to >=. 2023-05-04 19:25:20 +08:00
Lasse Collin 44c0c5eae9 tuklib_integer.h: Fix a recent copypaste error in Clang detection.
Wrong line was changed in 7062348bf3.
Also, this has >= instead of == since ints larger than 32 bits would
work too even if not relevant in practice.
2023-05-03 22:55:16 +03:00
Jia Tan f41df2ac2f Windows: Include <intrin.h> when needed.
Legacy Windows did not need to #include <intrin.h> to use the MSVC
intrinsics. Newer versions likely just issue a warning, but the MSVC
documentation says to include the header file for the intrinsics we use.

GCC and Clang can "pretend" to be MSVC on Windows, so extra checks are
needed in tuklib_integer.h to only include <intrin.h> when it will is
actually needed.
2023-04-19 22:22:16 +08:00
Jia Tan 7062348bf3 tuklib_integer: Use __builtin_clz() with Clang.
Clang has support for __builtin_clz(), but previously Clang would
fallback to either the MSVC intrinsic or the regular C code. This was
discovered due to a bug where a new version of Clang required the
<intrin.h> header file in order to use the MSVC intrinsics.

Thanks to Anton Kochkov for notifying us about the bug.
2023-04-19 21:59:03 +08:00
Lasse Collin 3938718ce3 liblzma: Update project maintainers in lzma.h.
AUTHORS was updated earlier, lzma.h was simply forgotten.
2023-04-14 18:42:33 +03:00
Jia Tan 2a89670ab2 liblzma: Cleans up old commented out code. 2023-04-13 20:45:19 +08:00
Jia Tan 116e81f002 Build: Removes redundant check for LZMA1 filter support. 2023-03-23 21:48:52 +08:00
Lasse Collin dfe1710784 liblzma: Silence -Wsign-conversion in SSE2 code in memcmplen.h.
Thanks to Christian Hesse for reporting the issue.
Fixes: https://github.com/tukaani-project/xz/issues/44
2023-03-19 22:45:59 +02:00
Lasse Collin b473a92891 Change a few HTTP URLs to HTTPS.
The xz man page timestamp was intentionally left unchanged.
2023-03-18 15:56:07 +02:00
Jia Tan fd90e2f4c2 liblzma: Remove note from lzma_options_bcj about the ARM64 exception.
This was left in by mistake since an early version of the ARM64 filter
used a different struct for its options.
2023-03-17 01:42:28 +08:00
Jia Tan 55ba6e9300 liblzma: Add set lzma.h as the main page for Doxygen documentation.
The \mainpage command is used in the first block of comments in lzma.h.
This changes the previously nearly empty index.html to use the first
comment block in lzma.h for its contents.

lzma.h is no longer documented separately, but this is for the better
since lzma.h only defined a few macros that users do not need to use.
The individual API header files all have a disclaimer that they should
not be #included directly, so there should be no confusion on the fact
that lzma.h should be the only header used by applications.

Additionally, the note "See ../lzma.h for information about liblzma as
a whole." was removed since lzma.h is now the main page of the
generated HTML and does not have its own page anymore. So it would be
confusing in the HTML version and was only a "nice to have" when
browsing the source files.
2023-03-17 01:42:28 +08:00
Jia Tan af55191102 liblzma: Defines masks for return values from lzma_index_checks(). 2023-03-13 20:49:53 +08:00
Lasse Collin 717aa3651c xz: Simplify the error-label in Capsicum sandbox code.
Also remove unneeded "sandbox_allowed = false;" as this code
will never be run more than once (making it work with multiple
input files isn't trivial).
2023-03-11 18:46:45 +02:00
Lasse Collin a0eecc235d xz: Make Capsicum sandbox more strict with stdin and stdout. 2023-03-08 23:22:15 +08:00
Jia Tan 916448d624 Revert: "Add warning if Capsicum sandbox system calls are unsupported."
The warning causes the exit status to be 2, so this will cause problems
for many scripted use cases for xz. The sandbox usage is already very
limited already, so silently disabling this allows it to be more usable.
2023-03-08 23:22:11 +08:00
Jia Tan 01587dda2a xz: Fix -Wunused-label in io_sandbox_enter().
Thanks to Xin Li for recommending the fix.
2023-03-07 20:02:22 +08:00
Jia Tan 5fb9367866 xz: Add warning if Capsicum sandbox system calls are unsupported.
The warning is only used when errno == ENOSYS. Otherwise, xz still
issues a fatal error.
2023-03-06 21:37:45 +08:00
Jia Tan 61ee82cb12 xz: Skip Capsicum sandbox system calls when they are unsupported.
If a system has the Capsicum header files but does not actually
implement the system calls, then this would render xz unusable. Instead,
we can check if errno == ENOSYS and not issue a fatal error.
2023-03-06 21:27:53 +08:00
Jia Tan f070722b57 xz: Reorder cap_enter() to beginning of capsicum sandbox code.
cap_enter() puts the process into the sandbox. If later calls to
cap_rights_limit() fail, then the process can still have some extra
protections.
2023-03-06 21:08:26 +08:00
Jia Tan f1ab1f6b33 liblzma: Clarify lzma_lzma_preset() documentation in lzma12.h.
lzma_lzma_preset() does not guarentee that the lzma_options_lzma are
usable in an encoder even if it returns false (success). If liblzma
is built with default configurations, then the options will always be
usable. However if the match finders hc3, hc4, or bt4 are disabled, then
the options may not be usable depending on the preset level requested.

The documentation was updated to reflect this complexity, since this
behavior was unclear before.
2023-03-01 21:42:31 +08:00
Jia Tan 3cf72c4bcb liblzma: Replace '\n' -> newline in filter.h documentation.
The '\n' renders as a newline when the comments are converted to html
by Doxygen.
2023-02-24 21:09:39 +08:00
Jia Tan 002006be62 liblzma: Shorten return description for two functions in filter.h.
Shorten the description for lzma_raw_encoder_memusage() and
lzma_raw_decoder_memusage().
2023-02-24 21:09:39 +08:00
Jia Tan 463d9359b8 liblzma: Reword a few lines in filter.h 2023-02-24 21:09:39 +08:00
Jia Tan 01441df92c liblzma: Improve documentation in filter.h.
All functions now explicitly specify parameter and return values.
The notes and code annotations were moved before the parameter and
return value descriptions for consistency.

Also, the description above lzma_filter_encoder_is_supported() about
not being able to list available filters was removed since
lzma_str_list_filters() will do this.
2023-02-24 21:09:39 +08:00
Lasse Collin 30e95bb44c liblzma: Avoid null pointer + 0 (undefined behavior in C).
In the C99 and C17 standards, section 6.5.6 paragraph 8 means that
adding 0 to a null pointer is undefined behavior. As of writing,
"clang -fsanitize=undefined" (Clang 15) diagnoses this. However,
I'm not aware of any compiler that would take advantage of this
when optimizing (Clang 15 included). It's good to avoid this anyway
since compilers might some day infer that pointer arithmetic implies
that the pointer is not NULL. That is, the following foo() would then
unconditionally return 0, even for foo(NULL, 0):

    void bar(char *a, char *b);

    int foo(char *a, size_t n)
    {
        bar(a, a + n);
        return a == NULL;
    }

In contrast to C, C++ explicitly allows null pointer + 0. So if
the above is compiled as C++ then there is no undefined behavior
in the foo(NULL, 0) call.

To me it seems that changing the C standard would be the sane
thing to do (just add one sentence) as it would ensure that a huge
amount of old code won't break in the future. Based on web searches
it seems that a large number of codebases (where null pointer + 0
occurs) are being fixed instead to be future-proof in case compilers
will some day optimize based on it (like making the above foo(NULL, 0)
return 0) which in the worst case will cause security bugs.

Some projects don't plan to change it. For example, gnulib and thus
many GNU tools currently require that null pointer + 0 is defined:

    https://lists.gnu.org/archive/html/bug-gnulib/2021-11/msg00000.html

    https://www.gnu.org/software/gnulib/manual/html_node/Other-portability-assumptions.html

In XZ Utils null pointer + 0 issue should be fixed after this
commit. This adds a few if-statements and thus branches to avoid
null pointer + 0. These check for size > 0 instead of ptr != NULL
because this way bugs where size > 0 && ptr == NULL will likely
get caught quickly. None of them are in hot spots so it shouldn't
matter for performance.

A little less readable version would be replacing

    ptr + offset

with

    offset != 0 ? ptr + offset : ptr

or creating a macro for it:

    #define my_ptr_add(ptr, offset) \
            ((offset) != 0 ? ((ptr) + (offset)) : (ptr))

Checking for offset != 0 instead of ptr != NULL allows GCC >= 8.1,
Clang >= 7, and Clang-based ICX to optimize it to the very same code
as ptr + offset. That is, it won't create a branch. So for hot code
this could be a good solution to avoid null pointer + 0. Unfortunately
other compilers like ICC 2021 or MSVC 19.33 (VS2022) will create a
branch from my_ptr_add().

Thanks to Marcin Kowalczyk for reporting the problem:
https://github.com/tukaani-project/xz/issues/36
2023-02-23 20:41:22 +02:00
Jia Tan fa9065fac5 liblzma: Adjust container.h for consistency with filter.h. 2023-02-23 20:27:59 +08:00
Jia Tan 00a721b63d liblzma: Fix small typos and reword a few things in filter.h. 2023-02-23 20:27:59 +08:00
Jia Tan 5b1c171d4f liblzma: Convert list of flags in lzma_mt to bulleted list. 2023-02-23 20:27:59 +08:00
Jia Tan dbd47622eb liblzma: Fix typo in documentation in container.h
lzma_microlzma_decoder -> lzma_microlzma_encoder
2023-02-23 20:27:59 +08:00
Jia Tan 14cd30806d liblzma: Improve documentation for container.h
Standardizing each function to always specify parameters and return
values. Also moved the parameters and return values to the end of each
function description.
2023-02-23 20:27:59 +08:00
Lasse Collin d831072cce liblzma: Very minor API doc tweaks.
Use "member" to refer to struct members as that's the term used
by the C standard.

Use lzma_options_delta.dist and such in docs so that in Doxygen's
HTML output they will link to the doc of the struct member.

Clean up a few trailing white spaces too.
2023-02-16 21:09:00 +02:00
Jia Tan f029daea39 liblzma: Adjust spacing in doc headers in bcj.h. 2023-02-17 00:54:33 +08:00
Jia Tan a5de68bac2 liblzma: Adjust documentation in bcj.h for consistent style. 2023-02-17 00:49:47 +08:00
Jia Tan efa498c13b liblzma: Rename field => member in documentation.
Also adjusted preset value => preset level.
2023-02-17 00:49:47 +08:00
Lasse Collin 718b22a6c5 liblzma: Silence a warning from MSVC.
It gives C4146 here since unary minus with unsigned integer
is still unsigned (which is the intention here). Doing it
with substraction makes it clearer and avoids the warning.

Thanks to Nathan Moinvaziri for reporting this.
2023-02-16 17:59:50 +02:00
Jia Tan 87c53553fa liblzma: Improve documentation for stream_flags.h
Standardizing each function to always specify parameters and return
values. Also moved the parameters and return values to the end of each
function description.

A few small things were reworded and long sentences broken up.
2023-02-16 21:04:54 +08:00
Jia Tan 13d99e75a5 liblzma: Improve documentation in lzma12.h.
All functions now explicitly specify parameter and return values.
2023-02-15 22:21:44 +08:00
Jia Tan 43ec344c86 liblzma: Improve documentation in check.h.
All functions now explicitly specify parameter and return values.
Also moved the note about SHA-256 functions not being exported to the
top of the file.
2023-02-15 00:59:16 +08:00
Jia Tan 9c71db4e88 liblzma: Improve documentation in index.h
All functions now explicitly specify parameter and return values.
2023-02-15 00:20:44 +08:00
Jia Tan 421f2f2e16 liblzma: Reword a comment in index.h. 2023-02-15 00:20:44 +08:00
Jia Tan b675394849 liblzma: Omit lzma_index_iter's internal field from Doxygen docs.
Add \private above this field and its sub-fields since it is not meant
to be modified by users.
2023-02-15 00:20:44 +08:00
Jia Tan 0c9e4fc2ad liblzma: Fix documentation for LZMA_MEMLIMIT_ERROR.
LZMA_MEMLIMIT_ERROR was missing the "<" character needed to put
documentation after a member.
2023-02-14 20:41:05 +08:00
Jia Tan 816fec125a liblzma: Improve documentation for base.h.
Standardizing each function to always specify params and return values.
Also fixed a small grammar mistake.
2023-02-14 20:41:05 +08:00
Jia Tan 862dacef1a liblzma: Add one more missing [out] annotation in vli.h 2023-02-14 00:12:34 +08:00
Jia Tan 867b08ae42 liblzma: Minor improvements to vli.h.
Added [out] annotations to parameters that are pointers and can have
their value changed. Also added a clarification to lzma_vli_is_valid.
2023-02-14 00:08:33 +08:00
Jia Tan 90d0e628ff liblzma: Add comments for macros in delta.h.
Document LZMA_DELTA_DIST_MIN and LZMA_DELTA_DIST_MAX for completeness
and to avoid Doxygen warnings.
2023-02-10 21:38:25 +08:00
Jia Tan 9255fffdb1 liblzma: Improve documentation in index_hash.h.
All functions now explicitly specify parameter and return values.
Also reworded the description of lzma_index_hash_init() for readability.
2023-02-10 21:35:23 +08:00
Lasse Collin 1dbe12b90c xz: Improve the comment about start_time in mytime.c.
start_time is relative to an arbitary point in time, it's not
time of day, so using it for anything else than time differences
wouldn't make sense.
2023-02-07 19:07:45 +02:00
Jia Tan b8bce89be7 xz: Add a comment clarifying the use of start_time in mytime.c. 2023-02-04 20:11:51 +08:00
Jia Tan 912af91b10 liblzma: Improve documentation for version.h.
Specified parameter and return values for API functions and documented
a few more of the macros.
2023-02-04 20:11:36 +08:00
Jia Tan 2c78a83c6f liblzma: Fix bug in lzma_str_from_filters() not checking filters[] length.
The bug is only a problem in applications that do not properly terminate
the filters[] array with LZMA_VLI_UNKNOWN or have more than
LZMA_FILTERS_MAX filters. This bug does not affect xz.
2023-02-03 00:42:27 +08:00
Jia Tan 8dfc029e7a liblzma: Fix typos in comments in string_conversion.c. 2023-02-03 00:42:27 +08:00
Jia Tan 54ad83c1ae liblzma: Clarify block encoder and decoder documentation.
Added a few sentences to the description for lzma_block_encoder() and
lzma_block_decoder() to highlight that the Block Header must be coded
before calling these functions.
2023-02-03 00:22:53 +08:00
Jia Tan f680e771b3 Update lzma_block documentation for lzma_block_uncomp_encode(). 2023-02-03 00:22:53 +08:00
Jia Tan 504cf4af89 liblzma: Minor edits to lzma_block header_size documentation. 2023-02-03 00:22:53 +08:00
Jia Tan 115b720fb5 liblzma: Enumerate functions that read version in lzma_block. 2023-02-03 00:22:53 +08:00
Jia Tan 85ea0979ad liblzma: Clarify comment in block.h. 2023-02-03 00:22:53 +08:00
Jia Tan 1f7ab90d9c liblzma: Improve documentation for block.h.
Standardizing each function to always specify params and return values.
Output pointer parameters are also marked with doxygen style [out] to
make it clear. Any note sections were also moved above the parameter and
return sections for consistency.
2023-02-03 00:22:53 +08:00
Jia Tan c563a4bc55 liblzma: Clarify a comment about LZMA_STR_NO_VALIDATION.
The flag description for LZMA_STR_NO_VALIDATION was previously confusing
about the treatment for filters than cannot be used with .xz format
(lzma1) without using LZMA_STR_ALL_FILTERS. Now, it is clear that
LZMA_STR_NO_VALIDATION is not a super set of LZMA_STR_ALL_FILTERS.
2023-02-01 23:39:45 +08:00
Lasse Collin 610dde15a8 xz: Use clock_gettime() even if CLOCK_MONOTONIC isn't available.
mythread.h and thus liblzma already does it.
2023-01-27 20:02:49 +02:00
Lasse Collin ff592c616e xz: Add SIGTSTP handler for progress indicator time keeping.
This way, if xz is stopped the elapsed time and estimated time
remaining won't get confused by the amount of time spent in
the stopped state.

This raises SIGSTOP. It's not clear to me if this is the correct way.
POSIX and glibc docs say that SIGTSTP shouldn't stop the process if
it is orphaned but this commit doesn't attempt to handle that.

Search for SIGTSTP in section 2.4.3:

https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html
2023-01-27 19:37:47 +02:00
Lasse Collin af5a4bd5af tuklib_physmem: Check for __has_warning before GCC version.
Clang can be configured to fake a too high GCC version so
this way it's more robust.
2023-01-26 17:39:46 +02:00
Jia Tan f35d98e206 liblzma: Fix documentation in filter.h for lzma_str_to_filters()
The previous documentation for lzma_str_to_filters() was technically
correct, but misleading. lzma_str_to_filters() returns NULL on success,
which is in practice always defined to 0. This is the same value as
LZMA_OK, but lzma_str_to_filters() does not return lzma_ret so we should
be more clear.
2023-01-24 20:48:50 +08:00
Lasse Collin 2f78ecc593 Revert "tuklib_common: Define __has_warning if it is not defined."
This reverts commit 82e3c968bf.

Macros in the reserved namespace (_foo or __foo) shouldn't be #defined
without a very good reason. Here the alternative would have been
to #define tuklib_has_warning(str) to an approriate value.

Also the tuklib_* files should stay namespace clean if possible.
2023-01-24 20:20:51 +08:00
Lasse Collin 8366cf8738 tuklib_physmem: Clean up the way -Wcast-function-type is silenced on Windows.
__has_warning and other __has_foo macros are meant to become
compiler-agnostic so it's not good to check for __clang__ with it.

This also relied on tuklib_common.h for #defining __has_warning
which was confusing as #defining reserved macros is generally
not a good idea.
2023-01-24 20:20:40 +08:00
Lasse Collin 683a3c7e2f xz: Flip the return value of suffix_is_set to match the documentation.
Also edit style to match the existing coding style in the project.
2023-01-24 20:20:04 +08:00
Jia Tan cc5aa9ab13 xz: Refactor duplicated check for custom suffix when using --format=raw 2023-01-21 22:10:51 +08:00
Jia Tan 9663141274 liblzma: Set documentation on all reserved fields to private.
This prevents the reserved fields from being part of the generated
Doxygen documentation.
2023-01-21 21:37:48 +08:00
Jia Tan 6fcf4671b6 liblzma: Highlight liblzma API headers should not be included directly.
This improves the generated Doxygen HTML files to better highlight
how to properly use the liblzma API header files.
2023-01-20 00:51:12 +08:00
Jia Tan b43ff180fb tuklib_physmem: Silence warning from -Wcast-function-type on MinGW-w64.
tuklib_physmem depends on GetProcAddress() for both MSVC and MinGW-w64
to retrieve a function address. The proper way to do this is to cast the
return value to the type of function pointer retrieved. Unfortunately,
this causes a cast-function-type warning, so the best solution is to
simply ignore the warning.
2023-01-19 20:35:09 +08:00
Jia Tan 82e3c968bf tuklib_common: Define __has_warning if it is not defined.
clang supports the __has_warning macro to determine if the version of
clang compiling the code supports a given warning. If we do not define
it for other compilers, it may cause a preprocessor error.
2023-01-19 20:32:40 +08:00
Jia Tan d3e1147705 xz: Add missing comment for coder_set_compression_settings() 2023-01-16 21:35:45 +08:00
Jia Tan 123255b6ed xz: Do not set compression settings with raw format in list mode.
Calling coder_set_compression_settings() in list mode with verbose mode
on caused the filter chain and memory requirements to print. This was
unnecessary since the command results in an error and not consistent
with other formats like lzma and alone.
2023-01-16 20:55:10 +08:00
Lasse Collin 52dc033d0b xz: Use ssize_t for the to-be-ignored return value from write(fd, ptr, 1).
It makes no difference here as the return value fits into an int
too and it then gets ignored but this looks better.
2023-01-12 06:05:58 +02:00
Lasse Collin b1a6d180a3 xz: Silence warnings from -Wsign-conversion in a 32-bit build. 2023-01-12 06:01:12 +02:00
Lasse Collin 31c21c734b liblzma: Silence another warning from -Wsign-conversion in a 32-bit build.
It doesn't warn on a 64-bit system because truncating
a ptrdiff_t (signed long) to uint32_t is diagnosed under
-Wconversion by GCC and -Wshorten-64-to-32 by Clang.
2023-01-12 05:38:48 +02:00
Lasse Collin 37fbdfb726 liblzma: Silence a warning from -Wsign-conversion in a 32-bit build. 2023-01-12 04:46:45 +02:00
Lasse Collin 3f13bf6b9e liblzma: Silence warnings from clang -Wconditional-uninitialized.
This is similar to 2ce4f36f17.
The actual initialization of the variables is done inside
mythread_sync() macro. Clang doesn't seem to see that
the initialization code inside the macro is always executed.
2023-01-12 03:19:59 +02:00
Lasse Collin 6c886cc5b3 Fix warnings from clang -Wdocumentation. 2023-01-12 03:11:40 +02:00
Jia Tan d1561c47ec
xz: Fix warning -Wformat-nonliteral on clang in message.c.
clang and gcc differ in how they handle -Wformat-nonliteral. gcc will
allow a non-literal format string as long as the function takes its
format arguments as a va_list.
2023-01-11 22:46:48 +08:00
Lasse Collin 0b8fa310cf liblzma: CLMUL CRC64: Work around a bug in MSVC, second attempt.
This affects only 32-bit x86 builds. x86-64 is OK as is.

I still cannot easily test this myself. The reporter has tested
this and it passes the tests included in the CMake build and
performance is good: raw CRC64 is 2-3 times faster than the
C version of the slice-by-four method. (Note that liblzma doesn't
include a MSVC-compatible version of the 32-bit x86 assembly code
for the slice-by-four method.)

Thanks to Iouri Kharon for figuring out a fix, testing, and
benchmarking.
2023-01-10 22:15:55 +02:00
Lasse Collin cfabb62a48 Revert "liblzma: CLMUL CRC64: Workaround a bug in MSVC (VS2015-2022)."
This reverts commit 36edc65ab4.

It was reported that it wasn't a good enough fix and MSVC
still produced (different kind of) bad code when building
for 32-bit x86 if optimizations are enabled.

Thanks to Iouri Kharon.
2023-01-10 12:47:16 +02:00
Lasse Collin 0b64215170 sysdefs.h: Don't include strings.h anymore.
On some platforms src/xz/suffix.c may need <strings.h> for
strcasecmp() but suffix.c includes the header when it needs it.

Unless there is an old system that otherwise supports enough C99
to build XZ Utils but doesn't have C89/C90-compatible <string.h>,
there should be no need to include <strings.h> in sysdefs.h.
2023-01-10 11:56:11 +02:00
Lasse Collin ec2fc39fe4 xz: Include <strings.h> in suffix.c if needed for strcasecmp().
SUSv2 and POSIX.1‐2017 declare only a few functions in <strings.h>.
Of these, strcasecmp() is used on some platforms in suffix.c.
Nothing else in the project needs <strings.h> (at least if
building on a modern system).

sysdefs.h currently includes <strings.h> if HAVE_STRINGS_H is
defined and suffix.c relied on this.

Note that dos/config.h doesn't #define HAVE_STRINGS_H even though
DJGPP does have strings.h. It isn't needed with DJGPP as strcasecmp()
is also in <string.h> in DJGPP.
2023-01-10 11:23:41 +02:00
Lasse Collin 7049c4a76c sysdefs.h: Fix a comment. 2023-01-10 10:05:13 +02:00
Lasse Collin 194a5fab69 sysdefs.h: Don't include memory.h anymore even if it were available.
It quite probably was never needed, that is, any system where memory.h
was required likely couldn't compile XZ Utils for other reasons anyway.

XZ Utils 5.2.6 and later source packages were generated using
Autoconf 2.71 which no longer defines HAVE_MEMORY_H. So the code
being removed is no longer used anyway.
2023-01-10 10:04:06 +02:00
Lasse Collin 36edc65ab4 liblzma: CLMUL CRC64: Workaround a bug in MSVC (VS2015-2022).
I haven't tested with MSVC myself and there doesn't seem to be
information about the problem online, so I'm relying on the bug report.

Thanks to Iouri Kharon for the bug report and the patch.
2023-01-09 12:22:05 +02:00
Jia Tan 6fd39664de
Merge pull request #7 from tukaani-project/tuktest_index_hash
Tuktest index hash
2023-01-07 00:10:50 +08:00
Jia Tan 78e0561dfe Style: Change #if !defined() to #ifndef in mythread.h. 2023-01-06 20:43:31 +08:00
Jia Tan 84f9687cba liblzma: Remove common.h include from common/index.h.
common/index.h is needed by liblzma internally and tests. common.h will
include and define many things that are not needed by the tests. Also,
this prevents include order problems because common.h will redefine
LZMA_API resulting in a warning.
2023-01-05 20:57:25 +08:00
Lasse Collin 4103a2e78a Bump version and soname for 5.5.0alpha.
5.5.0alpha won't be released, it's just to mark that
the branch is not for stable 5.4.x.

Once again there is no API/ABI stability for new features
in devel versions. The major soname won't be bumped even
if API/ABI of new features breaks between devel releases.
2023-01-02 17:20:47 +02:00
Jia Tan bb740e3b11
Build: Only define HAVE_PROGRAM_INVOCATION_NAME if it is set to 1.
HAVE_DECL_PROGRAM_INVOCATION_NAME is renamed to
HAVE_PROGRAM_INVOCATION_NAME. Previously,
HAVE_DECL_PROGRAM_INVOCATION_NAME was always set when
building with autotools. CMake would only set this when it was 1, and the
dos/config.h did not define it. The new macro definition is consistent
across build systems.
2023-01-02 22:33:48 +08:00
Jia Tan f16e12d5e7 liblzma: Add NULL check to lzma_index_hash_append.
This is for consistency with lzma_index_append.
2023-01-02 22:20:04 +08:00
Jia Tan 203b008eb2 liblzma: Replaced hardcoded 0x0 index indicator byte with macro 2023-01-02 22:20:04 +08:00
Jia Tan 2fcba17fc4 xz: Includes <time.h> and <sys/time.h> conditionally in mytime.c.
Previously, mytime.c depended on mythread.h for <time.h> to be included.
2022-12-30 23:34:31 +08:00
Jia Tan f82294c831 liblzma: Includes sys/time.h conditionally in mythread
Previously, <sys/time.h> was always included, even if mythread only used
clock_gettime. <time.h> is still needed even if clock_gettime is not used
though because struct timespec is needed for mythread_condtime.
2022-12-30 23:34:31 +08:00
Jia Tan 74dae7d300 Build: No longer require HAVE_DECL_CLOCK_MONOTONIC to always be set.
Previously, if threading was enabled HAVE_DECL_CLOCK_MONOTONIC would always
be set to 0 or 1. However, this macro was needed in xz so if xz was not
built with threading and HAVE_DECL_CLOCK_MONOTONIC was not defined but
HAVE_CLOCK_GETTIME was, it caused a warning during build. Now,
HAVE_DECL_CLOCK_MONOTONIC has been renamed to HAVE_CLOCK_MONOTONIC and
will only be set if it is 1.
2022-12-30 23:34:31 +08:00
Jia Tan 1275ebfba7 liblzma: Update documentation for lzma_filter_encoder. 2022-12-30 23:34:31 +08:00
Jia Tan 5f7ce42a16 liblzma: Fix lzma_microlzma_encoder() return value.
Using return_if_error on lzma_lzma_lclppb_encode was improper because
return_if_error is expecting an lzma_ret value, but
lzma_lzma_lclppb_encode returns a boolean. This could result in
lzma_microlzma_encoder, which would be misleading for applications.
2022-12-30 23:34:31 +08:00
Lasse Collin 8fd225a2c1 liblzma: Update authors list in arm64.c. 2022-12-16 18:30:02 +02:00
Lasse Collin b69da6d4bb Bump version to 5.4.0 and soname to 5.4.0. 2022-12-13 20:46:41 +02:00
Lasse Collin 854f2f5946 xz: Rename --experimental-arm64 to --arm64. 2022-12-11 21:13:57 +02:00
Lasse Collin 31dbd1e5fb liblzma: Change LZMA_FILTER_ARM64 to the official Filter ID 0x0A. 2022-12-11 21:13:06 +02:00
Lasse Collin 01b3549e52 xz: Make args_info.files_name a const pointer. 2022-12-08 19:24:22 +02:00
Lasse Collin bc665b84ea xz: Don't modify argv[].
The code that parses --memlimit options and --block-list modified
the argv[] when parsing the option string from optarg. This was
visible in "ps auxf" and such and could be confusing. I didn't
understand it back in the day when I wrote that code. Now a copy
is allocated when modifiable strings are needed.
2022-12-08 19:18:16 +02:00
Lasse Collin ac2a747e93 liblzma: Check for unexpected NULL pointers in block_header_decode().
The API docs gave an impression that such checks are done
but they actually weren't done. In practice it made little
difference since the calling code has a bug if these are NULL.

Thanks to Jia Tan for the original patch that checked for
block->filters == NULL.
2022-12-08 17:30:09 +02:00
Lasse Collin 24790f49ae Bump version number for 5.3.5beta.
This also sorts the symbol names alphabetically in liblzma_*.map.
2022-12-01 20:59:32 +02:00
Lasse Collin 62b270988e liblzma: Use __has_attribute(__symver__) to fix Clang detection.
If someone sets up Clang to define __GNUC__ to 10 or greater
then symvers broke. __has_attribute is supported by such GCC
and Clang versions that don't support __symver__ so this should
be much better and simpler way to detect if __symver__ is
actually supported.

Thanks to Tomasz Gajc for the bug report.
2022-12-01 20:55:21 +02:00
Lasse Collin f9ca7d4516 liblzma: Omit zero-skipping from ARM64 filter.
It has some complicated downsides and its usefulness is more limited
than I originally thought. So this change is bad for certain very
specific situations but a generic solution that works for other
filters (and is otherwise better too) is planned anyway. And this
way 7-Zip can use the same compatible filter for the .7z format.

This is still marked as experimental with a new temporary Filter ID.
2022-12-01 18:55:00 +02:00
Lasse Collin 5baec3f0a9 xz: Omit the special notes about ARM64 filter on the man page. 2022-12-01 18:13:27 +02:00
Lasse Collin 0c3627b518 liblzma: Don't be over-specific in lzma_str_to_filters API doc. 2022-12-01 18:12:03 +02:00
Lasse Collin 94adf057f2 liblzma: Silence unused variable warning when BCJ filters are disabled.
Thanks to Jia Tan for the original patch.
2022-12-01 17:54:23 +02:00
Jia Tan 7c16e312cb xz: Remove message_filters_to_str function prototype from message.h.
This was forgotten from 7484744af6.
2022-11-30 18:12:35 +02:00
Jia Tan 0a72b9ca2f liblzma: Improve documentation for string to filter functions. 2022-11-29 22:29:15 +02:00
Lasse Collin a6e21fcede liblzma: Two fixes to lzma_str_list_filters() API docs.
Thanks to Jia Tan.
2022-11-29 22:27:42 +02:00
Lasse Collin 7484744af6 xz: Use lzma_str_from_filters().
Two uses: Displaying encoder filter chain when compressing with -vv,
and displaying the decoder filter chain in --list -vv.
2022-11-28 22:05:32 +02:00
Lasse Collin cedeeca2ea liblzma: Add lzma_str_to_filters, _from_filters, and _list_filters.
lzma_str_to_filters() uses static error messages which makes
them not very precise. It tells the position in the string
where an error occurred though which helps quite a bit if
applications take advantage of it. Dynamic error messages can
be added later with a new flag if it seems important enough.
2022-11-28 21:54:24 +02:00
Lasse Collin 072ebf7b13 liblzma: Make lzma_validate_chain() available outside filter_common.c. 2022-11-28 21:02:19 +02:00
Lasse Collin 5f22bd2d37 liblzma: Remove lzma_lz_decoder_uncompressed() as it's now unused. 2022-11-28 10:51:03 +02:00
Lasse Collin cee8320646 liblzma: Use LZMA1EXT feature in lzma_microlzma_decoder().
Here too this avoids the slightly ugly method to set
the uncompressed size.

Also moved the setting of dict_size to the struct initializer.
2022-11-28 10:48:53 +02:00
Lasse Collin e310e8b6a4 liblzma: Use LZMA1EXT feature in lzma_alone_decoder().
This avoids the need to use the slightly ugly method to
set the uncompressed size.
2022-11-28 10:28:20 +02:00
Lasse Collin 33b8a24b66 liblzma: Add LZMA_FILTER_LZMA1EXT to support LZMA1 without end marker.
Some file formats need support for LZMA1 streams that don't use
the end of payload marker (EOPM) alias end of stream (EOS) marker.
So far liblzma API has supported decompressing such streams via
lzma_alone_decoder() when .lzma header specifies a known
uncompressed size. Encoding support hasn't been available in the API.

Instead of adding a new LZMA1-only API for this purpose, this commit
adds a new filter ID for use with raw encoder and decoder. The main
benefit of this approach is that then also filter chains are possible,
for example, if someone wants to implement support for .7z files that
use the x86 BCJ filter with LZMA1 (not BCJ2 as that isn't supported
in liblzma).
2022-11-27 23:16:21 +02:00
Lasse Collin 9a304bf1e4 liblzma: Avoid unneeded use of void pointer in LZMA decoder. 2022-11-27 18:43:07 +02:00
Lasse Collin 218394958c liblzma: Pass the Filter ID to LZ encoder and decoder.
This allows using two Filter IDs with the same
initialization function and data structures.
2022-11-27 18:20:33 +02:00
Lasse Collin 1663c7676b liblzma: Remove two FIXME comments. 2022-11-27 01:03:16 +02:00
Lasse Collin 11fe708db7 xz: Use lzma_filters_free(). 2022-11-26 22:25:30 +02:00
Lasse Collin e782af9110 liblzma: Use lzma_filters_free() in more places. 2022-11-26 22:21:13 +02:00
Lasse Collin 90caaded2d liblzma: Omit simple coder init functions if they are disabled. 2022-11-25 18:04:37 +02:00