mirror of https://git.tukaani.org/xz.git synced 2025-02-21 07:58:27 +00:00

902 Commits

Author SHA1 Message Date
Lasse Collin
769b5d0055 xz: Delete old commented-out code.
(cherry picked from commit 4ce300ce0884c6e552de2af9ae8050b47b01f0e7)
(cherry picked from commit b4b315a2060c0771b0d0ca83b9b31fcf1db40484)
2024-05-22 00:28:03 +03:00
Lasse Collin
d800c85838 xz: Tweak comments.
(cherry picked from commit 7312dfbb02197c7f990c7a3cefd027a9387d1473)
(cherry picked from commit 9c9a3e7b3f70b214cfdc2aada90d28a54466a5db)
2024-05-22 00:27:48 +03:00
Lasse Collin
7d487a4c2a xz: Fix message_init() description.
Also explicitly initialize progress_automatic to make it clear
that it can be read before message_init() sets it. Static variable
was initialized to false by default already so this is only for
clarity.

(cherry picked from commit c701a5909ad9882469fbab4fab5d2d5556d3ba78)
(cherry picked from commit 7d3418e496d00287d08a7f5af265379b757247a6)
2024-05-22 00:27:25 +03:00
Lasse Collin
fda91a5d77 liblzma: Fix compilation of price_tablegen.c.
It is built and run only manually so this didn't matter
unless one wanted to regenerate the price_table.c.

(cherry picked from commit 8e4ec794836bc1701d8c9bd5e347b8ce8cc5bbb4)
(cherry picked from commit 65b5ee071697e4fe4c2a31c14c1d68b727f1654c)
2024-05-07 19:57:27 +03:00
Lasse Collin
594b64742f liblzma: Sync the AUTHORS fix about SHA-256 to lzma.h.
(based on commit 23de53421ea258cde6a3c33a038b1e9d08f771d1)

(cherry picked from commit f200c338f8d40b1b961033a3403d6512d3f34730)
2024-05-07 19:57:27 +03:00
Lasse Collin
6aba0e2a5e Fix SHA-256 authors.
The initial commit 5d018dc03549c1ee4958364712fb0c94e1bf2741
in 2007 had a comment in sha256.c that the code is based on
Crypto++ Library 5.5.1. In 2009 the Authors list in sha256.c
and the AUTHORS file were updated with information that the
code had come from Crypto++ but via 7-Zip. I know I had viewed
7-Zip's SHA-256 code but back then the C code was identical
enough with Crypto++, so I don't know why I thought the author info
would need that extra step via 7-Zip for this single file.

Another error is that I had mixed sha.* and shacal2.* files
when checking for author info in Crypto++. The shacal2.* files
aren't related to liblzma's sha256.c and thus Kevin Springle's
code in Crypto++ isn't either.

(cherry picked from commit 76946dc4336c831fe2cc26696a035d807dd3cf13)
(cherry picked from commit 402fb45c743b736fa033b4b04881f6d1098581fd)
2024-05-07 19:57:27 +03:00
Lasse Collin
2c89f377cd xzless: Use ||- in LESSOPEN with "less" 451 and newer.
(cherry picked from commit 9860d418d296eb3c721e5384fb367c0499b579c8)
(cherry picked from commit e5ba545f16e989ac51c38556e727c8c81988c04e)
2024-05-07 19:57:27 +03:00
Lasse Collin
3af41a23c4 xzless: Use --show-preproc-errors with "less" 632 and newer.
This makes "less" show a warning if a decompression error occurred.

(cherry picked from commit fd0692b0525e6c26b496492be9e2c865cab734f8)
(cherry picked from commit 5e7a8c0869d9b4c32c34d70b48b0935721aa37fd)
2024-05-07 19:57:27 +03:00
Lasse Collin
74d36a57c5 liblzma: Set all values in lzma_lz_encoder to NULL after allocation.
This is unlikely to be a bug in an existing application since it relies
on calling lzma_filters_update() on an LZMA1 encoder in the first place.
For instance, it does not affect xz because lzma_filters_update() can
only be used when encoding to the .xz format.

(based on commit 8191720eac950a5db89c4d33d6beea6316a49b19)
2024-05-07 19:56:46 +03:00
Jia Tan
176ae9073c liblzma: Make parameter names in function definition match declaration.
lzma_raw_encoder() and lzma_raw_encoder_init() used "options" as the
parameter name instead of "filters" (used by the declaration). "filters"
is more clear since the parameter represents the list of filters passed
to the raw encoder, each of which contains filter options.

(cherry picked from commit 27ab54af848ec4adc9c17362f6c64a42a7003df5)
2024-05-07 17:57:51 +03:00
Jia Tan
94c8503486 liblzma: Improve lzma encoder init function consistency.
lzma_encoder_init() did not check for NULL options, but
lzma2_encoder_init() did. This is more of a code style improvement than
anything else to help make lzma_encoder_init() and lzma2_encoder_init()
more similar.

(cherry picked from commit 019afd72e02339a6bf00c32bfb56f649c637dd6b)
2024-05-07 17:57:51 +03:00
Jia Tan
6a7c0a9ab2 xz: Use is_tty() in message.c.
(cherry picked from commit 7dc466d62155cb7442aa5e10633e084ed384360d)
2024-05-07 17:57:51 +03:00
Jia Tan
9f00ad72f0 xz: Create separate is_tty() function.
The new is_tty() will report if a file descriptor is a terminal or not.
On POSIX systems, it is a wrapper around isatty(). However, the native
Windows implementation of isatty() will return true for all character
devices, not just terminals. So is_tty() has a special case for Windows
so it can use alternative Windows API functions to determine if a file
descriptor is a terminal.

This fixes a bug with MSVC and MinGW-w64 builds that refused to read from
or write to non-terminal character devices because xz thought it was a
terminal. For instance:

    xz foo -c > /dev/null

would fail because /dev/null was assumed to be a terminal.
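
A minimal sketch of the idea described above, not the actual xz code: on POSIX the helper wraps isatty(), and on native Windows it asks the console API instead. The name is_tty_sketch() is hypothetical, and the choice of GetConsoleMode() as the "alternative Windows API function" is an assumption.

```c
#include <stdbool.h>

#ifdef _WIN32
#    include <windows.h>
#    include <io.h>
#else
#    include <unistd.h>
#endif

// Hypothetical helper; the real is_tty() in xz may differ in details.
static bool
is_tty_sketch(int fd)
{
#ifdef _WIN32
    // _get_osfhandle() maps a CRT file descriptor to a Win32 HANDLE.
    // GetConsoleMode() succeeds only for console handles, so other
    // character devices (such as the null device) are rejected.
    const HANDLE h = (HANDLE)_get_osfhandle(fd);
    DWORD mode;
    return h != INVALID_HANDLE_VALUE && GetConsoleMode(h, &mode);
#else
    // On POSIX systems isatty() already gives the wanted answer.
    return isatty(fd) != 0;
#endif
}
```

A caller would use it the same way as isatty(), for example to refuse writing compressed data when is_tty_sketch(STDOUT_FILENO) is true.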

(cherry picked from commit 0ecfaa6df91f7c37510f370295f593b9c0b88b98)
2024-05-07 17:57:49 +03:00
Jia Tan
9c47c0ea18 liblzma: Add missing comments to lz_encoder.h.
(cherry picked from commit 84196e8c094402cb71b669fb9e984c56ebabb145)
2024-05-07 17:50:45 +03:00
Lasse Collin
290c954289 liblzma: Fix compilation of fastpos_tablegen.c.
The macro lzma_attr_visibility_hidden has to be defined to make
fastpos.h usable. The visibility attribute is irrelevant to
fastpos_tablegen.c so simply #define the macro to an empty value.

fastpos_tablegen.c is never built by the included build systems
and so the problem wasn't noticed earlier. It's just a standalone
program for generating fastpos_table.c.

Fixes: https://github.com/tukaani-project/xz/pull/69
Thanks to GitHub user Jamaika1.

(cherry picked from commit d90ed84db9770712e2421e170076b43bda9b64a7)
2024-05-07 17:50:45 +03:00
Lasse Collin
d7bff1341e liblzma: Use lzma_always_inline in memcmplen.h.
(cherry picked from commit 068ee436f4a8a706125ef43e8228b30001b1554e)
2024-05-07 17:50:45 +03:00
Lasse Collin
ce8d257cbb liblzma: #define lzma_always_inline in common.h.
(cherry picked from commit 6cdf0a7b7974baf58c1fd20ec3278f3b84ae56e5)
2024-05-07 17:50:45 +03:00
Lasse Collin
47b3d2761e liblzma: Use lzma_attr_visibility_hidden on private extern declarations.
These variables are internal to liblzma and not exposed in the API.

(cherry picked from commit 33daad3961a4f07f3902b40f13e823e6e43e85da)
2024-05-07 17:50:45 +03:00
Lasse Collin
44c98e9399 liblzma: #define lzma_attr_visibility_hidden in common.h.
In ELF shared libs:

-fvisibility=hidden affects definitions of symbols but not
declarations.[*] This doesn't affect direct calls to functions
inside liblzma as a linker can replace a call to lzma_foo@plt
with a call directly to lzma_foo when -fvisibility=hidden is used.

[*] It has to be like this because otherwise every installed
    header file would need to explicitly set the symbol visibility
    to default.

When accessing extern variables that aren't defined in the
same translation unit, the compiler assumes that the variable has
the default visibility and thus indirection is needed. Unlike
function calls, the linker cannot optimize this.

Using __attribute__((__visibility__("hidden"))) with the extern
variable declarations tells the compiler that indirection isn't
needed because the definition is in the same shared library.
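
As a rough illustration of the paragraph above, the macro could be defined and applied as follows. This is a sketch under assumptions: the macro name, the feature test and the table are hypothetical stand-ins, and the real conditions in common.h may differ. In a real library the declaration would live in a header and the definition in a separate .c file.

```c
#include <stdint.h>

// Hypothetical macro; liblzma's real one is lzma_attr_visibility_hidden
// and its exact preprocessor conditions are not shown here.
#if defined(__GNUC__) && defined(__ELF__)
#    define sketch_attr_visibility_hidden \
        __attribute__((__visibility__("hidden")))
#else
#    define sketch_attr_visibility_hidden
#endif

// Declaration, as other translation units would see it. The hidden
// attribute tells the compiler that the definition is in the same
// shared library, so no indirection is needed to access it.
sketch_attr_visibility_hidden
extern const uint32_t sketch_table[4];

// Definition, normally in its own translation unit.
const uint32_t sketch_table[4] = { 1, 2, 4, 8 };

// Example access to the table.
static uint32_t
sketch_lookup(unsigned i)
{
    return sketch_table[i & 3];
}
```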

About 15+ years ago, someone told me that it would be good if
the CRC tables would be defined in the same translation unit
as the C code of the CRC functions. While I understood that it
could help a tiny amount, I didn't want to change the code because
a separate translation unit for the CRC tables was needed for the
x86 assembly code anyway. But when visibility attributes are
supported, simply marking the extern declaration with the
hidden attribute gives an identical result. When there are only
a few affected variables, this is trivial to do. I wish I had
understood this back then already.

(cherry picked from commit 6961a5ac7df178bfc2b7a181c40575847bc3035f)
2024-05-07 17:50:45 +03:00
Lasse Collin
7834108dfe liblzma: Refer to MinGW-w64 instead of MinGW in the API headers.
MinGW (formerly a MinGW.org Project, later the MinGW.OSDN Project
at <https://osdn.net/projects/mingw/>) has GCC 9.2.0 as the
most recent GCC package (released 2021-02-02). The project might
still be alive but the majority of people have switched to MinGW-w64.
Thus it seems clearer to refer to MinGW-w64 in our API headers too.
Building with MinGW is likely to still work but I haven't tested it
in recent years.

(cherry picked from commit 5b9e16764905d06fa8e8339ba185ddfee304e5fb)
2024-05-07 17:47:12 +03:00
Lasse Collin
3c026350e8 liblzma: Add Cflags.private to liblzma.pc.in for MSYS2.
It properly adds -DLZMA_API_STATIC when compiling code that
will be linked against static liblzma. Having it there on
systems other than Windows does no harm.

See: https://www.msys2.org/docs/pkgconfig/
(cherry picked from commit 4083c8e9501a48934a5fb563d2c3ce2ae143cd27)
2024-05-07 16:26:35 +03:00
Lasse Collin
cf003b3ac2 sysdefs.h: Update the comment about __USE_MINGW_ANSI_STDIO.
(cherry picked from commit 4ae13cfe0dedb8ddc3cf9ded8cd1ac09361b3bd1)
2024-05-07 16:03:09 +03:00
Lasse Collin
1456a9d943 xz: Windows: Don't (de)compress to special files like "con" or "nul".
Before this commit, the following writes "foo" to the
console and deletes the input file:

    echo foo | xz > con_xz
    xz --suffix=_xz --decompress con_xz

It cannot happen without --suffix because names like con.xz
are also special and so attempting to decompress con.xz
(or compress con to con.xz) will already fail when opening
the input file.

A similar thing is possible when compressing. The following
writes to "nul" and the input file "n" is deleted.

    echo foo | xz > n
    xz --suffix=ul n

Now xz checks if the destination is a special file before
continuing. DOS/DJGPP version had a check for this but
Windows (and OS/2) didn't.

(cherry picked from commit 660c8c29e57d30dbd5009ef1f0ec1bbe195ccef6)
2024-05-07 16:03:09 +03:00
Lasse Collin
a3de1b841e liblzma: Move a few __attribute__ uses in function declarations.
The API headers have many attributes but these were left
as is for now.

(cherry picked from commit e3478ae4f36cd06522a2fef023860893f068434d)
2024-05-07 15:58:21 +03:00
Lasse Collin
737318447a xz, xzdec, lzmainfo: Use tuklib_attr_noreturn.
For compatibility with C23's [[noreturn]], tuklib_attr_noreturn
must be at the beginning of declaration (before "extern" or
"static", and even before any GNU C's __attribute__).

This commit also moves all other function attributes to
the beginning of function declarations. "extern" is kept
at the beginning of a line so the attributes are listed on
separate lines before "extern" or "static".
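
The ordering rule above can be sketched like this. The macro names and the version checks are assumptions made for illustration. They are not the actual tuklib definitions.

```c
#include <stdio.h>
#include <stdlib.h>

// Hypothetical stand-in for tuklib_attr_noreturn.
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
#    define sketch_attr_noreturn [[noreturn]]
#elif defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
#    define sketch_attr_noreturn _Noreturn
#elif defined(__GNUC__)
#    define sketch_attr_noreturn __attribute__((__noreturn__))
#else
#    define sketch_attr_noreturn
#endif

// Some other GNU C attribute, used only to show the ordering.
#ifdef __GNUC__
#    define sketch_attr_cold __attribute__((__cold__))
#else
#    define sketch_attr_cold
#endif

// The noreturn marker comes first, then other attributes on their own
// lines, and only then "static" (or "extern") at the start of a line.
sketch_attr_noreturn
sketch_attr_cold
static void
sketch_fatal(const char *msg)
{
    fprintf(stderr, "fatal: %s\n", msg);
    exit(EXIT_FAILURE);
}

// Because sketch_fatal() never returns, no return statement is needed
// after the call on the error path.
static int
sketch_checked_div(int a, int b)
{
    if (b == 0)
        sketch_fatal("division by zero");

    return a / b;
}
```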

(cherry picked from commit b71b8922ef3971e5ccffd1e213888d44abe21d11)
2024-05-07 15:58:20 +03:00
Lasse Collin
015e62b18d Remove incorrect uses of __attribute__((__malloc__)).
xrealloc() is obviously incorrect, modern GCC docs even
mention realloc() as an example where this attribute
cannot be used.

liblzma's lzma_alloc() and lzma_alloc_zero() would be
correct uses most of the time but custom allocators
may use a memory pool or otherwise hold the pointer
so aliasing issues could happen in theory.

The xstrdup() case likely was correct but I removed it anyway.
Now there are no __malloc__ attributes left in the code.
The allocations aren't in hot paths so this should make
no practical difference.

(cherry picked from commit 359e5c6cb128dab64ea6070d21d1c240f96cea6b)
2024-05-07 15:44:54 +03:00
Lasse Collin
df8daea282 xz: Fix a too relaxed assertion and remove uses of SSIZE_MAX.
SSIZE_MAX isn't readily available on MSVC. Removing it means
that there is one less thing to worry about when porting to MSVC.

(cherry picked from commit ef71f83973a20cc28a3221f85681922026ea33f5)
2024-05-07 15:32:03 +03:00
Jia Tan
519896fc94 liblzma: Update assert in vli_ceil4().
The argument to vli_ceil4() should always guarantee the return value
is also a valid lzma_vli. Thus the highest three valid lzma_vli values
are invalid arguments. All uses of the function ensure this so the
assert is updated to match this.
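
To make the "highest three values" remark concrete, here is a small sketch with hypothetical names. It assumes the largest valid value is UINT64_MAX / 2, as the log states elsewhere for LZMA_VLI_MAX, and that the function rounds up to a multiple of four.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical stand-in for LZMA_VLI_MAX.
#define SKETCH_VLI_MAX (UINT64_MAX / 2)

// Largest argument whose rounded-up result is still a valid VLI.
// SKETCH_VLI_MAX ends in binary ...11, so this is SKETCH_VLI_MAX - 3.
#define SKETCH_CEIL4_ARG_MAX (SKETCH_VLI_MAX & ~UINT64_C(3))

// Round up to the next multiple of four.
static uint64_t
sketch_vli_ceil4(uint64_t vli)
{
    // The three values above SKETCH_CEIL4_ARG_MAX would round up to
    // SKETCH_VLI_MAX + 1, which is not a valid VLI.
    assert(vli <= SKETCH_CEIL4_ARG_MAX);
    return (vli + 3) & ~UINT64_C(3);
}
```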

(cherry picked from commit 773f1e8622cb1465df528cb16a749517650acd93)
2024-05-07 15:31:30 +03:00
Jia Tan
591ac56d42 liblzma: Add overflow check for Unpadded size in lzma_index_append().
This was not a security bug since there was no path to overflow
UINT64_MAX in lzma_index_append() or when it calls index_file_size().
The bug was discovered by a failing assert() in vli_ceil4() when called
from index_file_size() when unpadded_sum (the sum of the compressed size
of current Stream and the unpadded_size parameter) exceeds LZMA_VLI_MAX.

Previously, the unpadded_size parameter was checked to be not greater
than UNPADDED_SIZE_MAX, but no check was done once compressed_base was
added.

This could not have caused an integer overflow in index_file_size() when
called by lzma_index_append(). The calculation for file_size breaks down
into the sum of:

- Compressed base from all previous Streams
- 2 * LZMA_STREAM_HEADER_SIZE (size of the current Stream's header and
  footer)
- stream_padding (can be set by lzma_index_stream_padding())
- Compressed base from the current Stream
- Unpadded size (parameter to lzma_index_append())

The sum of everything except for Unpadded size must be less than
LZMA_VLI_MAX. This is guaranteed by overflow checks in the functions
that can set these values including lzma_index_stream_padding(),
lzma_index_append(), and lzma_index_cat(). The maximum value for
Unpadded size is enforced by lzma_index_append() to be less than or
equal to UNPADDED_SIZE_MAX. Thus, the sum cannot exceed UINT64_MAX since
LZMA_VLI_MAX is half of UINT64_MAX.
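
The two checks discussed above might look roughly like this. It is a simplified sketch with hypothetical names, not the real lzma_index_append(), which validates more than this. It assumes compressed_base is already known to be at most the VLI maximum.

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical stand-ins for LZMA_VLI_MAX and UNPADDED_SIZE_MAX.
#define SKETCH_VLI_MAX (UINT64_MAX / 2)
#define SKETCH_UNPADDED_SIZE_MAX (SKETCH_VLI_MAX & ~UINT64_C(3))

// Returns true if a Block with the given Unpadded size may be appended
// when the current Stream already holds compressed_base bytes.
static bool
sketch_can_append(uint64_t compressed_base, uint64_t unpadded_size)
{
    // Older check: the parameter alone has to be in range.
    if (unpadded_size > SKETCH_UNPADDED_SIZE_MAX)
        return false;

    // Added check: the sum has to be in range too. Both operands are
    // at most SKETCH_VLI_MAX (assumed for compressed_base), which is
    // half of UINT64_MAX, so the addition itself cannot wrap around.
    if (compressed_base + unpadded_size > SKETCH_UNPADDED_SIZE_MAX)
        return false;

    return true;
}
```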

Thanks to Joona Kannisto for reporting this.

(cherry picked from commit 68bda971bb8b666a009331455fcedb4e18d837a4)
2024-05-07 15:31:30 +03:00
Jamaika1
ec0d5c99c3 mythread.h: Fix typo error in Vista threads mythread_once().
The "once_" variable was accidentally referred to as just "once". This
prevented building with Vista threads when
HAVE_FUNC_ATTRIBUTE_CONSTRUCTOR was not defined.

(cherry picked from commit c0c0cd4a483a672b66a13761583bc4f84d86d501)
2024-05-07 15:30:38 +03:00
Jia Tan
9d4bf2d06f liblzma: Prevent an empty translation unit in Windows builds.
To work around Automake lacking Windows resource compiler support, an
empty source file is compiled to overwrite the resource files for static
library builds. Translation units without an external declaration are
not allowed by the C standard and result in a warning when used with
-Wempty-translation-unit (Clang) or -pedantic (GCC).

(cherry picked from commit 19899340cf74d98304f9f5b726c72e85c7017d72)
2024-05-07 15:28:35 +03:00
Lasse Collin
5a87d91321 liblzma: Tweak #if condition in memcmplen.h.
Maybe ICC always #defines _MSC_VER on Windows but now
it's very clear which code will get used.

(cherry picked from commit b406828a6dfd3caa4f77efe3ff3e3eea263eee62)
2024-05-07 15:28:35 +03:00
Lasse Collin
0c53f52657 liblzma: Omit unnecessary parenthesis in a preprocessor directive.
(cherry picked from commit ef4a07ad9434f81417395f6fe0bb331e027a703b)
2024-05-07 15:28:35 +03:00
Jia Tan
eede1df4af liblzma: Prevent warning for MSYS2 Windows build.
In lzma_memcmplen(), the <intrin.h> header file is only included if
_MSC_VER and _M_X64 are both defined but _BitScanForward64() was
previously used if _M_X64 was defined. GCC for MSYS2 defines _M_X64 but
not _MSC_VER so _BitScanForward64() was used without including
<intrin.h>.

Now, lzma_memcmplen() will use __builtin_ctzll() for MSYS2 GCC builds as
expected.
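
The fix amounts to using the same condition for including the header and for using the intrinsic. A sketch with hypothetical function names follows. The real lzma_memcmplen() has more cases than shown here.

```c
#include <stdint.h>

// Include the header under exactly the condition that guards the
// use of the MSVC intrinsic below.
#if defined(_MSC_VER) && defined(_M_X64)
#    include <intrin.h>
#endif

// Index of the lowest set bit. n must be nonzero.
static uint32_t
sketch_ctz64(uint64_t n)
{
#if defined(_MSC_VER) && defined(_M_X64)
    unsigned long i;
    _BitScanForward64(&i, n);
    return (uint32_t)i;
#elif defined(__GNUC__)
    // GCC for MSYS2 defines _M_X64 but not _MSC_VER, so it lands here.
    return (uint32_t)__builtin_ctzll(n);
#else
    // Portable fallback.
    uint32_t i = 0;
    while ((n & 1) == 0) {
        n >>= 1;
        ++i;
    }
    return i;
#endif
}

// Number of equal low-order bytes in a and b (8 if they are equal).
static uint32_t
sketch_equal_low_bytes(uint64_t a, uint64_t b)
{
    const uint64_t x = a ^ b;
    return x == 0 ? 8 : sketch_ctz64(x) >> 3;
}
```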

(cherry picked from commit 64ee0caaea06654b28afaee850fb187a11bf9cb2)
2024-05-07 15:28:35 +03:00
Jia Tan
5f9bf81044 liblzma: Prevent uninitialized warning in mt stream encoder.
This change only impacts the compiler warning since it was impossible
for the wait_abs struct in stream_encode_mt() to be used before it was
initialized since mythread_condtime_set() will always be called before
mythread_cond_timedwait().

Since the mythread.h code is different between the POSIX and
Windows versions, this warning was only present on Windows builds.

Thanks to Arthur S for reporting the warning and providing an initial
patch.

(cherry picked from commit 1155471651ad456c5f90aee6435931fae65682bf)
2024-05-07 15:23:51 +03:00
Jia Tan
774145adfd Bump version and soname for 5.2.12.
2023-05-04 21:55:11 +08:00
Lasse Collin
809a2fd698 tuklib_integer.h: Fix a recent copypaste error in Clang detection.
Wrong line was changed in 7062348bf35c1e4cbfee00ad9fffb4a21aa6eff7.
Also, this has >= instead of == since ints larger than 32 bits would
work too even if not relevant in practice.
2023-05-03 22:55:59 +03:00
Jia Tan
b02e74eb73 Windows: Include <intrin.h> when needed.
Legacy Windows did not need to #include <intrin.h> to use the MSVC
intrinsics. Newer versions likely just issue a warning, but the MSVC
documentation says to include the header file for the intrinsics we use.

GCC and Clang can "pretend" to be MSVC on Windows, so extra checks are
needed in tuklib_integer.h to only include <intrin.h> when it is
actually needed.
2023-05-03 22:33:10 +03:00
Jia Tan
8efb6ea63b tuklib_integer: Use __builtin_clz() with Clang.
Clang has support for __builtin_clz(), but previously Clang would
fall back to either the MSVC intrinsic or the regular C code. This was
discovered due to a bug where a new version of Clang required the
<intrin.h> header file in order to use the MSVC intrinsics.

Thanks to Anton Kochkov for notifying us about the bug.
2023-05-03 22:33:10 +03:00
Lasse Collin
3bd906f1f3 liblzma: Update project maintainers in lzma.h.
AUTHORS was updated earlier, lzma.h was simply forgotten.
2023-05-03 22:33:10 +03:00
Jia Tan
0ab5527c46 liblzma: Cleans up old commented out code.
2023-05-03 22:33:10 +03:00
Jia Tan
275e36013d Build: Removes redundant check for LZMA1 filter support.
2023-05-03 22:33:10 +03:00
Jia Tan
3e206e5c43 Bump version and soname for 5.2.11.
2023-03-18 22:25:59 +08:00
Lasse Collin
090ea9ddd3 Change a few HTTP URLs to HTTPS.
The xz man page timestamp was intentionally left unchanged.
2023-03-18 22:00:28 +08:00
Lasse Collin
09363bea46 liblzma: Avoid null pointer + 0 (undefined behavior in C).
In the C99 and C17 standards, section 6.5.6 paragraph 8 means that
adding 0 to a null pointer is undefined behavior. As of writing,
"clang -fsanitize=undefined" (Clang 15) diagnoses this. However,
I'm not aware of any compiler that would take advantage of this
when optimizing (Clang 15 included). It's good to avoid this anyway
since compilers might some day infer that pointer arithmetic implies
that the pointer is not NULL. That is, the following foo() would then
unconditionally return 0, even for foo(NULL, 0):

    void bar(char *a, char *b);

    int foo(char *a, size_t n)
    {
        bar(a, a + n);
        return a == NULL;
    }

In contrast to C, C++ explicitly allows null pointer + 0. So if
the above is compiled as C++ then there is no undefined behavior
in the foo(NULL, 0) call.

To me it seems that changing the C standard would be the sane
thing to do (just add one sentence) as it would ensure that a huge
amount of old code won't break in the future. Based on web searches
it seems that a large number of codebases (where null pointer + 0
occurs) are being fixed instead to be future-proof in case compilers
will some day optimize based on it (like making the above foo(NULL, 0)
return 0) which in the worst case will cause security bugs.

Some projects don't plan to change it. For example, gnulib and thus
many GNU tools currently require that null pointer + 0 is defined:

    https://lists.gnu.org/archive/html/bug-gnulib/2021-11/msg00000.html

    https://www.gnu.org/software/gnulib/manual/html_node/Other-portability-assumptions.html

In XZ Utils the null pointer + 0 issue should be fixed after this
commit. This adds a few if-statements and thus branches to avoid
null pointer + 0. These check for size > 0 instead of ptr != NULL
because this way bugs where size > 0 && ptr == NULL will likely
get caught quickly. None of them are in hot spots so it shouldn't
matter for performance.
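
The size > 0 style of guard described above could look like the following buffer-copy helper. This is a sketch with a hypothetical name and signature. It is not copied from liblzma.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Copy as much as fits from in[*in_pos .. in_size) to
// out[*out_pos .. out_size) and return the number of bytes copied.
static size_t
sketch_bufcpy(const uint8_t *in, size_t *in_pos, size_t in_size,
        uint8_t *out, size_t *out_pos, size_t out_size)
{
    const size_t in_avail = in_size - *in_pos;
    const size_t out_avail = out_size - *out_pos;
    const size_t copy_size = in_avail < out_avail ? in_avail : out_avail;

    // Guard on the size instead of the pointers. With copy_size == 0
    // no pointer arithmetic is done at all, so in == NULL or
    // out == NULL is fine. A NULL pointer together with a nonzero
    // size is a caller bug and will likely crash here, which makes
    // the bug easy to notice.
    if (copy_size > 0)
        memcpy(out + *out_pos, in + *in_pos, copy_size);

    *in_pos += copy_size;
    *out_pos += copy_size;

    return copy_size;
}
```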

A little less readable version would be replacing

    ptr + offset

with

    offset != 0 ? ptr + offset : ptr

or creating a macro for it:

    #define my_ptr_add(ptr, offset) \
            ((offset) != 0 ? ((ptr) + (offset)) : (ptr))

Checking for offset != 0 instead of ptr != NULL allows GCC >= 8.1,
Clang >= 7, and Clang-based ICX to optimize it to the very same code
as ptr + offset. That is, it won't create a branch. So for hot code
this could be a good solution to avoid null pointer + 0. Unfortunately
other compilers like ICC 2021 or MSVC 19.33 (VS2022) will create a
branch from my_ptr_add().

Thanks to Marcin Kowalczyk for reporting the problem:
https://github.com/tukaani-project/xz/issues/36
2023-03-11 21:47:47 +02:00
Jia Tan
050c6dbf96 liblzma: Fix documentation for LZMA_MEMLIMIT_ERROR.
LZMA_MEMLIMIT_ERROR was missing the "<" character needed to put
documentation after a member.
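
For readers unfamiliar with the Doxygen convention mentioned above, the difference can be sketched with a hypothetical enum. The names and values are illustrative only and are not the ones in the liblzma headers.

```c
// Hypothetical enum used only to show the two comment placements.
typedef enum {
    /** Without "<", the comment documents the member that follows. */
    SKETCH_OK = 0,

    SKETCH_MEMLIMIT_ERROR = 1
        /**<
         * \brief       Memory usage limit was reached
         *
         * The "<" right after the opening marker attaches this
         * comment to the member that precedes it. Without it,
         * Doxygen would try to attach the text to whatever
         * comes next.
         */
} sketch_ret;
```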
2023-03-11 21:45:26 +02:00
Jia Tan
8daaac8e10 tuklib_physmem: Silence warning from -Wcast-function-type on MinGW-w64.
tuklib_physmem depends on GetProcAddress() for both MSVC and MinGW-w64
to retrieve a function address. The proper way to do this is to cast the
return value to the type of function pointer retrieved. Unfortunately,
this causes a cast-function-type warning, so the best solution is to
simply ignore the warning.
2023-03-11 21:45:26 +02:00
Jia Tan
6c9a2c2e46 xz: Add missing comment for coder_set_compression_settings()
2023-03-11 21:45:26 +02:00
Jia Tan
ccbb991efa xz: Do not set compression settings with raw format in list mode.
Calling coder_set_compression_settings() in list mode with verbose mode
on caused the filter chain and memory requirements to print. This was
unnecessary since the command results in an error, and it was not
consistent with other formats like lzma and alone.
2023-03-11 21:45:26 +02:00
Lasse Collin
6df383be4a xz: Use ssize_t for the to-be-ignored return value from write(fd, ptr, 1).
It makes no difference here as the return value fits into an int
too and it then gets ignored but this looks better.
2023-03-11 21:45:26 +02:00