The initial commit 5d018dc035
in 2007 had a comment in sha256.c that the code is based on
Crypto++ Library 5.5.1. In 2009 the Authors list in sha256.c
and the AUTHORS file was updated with information that the
code had come from Crypto++ but via 7-Zip. I know I had viewed
7-Zip's SHA-256 code but back then the C code has been identical
enough with Crypto++, so I don't why I thought the author info
would need that extra step via 7-Zip for this single file.
Another error is that I had mixed sha.* and shacal2.* files
when checking for author info in Crypto++. The shacal2.* files
aren't related to liblzma's sha256.c and thus Kevin Springle's
code in Crypto++ isn't either.
(cherry picked from commit 76946dc433)
The first member of lzma_lz_encoder doesn't necessarily need to be set
to NULL since it will always be set before anything tries to use it.
However the function pointer members must be set to NULL since other
functions rely on this NULL value to determine if this behavior is
supported or not.
This fixes a somewhat serious bug, where the options_update() and
set_out_limit() function pointers are not set to NULL. This seems to
have been forgotten since these function pointers were added many years
after the original two (code() and end()).
The problem is that by not setting this to NULL we are relying on the
memory allocation to zero things out if lzma_filters_update() is called
on a LZMA1 encoder. The function pointer for set_out_limit() is less
serious because there is not an API function that could call this in an
incorrect way. set_out_limit() is only called by the MicroLZMA encoder,
which must use LZMA1 where set_out_limit() is always set. Its currently
not possible to call set_out_limit() on an LZMA2 encoder at this time.
So calling lzma_filters_update() on an LZMA1 encoder had undefined
behavior since its possible that memory could be manipulated so the
options_update member pointed to a different instruction sequence.
This is unlikely to be a bug in an existing application since it relies
on calling lzma_filters_update() on an LZMA1 encoder in the first place.
For instance, it does not affect xz because lzma_filters_update() can
only be used when encoding to the .xz format.
lzma_raw_encoder() and lzma_raw_encoder_init() used "options" as the
parameter name instead of "filters" (used by the declaration). "filters"
is more clear since the parameter represents the list of filters passed
to the raw encoder, each of which contains filter options.
lzma_encoder_init() did not check for NULL options, but
lzma2_encoder_init() did. This is more of a code style improvement than
anything else to help make lzma_encoder_init() and lzma2_encoder_init()
more similar.
The new is_tty() will report if a file descriptor is a terminal or not.
On POSIX systems, it is a wrapper around isatty(). However, the native
Windows implementation of isatty() will return true for all character
devices, not just terminals. So is_tty() has a special case for Windows
so it can use alternative Windows API functions to determine if a file
descriptor is a terminal.
This fixes a bug with MSVC and MinGW-w64 builds that refused to read from
or write to non-terminal character devices because xz thought it was a
terminal. For instance:
xz foo -c > /dev/null
would fail because /dev/null was assumed to be a terminal.
The suffix refactor done in 99575947a5
had a small regression where raw format compression to standard out
failed if a suffix was not set. In this case, setting the suffix did
not make sense since a file is not created.
Now, xz should only fail when a suffix is not provided when it is
actually needed.
For instance:
echo "foo" | xz --format=raw --lzma2 | wc -c
does not need a suffix check since it creates no files. But:
xz --format=raw --lzma2 --suffix=.bar foo
Needs the suffix to be set since it must create foo.bar.
The macro lzma_attr_visibility_hidden has to be defined to make
fastpos.h usable. The visibility attribute is irrelevant to
fastpos_tablegen.c so simply #define the macro to an empty value.
fastpos_tablegen.c is never built by the included build systems
and so the problem wasn't noticed earlier. It's just a standalone
program for generating fastpos_table.c.
Fixes: https://github.com/tukaani-project/xz/pull/69
Thanks to GitHub user Jamaika1.
In ELF shared libs:
-fvisibility=hidden affects definitions of symbols but not
declarations.[*] This doesn't affect direct calls to functions
inside liblzma as a linker can replace a call to lzma_foo@plt
with a call directly to lzma_foo when -fvisibility=hidden is used.
[*] It has to be like this because otherwise every installed
header file would need to explictly set the symbol visibility
to default.
When accessing extern variables that aren't defined in the
same translation unit, compiler assumes that the variable has
the default visibility and thus indirection is needed. Unlike
function calls, linker cannot optimize this.
Using __attribute__((__visibility__("hidden"))) with the extern
variable declarations tells the compiler that indirection isn't
needed because the definition is in the same shared library.
About 15+ years ago, someone told me that it would be good if
the CRC tables would be defined in the same translation unit
as the C code of the CRC functions. While I understood that it
could help a tiny amount, I didn't want to change the code because
a separate translation unit for the CRC tables was needed for the
x86 assembly code anyway. But when visibility attributes are
supported, simply marking the extern declaration with the
hidden attribute will get identical result. When there are only
a few affected variables, this is trivial to do. I wish I had
understood this back then already.
MinGW (formely a MinGW.org Project, later the MinGW.OSDN Project
at <https://osdn.net/projects/mingw/>) has GCC 9.2.0 as the
most recent GCC package (released 2021-02-02). The project might
still be alive but majority of people have switched to MinGW-w64.
Thus it seems clearer to refer to MinGW-w64 in our API headers too.
Building with MinGW is likely to still work but I haven't tested it
in the recent years.
It properly adds -DLZMA_API_STATIC when compiling code that
will be linked against static liblzma. Having it there on
systems other than Windows does no harm.
See: https://www.msys2.org/docs/pkgconfig/
In XZ Utils context this doesn't matter much because
unaligned reads and writes aren't used in hot code
when TUKLIB_FAST_UNALIGNED_ACCESS isn't #defined.
When the generic fast crc64 method is used, then we omit
lzma_crc64_table[][].
The C standards don't allow an empty translation unit which can be
avoided by declaring something, without exporting any symbols.
Before this commit, the following writes "foo" to the
console and deletes the input file:
echo foo | xz > con_xz
xz --suffix=_xz --decompress con_xz
It cannot happen without --suffix because names like con.xz
are also special and so attempting to decompress con.xz
(or compress con to con.xz) will already fail when opening
the input file.
Similar thing is possible when compressing. The following
writes to "nul" and the input file "n" is deleted.
echo foo | xz > n
xz --suffix=ul n
Now xz checks if the destination is a special file before
continuing. DOS/DJGPP version had a check for this but
Windows (and OS/2) didn't.
For compatibility with C23's [[noreturn]], tuklib_attr_noreturn
must be at the beginning of declaration (before "extern" or
"static", and even before any GNU C's __attribute__).
This commit also moves all other function attributes to
the beginning of function declarations. "extern" is kept
at the beginning of a line so the attributes are listed on
separate lines before "extern" or "static".
xrealloc() is obviously incorrect, modern GCC docs even
mention realloc() as an example where this attribute
cannot be used.
liblzma's lzma_alloc() and lzma_alloc_zero() would be
correct uses most of the time but custom allocators
may use a memory pool or otherwise hold the pointer
so aliasing issues could happen in theory.
The xstrdup() case likely was correct but I removed it anyway.
Now there are no __malloc__ attributes left in the code.
The allocations aren't in hot paths so this should make
no practical difference.
Now the two variations of the format strings are created with
a macro, and the whole detection code can be easily disabled
on platforms where thousand separator formatting is known to
not work (MSVC has no support, and on DJGPP 2.05 it can have
problems in some cases).
The argument to vli_ceil4() should always guarantee the return value
is also a valid lzma_vli. Thus the highest three valid lzma_vli values
are invalid arguments. All uses of the function ensure this so the
assert is updated to match this.
This was not a security bug since there was no path to overflow
UINT64_MAX in lzma_index_append() or when it calls index_file_size().
The bug was discovered by a failing assert() in vli_ceil4() when called
from index_file_size() when unpadded_sum (the sum of the compressed size
of current Stream and the unpadded_size parameter) exceeds LZMA_VLI_MAX.
Previously, the unpadded_size parameter was checked to be not greater
than UNPADDED_SIZE_MAX, but no check was done once compressed_base was
added.
This could not have caused an integer overflow in index_file_size() when
called by lzma_index_append(). The calculation for file_size breaks down
into the sum of:
- Compressed base from all previous Streams
- 2 * LZMA_STREAM_HEADER_SIZE (size of the current Streams header and
footer)
- stream_padding (can be set by lzma_index_stream_padding())
- Compressed base from the current Stream
- Unpadded size (parameter to lzma_index_append())
The sum of everything except for Unpadded size must be less than
LZMA_VLI_MAX. This is guarenteed by overflow checks in the functions
that can set these values including lzma_index_stream_padding(),
lzma_index_append(), and lzma_index_cat(). The maximum value for
Unpadded size is enforced by lzma_index_append() to be less than or
equal UNPADDED_SIZE_MAX. Thus, the sum cannot exceed UINT64_MAX since
LZMA_VLI_MAX is half of UINT64_MAX.
Thanks to Joona Kannisto for reporting this.
The "once_" variable was accidentally referred to as just "once". This
prevented building with Vista threads when
HAVE_FUNC_ATTRIBUTE_CONSTRUCTOR was not defined.
signal.h in WASI SDK doesn't currently provide sigprocmask()
or sigset_t. liblzma doesn't need them so this change makes
liblzma and xzdec build against WASI SDK. xz doesn't build yet
and the tests don't either as tuktest needs setjmp() which
isn't (yet?) implemented in WASI SDK.
Closes: https://github.com/tukaani-project/xz/pull/57
See also: https://github.com/tukaani-project/xz/pull/56
(The original commit was edited a little by Lasse Collin.)
To workaround Automake lacking Windows resource compiler support, an
empty source file is compiled to overwrite the resource files for static
library builds. Translation units without an external declaration are
not allowed by the C standard and result in a warning when used with
-Wempty-translation-unit (Clang) or -pedantic (GCC).
Reword "options required" to "options read". The previous wording
may have suggested that the options listed were all required when
the filters are used for encoding or decoding. Now it should be
more clear that the options listed are the ones relevant for
encoding or decoding.