Commit Graph

22 Commits

Author SHA1 Message Date
Lasse Collin 30e95bb44c liblzma: Avoid null pointer + 0 (undefined behavior in C).
In the C99 and C17 standards, section 6.5.6 paragraph 8 means that
adding 0 to a null pointer is undefined behavior. As of writing,
"clang -fsanitize=undefined" (Clang 15) diagnoses this. However,
I'm not aware of any compiler that would take advantage of this
when optimizing (Clang 15 included). It's good to avoid this anyway
since compilers might some day infer that pointer arithmetic implies
that the pointer is not NULL. That is, the following foo() would then
unconditionally return 0, even for foo(NULL, 0):

    void bar(char *a, char *b);

    int foo(char *a, size_t n)
    {
        bar(a, a + n);
        return a == NULL;
    }

In contrast to C, C++ explicitly allows null pointer + 0. So if
the above is compiled as C++ then there is no undefined behavior
in the foo(NULL, 0) call.

To me it seems that changing the C standard would be the sane
thing to do (just add one sentence) as it would ensure that a huge
amount of old code won't break in the future. Based on web searches
it seems that a large number of codebases (where null pointer + 0
occurs) are being fixed instead to be future-proof in case compilers
will some day optimize based on it (like making the above foo(NULL, 0)
return 0) which in the worst case will cause security bugs.

Some projects don't plan to change it. For example, gnulib and thus
many GNU tools currently require that null pointer + 0 is defined:

    https://lists.gnu.org/archive/html/bug-gnulib/2021-11/msg00000.html

    https://www.gnu.org/software/gnulib/manual/html_node/Other-portability-assumptions.html

In XZ Utils null pointer + 0 issue should be fixed after this
commit. This adds a few if-statements and thus branches to avoid
null pointer + 0. These check for size > 0 instead of ptr != NULL
because this way bugs where size > 0 && ptr == NULL will likely
get caught quickly. None of them are in hot spots so it shouldn't
matter for performance.

A little less readable version would be replacing

    ptr + offset

with

    offset != 0 ? ptr + offset : ptr

or creating a macro for it:

    #define my_ptr_add(ptr, offset) \
            ((offset) != 0 ? ((ptr) + (offset)) : (ptr))

Checking for offset != 0 instead of ptr != NULL allows GCC >= 8.1,
Clang >= 7, and Clang-based ICX to optimize it to the very same code
as ptr + offset. That is, it won't create a branch. So for hot code
this could be a good solution to avoid null pointer + 0. Unfortunately
other compilers like ICC 2021 or MSVC 19.33 (VS2022) will create a
branch from my_ptr_add().

Thanks to Marcin Kowalczyk for reporting the problem:
https://github.com/tukaani-project/xz/issues/36
2023-02-23 20:41:22 +02:00
Lasse Collin 1c9a5786d2 liblzma: Make Block decoder catch certain types of errors better.
Now it limits the input and output buffer sizes that are
passed to a raw decoder. This way there's no need to check
if the sizes can grow too big or overflow when updating
Compressed Size and Uncompressed Size counts. This also means
that a corrupt file cannot cause the raw decoder to process
useless extra input or output that would exceed the size info
in Block Header (and thus cause LZMA_DATA_ERROR anyway).

More importantly, now the size information is verified more
carefully in case raw decoder returns LZMA_OK. This doesn't
really matter with the current single-threaded .xz decoder
as the errors would be detected slightly later anyway. But
this helps avoiding corner cases in the upcoming threaded
decompressor, and it might help other Block decoder uses
outside liblzma too.

The test files bad-1-lzma2-{9,10,11}.xz test these conditions.
With the single-threaded .xz decoder the only difference is
that LZMA_DATA_ERROR is detected in a difference place now.
2022-02-20 20:36:27 +02:00
Lasse Collin d4a0462abe liblzma: Avoid multiple definitions of lzma_coder structures.
Only one definition was visible in a translation unit.
It avoided a few casts and temp variables but seems that
this hack doesn't work with link-time optimizations in compilers
as it's not C99/C11 compliant.

Fixes:
http://www.mail-archive.com/xz-devel@tukaani.org/msg00279.html
2016-11-21 20:24:50 +02:00
Lasse Collin 0e0f34b8e4 liblzma: Add support for lzma_block.ignore_check.
Note that this slightly changes how lzma_block_header_decode()
has been documented. Earlier it said that the .version is set
to the lowest required value, but now it says that the .version
field is kept unchanged if possible. In practice this doesn't
affect any old code, because before this commit the only
possible .version was 0.
2014-08-05 22:03:30 +03:00
Lasse Collin 3778db1be5 liblzma: Make the use of lzma_allocator const-correct.
There is a tiny risk of causing breakage: If an application
assigns lzma_stream.allocator to a non-const pointer, such
code won't compile anymore. I don't know why anyone would do
such a thing though, so in practice this shouldn't cause trouble.

Thanks to Jan Kratochvil for the patch.
2012-07-17 18:19:59 +03:00
Lasse Collin 083c23c680 Make the raw value of the Check field available to applications
via lzma_block structure.

This changes ABI but not doesn't break API.
2009-05-26 14:48:48 +03:00
Lasse Collin 21c6b94373 Fixed a crash in liblzma.
liblzma tries to avoid useless free()/malloc() pairs in
initialization when multiple files are handled using the
same lzma_stream. This didn't work with filter chains
due to comparison of wrong pointers in lzma_next_coder_init(),
making liblzma think that no memory reallocation is needed
even when it actually is.

Easy way to trigger this bug is to decompress two files with
a single xz command. The first file should have e.g. x86+LZMA2
as the filter chain, and the second file just LZMA2.
2009-04-28 23:08:32 +03:00
Lasse Collin 02ddf09bc3 Put the interesting parts of XZ Utils into the public domain.
Some minor documentation cleanups were made at the same time.
2009-04-13 11:27:40 +03:00
Lasse Collin 22a0c6dd94 Modify LZMA_API macro so that it works on Windows with
other compilers than MinGW. This may hurt readability
of the API headers slightly, but I don't know any
better way to do this.
2009-02-02 20:14:03 +02:00
Lasse Collin 2c5fe958e4 Fix a dumb bug in Block decoder, which made it return
LZMA_DATA_ERROR with valid data. The bug was added in
e114502b2b.
2009-01-25 01:35:56 +02:00
Lasse Collin e33194e79d Bunch of liblzma tweaks, including some API changes.
The API and ABI should now be very close to stable,
although the code behind it isn't yet.
2008-12-27 19:27:49 +02:00
Lasse Collin e114502b2b Oh well, big messy commit again. Some highlights:
- Updated to the latest, probably final file format version.
  - Command line tool reworked to not use threads anymore.
    Threading will probably go into liblzma anyway.
  - Memory usage limit is now about 30 % for uncompression
    and about 90 % for compression.
  - Progress indicator with --verbose
  - Simplified --help and full --long-help
  - Upgraded to the last LGPLv2.1+ getopt_long from gnulib.
  - Some bug fixes
2008-11-19 20:46:52 +02:00
Lasse Collin 13a74b78e3 Renamed constants:
- LZMA_VLI_VALUE_MAX -> LZMA_VLI_MAX
  - LZMA_VLI_VALUE_UNKNOWN -> LZMA_VLI_UNKNOWN
  - LZMA_HEADER_ERRRO -> LZMA_OPTIONS_ERROR
2008-09-13 12:10:43 +03:00
Lasse Collin 2ba01bfa75 Cleaned up Block encoder and moved the no longer shared
code from block_private.h to block_decoder.c. Now the Block
encoder doesn't need compressed_size and uncompressed_size
from lzma_block structure to be initialized.
2008-09-10 00:27:02 +03:00
Lasse Collin 2496aee8a7 Don't allow LZMA_SYNC_FLUSH with decoders anymore. There's
simply nothing that would use it. Allow LZMA_FINISH to the
decoders, which will usually ignore it (auto decoder and
Stream decoder being exceptions).
2008-09-04 10:39:15 +03:00
Lasse Collin 3b34851de1 Sort of garbage collection commit. :-| Many things are still
broken. API has changed a lot and it will still change a
little more here and there. The command line tool doesn't
have all the required changes to reflect the API changes, so
it's easy to get "internal error" or trigger assertions.
2008-08-28 22:53:15 +03:00
Lasse Collin 7d17818cec Update the code to mostly match the new simpler file format
specification. Simplify things by removing most of the
support for known uncompressed size in most places.
There are some miscellaneous changes here and there too.

The API of liblzma has got many changes and still some
more will be done soon. While most of the code has been
updated, some things are not fixed (the command line tool
will choke with invalid filter chain, if nothing else).

Subblock filter is somewhat broken for now. It will be
updated once the encoded format of the Subblock filter
has been decided.
2008-06-18 18:02:10 +03:00
Lasse Collin 4441e00418 Combine lzma_options_block validation needed by both Block
encoder and decoder, and put the shared things to
block_private.h. Improved the checks a little so that
they may detect too big Compressed Size at initialization
time if lzma_options_block.total_size or .total_limit is
known.

Allow encoding and decoding Blocks with combinations of
fields that are not allowed by the file format specification.
Doing this requires that the application passes such a
combination in lzma_options_lzma; liblzma doesn't do that,
but it's not impossible that someone could find them useful
in some custom file format.
2008-01-25 23:12:36 +02:00
Lasse Collin 379fbbe84d Take advantage of return_if_error() in block_decoder.c. 2008-01-08 23:11:59 +02:00
Lasse Collin 7054c5f588 Fix decoding of Blocks that have only Block Header. 2008-01-08 22:58:42 +02:00
Lasse Collin 07ac881779 Take advantage of return_if_error() macro in more places.
Cleaned Subblock filter's initialization code too.
2007-12-09 17:06:45 +02:00
Lasse Collin 5d018dc035 Imported to git. 2007-12-09 00:42:33 +02:00