Commit Graph

1710 Commits

Author SHA1 Message Date
Lasse Collin dfc9a54082 liblzma: Avoid null pointer + 0 (undefined behavior in C).
In the C99 and C17 standards, section 6.5.6 paragraph 8 means that
adding 0 to a null pointer is undefined behavior. As of writing,
"clang -fsanitize=undefined" (Clang 15) diagnoses this. However,
I'm not aware of any compiler that would take advantage of this
when optimizing (Clang 15 included). It's good to avoid this anyway
since compilers might some day infer that pointer arithmetic implies
that the pointer is not NULL. That is, the following foo() would then
unconditionally return 0, even for foo(NULL, 0):

    void bar(char *a, char *b);

    int foo(char *a, size_t n)
    {
        bar(a, a + n);
        return a == NULL;
    }

In contrast to C, C++ explicitly allows null pointer + 0. So if
the above is compiled as C++ then there is no undefined behavior
in the foo(NULL, 0) call.

To me it seems that changing the C standard would be the sane
thing to do (just add one sentence) as it would ensure that a huge
amount of old code won't break in the future. Based on web searches
it seems that a large number of codebases (where null pointer + 0
occurs) are being fixed instead to be future-proof in case compilers
will some day optimize based on it (like making the above foo(NULL, 0)
return 0) which in the worst case will cause security bugs.

Some projects don't plan to change it. For example, gnulib and thus
many GNU tools currently require that null pointer + 0 is defined:

    https://lists.gnu.org/archive/html/bug-gnulib/2021-11/msg00000.html

    https://www.gnu.org/software/gnulib/manual/html_node/Other-portability-assumptions.html

In XZ Utils the null pointer + 0 issue should be fixed after this
commit. This adds a few if-statements and thus branches to avoid
null pointer + 0. These check for size > 0 instead of ptr != NULL
because this way bugs where size > 0 && ptr == NULL will likely
get caught quickly. None of them are in hot spots so it shouldn't
matter for performance.

A little less readable version would be replacing

    ptr + offset

with

    offset != 0 ? ptr + offset : ptr

or creating a macro for it:

    #define my_ptr_add(ptr, offset) \
            ((offset) != 0 ? ((ptr) + (offset)) : (ptr))

Checking for offset != 0 instead of ptr != NULL allows GCC >= 8.1,
Clang >= 7, and Clang-based ICX to optimize it to the very same code
as ptr + offset. That is, it won't create a branch. So for hot code
this could be a good solution to avoid null pointer + 0. Unfortunately
other compilers like ICC 2021 or MSVC 19.33 (VS2022) will create a
branch from my_ptr_add().

Thanks to Marcin Kowalczyk for reporting the problem:
https://github.com/tukaani-project/xz/issues/36
2023-03-07 23:24:15 +08:00
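Note on dfc9a54082: the size-based guard described in that commit message can be sketched roughly as below. This is a minimal illustration and not code from liblzma. copy_to_output() and its parameters are hypothetical names. The macro is the my_ptr_add() example quoted in the commit message.

```c
#include <stddef.h>
#include <string.h>

// The alternative quoted in the commit message: skip the addition when
// the offset is zero so that a null ptr never takes part in arithmetic.
#define my_ptr_add(ptr, offset) \
        ((offset) != 0 ? ((ptr) + (offset)) : (ptr))

// Hypothetical helper (not from liblzma): appends in_size bytes from
// in[] to out[] starting at *out_pos.
void
copy_to_output(unsigned char *out, size_t *out_pos,
        const unsigned char *in, size_t in_size)
{
    // Guard on the size instead of on out != NULL. With in_size == 0
    // no pointer arithmetic is evaluated, so out == NULL is harmless.
    // With in_size > 0 and out == NULL the bug is not hidden by the
    // check and is likely to be noticed quickly.
    if (in_size > 0) {
        memcpy(out + *out_pos, in, in_size);
        *out_pos += in_size;
    }
}
```

Calling copy_to_output(NULL, &pos, NULL, 0) leaves pos unchanged and performs no pointer arithmetic. As a side effect, the guard also avoids passing a null pointer to memcpy().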
Jia Tan f6dce49cb6 liblzma: Adjust container.h for consistency with filter.h. 2023-03-07 23:24:09 +08:00
Jia Tan 173d240bb4 liblzma: Fix small typos and reword a few things in filter.h. 2023-03-07 23:24:05 +08:00
Jia Tan 17797bacde liblzma: Convert list of flags in lzma_mt to bulleted list. 2023-03-07 23:24:00 +08:00
Jia Tan 37da0e7271 liblzma: Fix typo in documentation in container.h
lzma_microlzma_decoder -> lzma_microlzma_encoder
2023-03-07 23:23:55 +08:00
Jia Tan b8331077c6 liblzma: Improve documentation for container.h
Standardizing each function to always specify parameters and return
values. Also moved the parameters and return values to the end of each
function description.
2023-03-07 23:23:51 +08:00
Jia Tan b9a3511bb6 CMake: Add LZIP decoder test to list of tests. 2023-03-07 23:23:41 +08:00
Lasse Collin cd82ef2fb4 Update THANKS. 2023-03-07 23:23:34 +08:00
Lasse Collin 076e911ba2 Build: Use only the generic symbol versioning on MicroBlaze.
On MicroBlaze, GCC 12 is broken in the sense that
__has_attribute(__symver__) returns true but it still doesn't
support the __symver__ attribute even though the platform is ELF
and symbol versioning is supported if using the traditional
__asm__(".symver ...") method. Avoiding the traditional method is
good because it breaks LTO (-flto) builds with GCC.

See also: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101766

For now the only extra symbols in liblzma_linux.map are the
compatibility symbols with the patch that spread from RHEL/CentOS 7.
These require the use of __symver__ attribute or __asm__(".symver ...")
in the C code. Compatibility with the patch from CentOS 7 doesn't
seem valuable on MicroBlaze so use liblzma_generic.map on MicroBlaze
instead. It doesn't require anything special in the C code and thus
no LTO issues either.

An alternative would be to detect support for __symver__
attribute in configure.ac and CMakeLists.txt and fall back
to __asm__(".symver ...") but then LTO would be silently broken
on MicroBlaze. It sounds likely that MicroBlaze is a special
case so let's treat it as such because that is simpler. If
a similar issue exists on some other platform too then hopefully
someone will report it and this can be reconsidered.

(This doesn't do the same fix in CMakeLists.txt. Perhaps it should
but perhaps CMake build of liblzma doesn't matter much on MicroBlaze.
The problem breaks the build so it's easy to notice and can be fixed
later.)

Thanks to Vincent Fazio for reporting the problem and proposing
a patch (in the end that solution wasn't used):
https://github.com/tukaani-project/xz/pull/32
2023-03-07 23:23:29 +08:00
Lasse Collin bc34e5ac99 liblzma: Very minor API doc tweaks.
Use "member" to refer to struct members as that's the term used
by the C standard.

Use lzma_options_delta.dist and such in docs so that in Doxygen's
HTML output they will link to the doc of the struct member.

Clean up a few trailing white spaces too.
2023-03-07 23:23:19 +08:00
Jia Tan d31fbd28be liblzma: Adjust spacing in doc headers in bcj.h. 2023-03-07 23:23:04 +08:00
Jia Tan 701e9be6be liblzma: Adjust documentation in bcj.h for consistent style. 2023-03-07 23:22:57 +08:00
Jia Tan 762c4d0b62 liblzma: Rename field => member in documentation.
Also adjusted preset value => preset level.
2023-03-07 23:22:46 +08:00
Lasse Collin 0ce1db0223 liblzma: Silence a warning from MSVC.
It gives C4146 here since unary minus with unsigned integer
is still unsigned (which is the intention here). Doing it
with subtraction makes it clearer and avoids the warning.

Thanks to Nathan Moinvaziri for reporting this.
2023-03-07 23:22:21 +08:00
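Note on 0ce1db0223: the general idea of replacing unary minus with subtraction can be sketched as below. negate_u32() is a hypothetical name, and the expression that was actually changed in liblzma is not reproduced here.

```c
#include <stdint.h>

// Hypothetical helper: two's complement negation of an unsigned value.
uint32_t
negate_u32(uint32_t x)
{
    // "return -x;" gives the same result because unsigned arithmetic
    // wraps modulo 2^32, but MSVC reports warning C4146 for unary
    // minus applied to an unsigned type. Subtracting from zero states
    // the intent explicitly.
    return 0U - x;
}
```

For example, negate_u32(1) yields UINT32_MAX and negate_u32(0) yields 0.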
Jia Tan d83da006b3 liblzma: Improve documentation for stream_flags.h
Standardizing each function to always specify parameters and return
values. Also moved the parameters and return values to the end of each
function description.

A few small things were reworded and long sentences broken up.
2023-03-07 23:21:47 +08:00
Jia Tan 2796bb4736 liblzma: Improve documentation in lzma12.h.
All functions now explicitly specify parameter and return values.
2023-02-15 22:48:21 +08:00
Jia Tan ebebaa8d93 liblzma: Improve documentation in check.h.
All functions now explicitly specify parameter and return values.
Also moved the note about SHA-256 functions not being exported to the
top of the file.
2023-02-15 22:48:07 +08:00
Jia Tan 765fa2865a liblzma: Improve documentation in index.h
All functions now explicitly specify parameter and return values.
2023-02-15 22:47:59 +08:00
Jia Tan 918e208af5 liblzma: Reword a comment in index.h. 2023-02-15 22:47:55 +08:00
Jia Tan 1f157d214b liblzma: Omit lzma_index_iter's internal field from Doxygen docs.
Add \private above this field and its sub-fields since it is not meant
to be modified by users.
2023-02-15 22:47:36 +08:00
Jia Tan 28757fa46d liblzma: Fix documentation for LZMA_MEMLIMIT_ERROR.
LZMA_MEMLIMIT_ERROR was missing the "<" character needed to put
documentation after a member.
2023-02-15 22:47:29 +08:00
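Note on 28757fa46d: in Doxygen, a comment that starts with /**< (or /*!<) documents the member that precedes it, while a comment without the "<" is attached to whatever follows. A minimal sketch with a hypothetical enum (the names are not from liblzma):

```c
// Hypothetical enum for illustration only.
typedef enum {
    EXAMPLE_OK = 0,
        /**< Documents EXAMPLE_OK because the "<" attaches this
         *   comment to the member before it. */
    EXAMPLE_LIMIT_ERROR = 1
        /*!< Same effect with the alternative comment marker. */
} example_ret;
```

Without the "<", the first comment would be taken as documentation for EXAMPLE_LIMIT_ERROR instead.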
Jia Tan 135d5a1a65 liblzma: Improve documentation for base.h.
Standardizing each function to always specify params and return values.
Also fixed a small grammar mistake.
2023-02-15 22:46:41 +08:00
Jia Tan 2287d56683 liblzma: Minor improvements to vli.h.
Added [out] annotations to parameters that are pointers and can have
their value changed. Also added a clarification to lzma_vli_is_valid.
2023-02-15 22:46:25 +08:00
Jia Tan 7124b8a16a liblzma: Add comments for macros in delta.h.
Document LZMA_DELTA_DIST_MIN and LZMA_DELTA_DIST_MAX for completeness
and to avoid Doxygen warnings.
2023-02-15 22:45:57 +08:00
Jia Tan 59c7bb8931 liblzma: Improve documentation in index_hash.h.
All functions now explicitly specify parameter and return values.
Also reworded the description of lzma_index_hash_init() for readability.
2023-02-15 22:45:51 +08:00
Jia Tan e970c28ac3 liblzma: Fix bug in lzma_str_from_filters() not checking filters[] length.
The bug is only a problem in applications that do not properly terminate
the filters[] array with LZMA_VLI_UNKNOWN or have more than
LZMA_FILTERS_MAX filters. This bug does not affect xz.
2023-02-03 21:43:01 +08:00
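Note on e970c28ac3: a bounded scan for the array terminator might look roughly like the sketch below. It is an assumption about the general technique and not the code from the commit. The example_ types, constants, and count_filters() are stand-ins so that the sketch compiles without the liblzma headers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Stand-ins for illustration. They play the roles of LZMA_VLI_UNKNOWN
// and LZMA_FILTERS_MAX from the liblzma headers.
#define EXAMPLE_ID_UNKNOWN UINT64_MAX
#define EXAMPLE_FILTERS_MAX 4

typedef struct {
    uint64_t id;
    void *options;
} example_filter;

// Looks for the terminator within the first EXAMPLE_FILTERS_MAX + 1
// entries. On success, stores the number of filters before the
// terminator in *count and returns true. Returns false if the array
// is not terminated within that range, without reading any further.
bool
count_filters(const example_filter *filters, size_t *count)
{
    for (size_t i = 0; i <= EXAMPLE_FILTERS_MAX; ++i) {
        if (filters[i].id == EXAMPLE_ID_UNKNOWN) {
            *count = i;
            return true;
        }
    }

    return false;
}
```

A caller would treat a false return value as an invalid filter chain and report an error instead of walking past the end of the array.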
Jia Tan 85e01266a9 Tests: Create test_filter_str.c.
Tests lzma_str_to_filters(), lzma_str_from_filters(), and
lzma_str_list_filters() API functions.
2023-02-03 21:42:48 +08:00
Jia Tan 3fa0f3ba12 liblzma: Fix typos in comments in string_conversion.c. 2023-02-03 21:42:40 +08:00
Jia Tan 32dbe045d7 liblzma: Clarify block encoder and decoder documentation.
Added a few sentences to the description for lzma_block_encoder() and
lzma_block_decoder() to highlight that the Block Header must be coded
before calling these functions.
2023-02-03 21:42:35 +08:00
Jia Tan ccf12acbfa Update lzma_block documentation for lzma_block_uncomp_encode(). 2023-02-03 21:42:30 +08:00
Jia Tan 6a0b168dd9 liblzma: Minor edits to lzma_block header_size documentation. 2023-02-03 21:42:27 +08:00
Jia Tan 84ce36f90e liblzma: Enumerate functions that read version in lzma_block. 2023-02-03 21:42:24 +08:00
Jia Tan d662077468 liblzma: Clarify comment in block.h. 2023-02-03 21:42:19 +08:00
Jia Tan 880adb5aa2 liblzma: Improve documentation for block.h.
Standardizing each function to always specify params and return values.
Output pointer parameters are also marked with doxygen style [out] to
make it clear. Any note sections were also moved above the parameter and
return sections for consistency.
2023-02-03 21:42:14 +08:00
Jia Tan b5b1b1f061 liblzma: Clarify a comment about LZMA_STR_NO_VALIDATION.
The flag description for LZMA_STR_NO_VALIDATION was previously confusing
about the treatment for filters that cannot be used with .xz format
(lzma1) without using LZMA_STR_ALL_FILTERS. Now, it is clear that
LZMA_STR_NO_VALIDATION is not a superset of LZMA_STR_ALL_FILTERS.
2023-02-03 21:42:07 +08:00
Jia Tan e904e778b8 Translations: Add Brazilian Portuguese translation of man pages.
Thanks to Rafael Fontenelle.
2023-02-03 21:39:59 +08:00
Jia Tan e9c47e79c9 liblzma: Fix documentation in filter.h for lzma_str_to_filters()
The previous documentation for lzma_str_to_filters() was technically
correct, but misleading. lzma_str_to_filters() returns NULL on success,
which is in practice always defined to 0. This is the same value as
LZMA_OK, but lzma_str_to_filters() does not return lzma_ret so we should
be more clear.
2023-02-03 21:38:26 +08:00
Jia Tan 99575947a5 xz: Refactor duplicated check for custom suffix when using --format=raw 2023-02-03 21:38:26 +08:00
Jia Tan 76dec92fcc liblzma: Set documentation on all reserved fields to private.
This prevents the reserved fields from being part of the generated
Doxygen documentation.
2023-02-03 21:38:26 +08:00
Jia Tan bd213d06eb liblzma: Highlight liblzma API headers should not be included directly.
This improves the generated Doxygen HTML files to better highlight
how to properly use the liblzma API header files.
2023-02-03 21:38:26 +08:00
Jia Tan 257dbff0ba tuklib_physmem: Silence warning from -Wcast-function-type on MinGW-w64.
tuklib_physmem depends on GetProcAddress() for both MSVC and MinGW-w64
to retrieve a function address. The proper way to do this is to cast the
return value to the type of function pointer retrieved. Unfortunately,
this causes a cast-function-type warning, so the best solution is to
simply ignore the warning.
2023-02-03 21:38:13 +08:00
Jia Tan 720ad4a442 xz: Add missing comment for coder_set_compression_settings() 2023-02-03 21:11:32 +08:00
Jia Tan 88dc191634 xz: Do not set compression settings with raw format in list mode.
Calling coder_set_compression_settings() in list mode with verbose mode
on caused the filter chain and memory requirements to print. This was
unnecessary since the command results in an error, and it was not
consistent with other formats like lzma and alone.
2023-02-03 21:11:11 +08:00
Jia Tan 039e0ab13e Translations: Update the Brazilian Portuguese translation. 2023-02-03 21:10:57 +08:00
Lasse Collin 718f7a60e7 Build: Omit -Wmissing-noreturn from the default warnings.
It's not that important. It can be annoying in builds that
disable many features since in those cases the test programs
will correctly trigger this warning with Clang.
2023-02-03 21:10:47 +08:00
Lasse Collin 3ccedb0972 xz: Use ssize_t for the to-be-ignored return value from write(fd, ptr, 1).
It makes no difference here as the return value fits into an int
too and it then gets ignored but this looks better.
2023-02-03 21:10:42 +08:00
Lasse Collin 09fbd2f052 xz: Silence warnings from -Wsign-conversion in a 32-bit build. 2023-02-03 21:10:38 +08:00
Lasse Collin 683d3f178e liblzma: Silence another warning from -Wsign-conversion in a 32-bit build.
It doesn't warn on a 64-bit system because truncating
a ptrdiff_t (signed long) to uint32_t is diagnosed under
-Wconversion by GCC and -Wshorten-64-to-32 by Clang.
2023-02-03 21:10:30 +08:00
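Note on 683d3f178e: one common way to silence this kind of warning is an explicit cast, sketched below. Whether the commit used a cast or restructured the expression is not stated in the message, so this is only an assumption. span_length() is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: number of bytes between two pointers into the
// same buffer. The caller guarantees end >= start and that the
// difference fits in 32 bits.
uint32_t
span_length(const uint8_t *start, const uint8_t *end)
{
    // end - start has type ptrdiff_t. In a 32-bit build an implicit
    // conversion to uint32_t changes only the signedness, which is
    // what -Wsign-conversion reports. The explicit cast documents
    // that the conversion is intended.
    return (uint32_t)(end - start);
}
```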
Lasse Collin 2b8062ef94 liblzma: Silence a warning from -Wsign-conversion in a 32-bit build. 2023-02-03 21:10:25 +08:00
Lasse Collin b16b9c0d22 Build: Make configure add more warning flags for GCC and Clang.
-Wstrict-aliasing was removed from the list since it is enabled
by -Wall already.

A normal build is clean with these on GNU/Linux x86-64 with
GCC 12.2.0 and Clang 14.0.6.
2023-02-03 21:10:19 +08:00