Commit Graph

2334 Commits

Author SHA1 Message Date
Jia Tan fd90e2f4c2 liblzma: Remove note from lzma_options_bcj about the ARM64 exception.
This was left in by mistake since an early version of the ARM64 filter
used a different struct for its options.
2023-03-17 01:42:28 +08:00
Jia Tan 4f50763b98 CI: Add doxygen as a dependency.
Autogen now requires --no-doxygen or having doxygen installed to run
without errors.
2023-03-17 01:42:28 +08:00
Lasse Collin f68f4b27f6 COPYING: Add a note about the included Doxygen-generated HTML. 2023-03-17 01:42:28 +08:00
Jia Tan 8979308528 Doc: Update PACKAGERS with details about liblzma API docs install. 2023-03-17 01:42:28 +08:00
Jia Tan 55ba6e9300 liblzma: Add set lzma.h as the main page for Doxygen documentation.
The \mainpage command is used in the first block of comments in lzma.h.
This changes the previously nearly empty index.html to use the first
comment block in lzma.h for its contents.

lzma.h is no longer documented separately, but this is for the better
since lzma.h only defined a few macros that users do not need to use.
The individual API header files all have a disclaimer that they should
not be #included directly, so there should be no confusion on the fact
that lzma.h should be the only header used by applications.

Additionally, the note "See ../lzma.h for information about liblzma as
a whole." was removed since lzma.h is now the main page of the
generated HTML and does not have its own page anymore. So it would be
confusing in the HTML version and was only a "nice to have" when
browsing the source files.
2023-03-17 01:42:28 +08:00
Jia Tan 16f2125559 Build: Generate doxygen documentation in autogen.sh.
Another command line option (--no-doxygen) was added to disable
creating the doxygen documenation in cases where it not wanted or
if the doxygen tool is not installed.
2023-03-17 01:42:28 +08:00
Jia Tan 1321852a3b Build: Create doxygen/update-doxygen script.
This is a helper script to generate the Doxygen documentation. It can be
run in 'liblzma' or 'internal' mode by setting the first argument. It
will default to 'liblzma' mode and only generate documentation for the
liblzma API header files.

The helper script will be run during the custom mydist hook when we
create releases. This hook already alters the source directory, so its
fine to do it here too. This way, we can include the Doxygen generated
files in the distrubtion and when installing.

In 'liblzma' mode, the JavaScript is stripped from the .html files and
the .js files are removed. This avoids license hassle from jQuery and
other libraries that Doxygen 1.9.6 puts into jquery.js in minified form.
2023-03-17 01:42:28 +08:00
Jia Tan b1216a7772 Build: Install Doxygen docs and include in distribution if generated.
Added a install-data-local target to install the Doxygen documentation
only when it has been generated. In order to correctly remove the docs,
a corresponding uninstall-local target was added.

If the doxygen docs exist in the source tree, they will also be included
in the distribution now too.
2023-03-17 01:42:28 +08:00
Lasse Collin c97d12f300 Doxygen: Refactor Doxyfile.in to doxygen/Doxyfile.
Instead of having Doxyfile.in configured by Autoconf, the Doxyfile
can have the tags that need to be configured piped into the doxygen
command through stdin with the overrides after Doxyfile's contents.

Going forward, the documentation should be generated in two different
modes: liblzma or internal.

liblzma is useful for most users. It is the documentation for just
the liblzma API header files. This is the default.

internal is for people who want to understand how xz and liblzma work.
It might be useful for people who want to contribute to the project.
2023-03-17 01:42:28 +08:00
Jia Tan 1b7661faa4 Tests: Remove unused macros and functions. 2023-03-13 20:49:53 +08:00
Jia Tan af55191102 liblzma: Defines masks for return values from lzma_index_checks(). 2023-03-13 20:49:53 +08:00
Jia Tan 8f38cdd9ab Tests: Refactors existing lzma_index tests.
Converts the existing lzma_index tests into tuktests and covers every
API function from index.h except for lzma_file_info_decoder, which can
be tested in the future.
2023-03-13 20:49:53 +08:00
Lasse Collin 717aa3651c xz: Simplify the error-label in Capsicum sandbox code.
Also remove unneeded "sandbox_allowed = false;" as this code
will never be run more than once (making it work with multiple
input files isn't trivial).
2023-03-11 18:46:45 +02:00
Lasse Collin a0eecc235d xz: Make Capsicum sandbox more strict with stdin and stdout. 2023-03-08 23:22:15 +08:00
Jia Tan 916448d624 Revert: "Add warning if Capsicum sandbox system calls are unsupported."
The warning causes the exit status to be 2, so this will cause problems
for many scripted use cases for xz. The sandbox usage is already very
limited already, so silently disabling this allows it to be more usable.
2023-03-08 23:22:11 +08:00
Jia Tan 01587dda2a xz: Fix -Wunused-label in io_sandbox_enter().
Thanks to Xin Li for recommending the fix.
2023-03-07 20:02:22 +08:00
Jia Tan 5fb9367866 xz: Add warning if Capsicum sandbox system calls are unsupported.
The warning is only used when errno == ENOSYS. Otherwise, xz still
issues a fatal error.
2023-03-06 21:37:45 +08:00
Jia Tan 61ee82cb12 xz: Skip Capsicum sandbox system calls when they are unsupported.
If a system has the Capsicum header files but does not actually
implement the system calls, then this would render xz unusable. Instead,
we can check if errno == ENOSYS and not issue a fatal error.
2023-03-06 21:27:53 +08:00
Jia Tan f070722b57 xz: Reorder cap_enter() to beginning of capsicum sandbox code.
cap_enter() puts the process into the sandbox. If later calls to
cap_rights_limit() fail, then the process can still have some extra
protections.
2023-03-06 21:08:26 +08:00
Jia Tan f1ab1f6b33 liblzma: Clarify lzma_lzma_preset() documentation in lzma12.h.
lzma_lzma_preset() does not guarentee that the lzma_options_lzma are
usable in an encoder even if it returns false (success). If liblzma
is built with default configurations, then the options will always be
usable. However if the match finders hc3, hc4, or bt4 are disabled, then
the options may not be usable depending on the preset level requested.

The documentation was updated to reflect this complexity, since this
behavior was unclear before.
2023-03-01 21:42:31 +08:00
Lasse Collin 4b7fb3bf41 CMake: Require that the C compiler supports C99 or a newer standard.
Thanks to autoantwort for reporting the issue and suggesting
a different patch:
https://github.com/tukaani-project/xz/pull/42
2023-02-27 18:38:35 +02:00
Jia Tan 9aa7fdeb04 Tests: Small tweak to test-vli.c.
The static global variables can be disabled if encoders and decoders
are not built. If they are not disabled and -Werror is used, it will
cause an usused warning as an error.
2023-02-24 21:11:18 +08:00
Jia Tan 3cf72c4bcb liblzma: Replace '\n' -> newline in filter.h documentation.
The '\n' renders as a newline when the comments are converted to html
by Doxygen.
2023-02-24 21:09:39 +08:00
Jia Tan 002006be62 liblzma: Shorten return description for two functions in filter.h.
Shorten the description for lzma_raw_encoder_memusage() and
lzma_raw_decoder_memusage().
2023-02-24 21:09:39 +08:00
Jia Tan 463d9359b8 liblzma: Reword a few lines in filter.h 2023-02-24 21:09:39 +08:00
Jia Tan 01441df92c liblzma: Improve documentation in filter.h.
All functions now explicitly specify parameter and return values.
The notes and code annotations were moved before the parameter and
return value descriptions for consistency.

Also, the description above lzma_filter_encoder_is_supported() about
not being able to list available filters was removed since
lzma_str_list_filters() will do this.
2023-02-24 21:09:39 +08:00
Lasse Collin 805b45cd60 Update THANKS. 2023-02-23 20:46:16 +02:00
Lasse Collin 30e95bb44c liblzma: Avoid null pointer + 0 (undefined behavior in C).
In the C99 and C17 standards, section 6.5.6 paragraph 8 means that
adding 0 to a null pointer is undefined behavior. As of writing,
"clang -fsanitize=undefined" (Clang 15) diagnoses this. However,
I'm not aware of any compiler that would take advantage of this
when optimizing (Clang 15 included). It's good to avoid this anyway
since compilers might some day infer that pointer arithmetic implies
that the pointer is not NULL. That is, the following foo() would then
unconditionally return 0, even for foo(NULL, 0):

    void bar(char *a, char *b);

    int foo(char *a, size_t n)
    {
        bar(a, a + n);
        return a == NULL;
    }

In contrast to C, C++ explicitly allows null pointer + 0. So if
the above is compiled as C++ then there is no undefined behavior
in the foo(NULL, 0) call.

To me it seems that changing the C standard would be the sane
thing to do (just add one sentence) as it would ensure that a huge
amount of old code won't break in the future. Based on web searches
it seems that a large number of codebases (where null pointer + 0
occurs) are being fixed instead to be future-proof in case compilers
will some day optimize based on it (like making the above foo(NULL, 0)
return 0) which in the worst case will cause security bugs.

Some projects don't plan to change it. For example, gnulib and thus
many GNU tools currently require that null pointer + 0 is defined:

    https://lists.gnu.org/archive/html/bug-gnulib/2021-11/msg00000.html

    https://www.gnu.org/software/gnulib/manual/html_node/Other-portability-assumptions.html

In XZ Utils null pointer + 0 issue should be fixed after this
commit. This adds a few if-statements and thus branches to avoid
null pointer + 0. These check for size > 0 instead of ptr != NULL
because this way bugs where size > 0 && ptr == NULL will likely
get caught quickly. None of them are in hot spots so it shouldn't
matter for performance.

A little less readable version would be replacing

    ptr + offset

with

    offset != 0 ? ptr + offset : ptr

or creating a macro for it:

    #define my_ptr_add(ptr, offset) \
            ((offset) != 0 ? ((ptr) + (offset)) : (ptr))

Checking for offset != 0 instead of ptr != NULL allows GCC >= 8.1,
Clang >= 7, and Clang-based ICX to optimize it to the very same code
as ptr + offset. That is, it won't create a branch. So for hot code
this could be a good solution to avoid null pointer + 0. Unfortunately
other compilers like ICC 2021 or MSVC 19.33 (VS2022) will create a
branch from my_ptr_add().

Thanks to Marcin Kowalczyk for reporting the problem:
https://github.com/tukaani-project/xz/issues/36
2023-02-23 20:41:22 +02:00
Jia Tan fa9065fac5 liblzma: Adjust container.h for consistency with filter.h. 2023-02-23 20:27:59 +08:00
Jia Tan 00a721b63d liblzma: Fix small typos and reword a few things in filter.h. 2023-02-23 20:27:59 +08:00
Jia Tan 5b1c171d4f liblzma: Convert list of flags in lzma_mt to bulleted list. 2023-02-23 20:27:59 +08:00
Jia Tan dbd47622eb liblzma: Fix typo in documentation in container.h
lzma_microlzma_decoder -> lzma_microlzma_encoder
2023-02-23 20:27:59 +08:00
Jia Tan 14cd30806d liblzma: Improve documentation for container.h
Standardizing each function to always specify parameters and return
values. Also moved the parameters and return values to the end of each
function description.
2023-02-23 20:27:59 +08:00
Jia Tan c9c8bfae35 CMake: Add LZIP decoder test to list of tests. 2023-02-22 21:10:28 +08:00
Lasse Collin b9f171dd00 Update THANKS. 2023-02-17 20:56:49 +02:00
Lasse Collin 2ee86d20e4 Build: Use only the generic symbol versioning on MicroBlaze.
On MicroBlaze, GCC 12 is broken in sense that
__has_attribute(__symver__) returns true but it still doesn't
support the __symver__ attribute even though the platform is ELF
and symbol versioning is supported if using the traditional
__asm__(".symver ...") method. Avoiding the traditional method is
good because it breaks LTO (-flto) builds with GCC.

See also: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101766

For now the only extra symbols in liblzma_linux.map are the
compatibility symbols with the patch that spread from RHEL/CentOS 7.
These require the use of __symver__ attribute or __asm__(".symver ...")
in the C code. Compatibility with the patch from CentOS 7 doesn't
seem valuable on MicroBlaze so use liblzma_generic.map on MicroBlaze
instead. It doesn't require anything special in the C code and thus
no LTO issues either.

An alternative would be to detect support for __symver__
attribute in configure.ac and CMakeLists.txt and fall back
to __asm__(".symver ...") but then LTO would be silently broken
on MicroBlaze. It sounds likely that MicroBlaze is a special
case so let's treat it as a such because that is simpler. If
a similar issue exists on some other platform too then hopefully
someone will report it and this can be reconsidered.

(This doesn't do the same fix in CMakeLists.txt. Perhaps it should
but perhaps CMake build of liblzma doesn't matter much on MicroBlaze.
The problem breaks the build so it's easy to notice and can be fixed
later.)

Thanks to Vincent Fazio for reporting the problem and proposing
a patch (in the end that solution wasn't used):
https://github.com/tukaani-project/xz/pull/32
2023-02-17 20:48:28 +02:00
Lasse Collin d831072cce liblzma: Very minor API doc tweaks.
Use "member" to refer to struct members as that's the term used
by the C standard.

Use lzma_options_delta.dist and such in docs so that in Doxygen's
HTML output they will link to the doc of the struct member.

Clean up a few trailing white spaces too.
2023-02-16 21:09:00 +02:00
Jia Tan f029daea39 liblzma: Adjust spacing in doc headers in bcj.h. 2023-02-17 00:54:33 +08:00
Jia Tan a5de68bac2 liblzma: Adjust documentation in bcj.h for consistent style. 2023-02-17 00:49:47 +08:00
Jia Tan efa498c13b liblzma: Rename field => member in documentation.
Also adjusted preset value => preset level.
2023-02-17 00:49:47 +08:00
Lasse Collin 718b22a6c5 liblzma: Silence a warning from MSVC.
It gives C4146 here since unary minus with unsigned integer
is still unsigned (which is the intention here). Doing it
with substraction makes it clearer and avoids the warning.

Thanks to Nathan Moinvaziri for reporting this.
2023-02-16 17:59:50 +02:00
Jia Tan 87c53553fa liblzma: Improve documentation for stream_flags.h
Standardizing each function to always specify parameters and return
values. Also moved the parameters and return values to the end of each
function description.

A few small things were reworded and long sentences broken up.
2023-02-16 21:04:54 +08:00
Jia Tan 13d99e75a5 liblzma: Improve documentation in lzma12.h.
All functions now explicitly specify parameter and return values.
2023-02-15 22:21:44 +08:00
Jia Tan 43ec344c86 liblzma: Improve documentation in check.h.
All functions now explicitly specify parameter and return values.
Also moved the note about SHA-256 functions not being exported to the
top of the file.
2023-02-15 00:59:16 +08:00
Jia Tan 9c71db4e88 liblzma: Improve documentation in index.h
All functions now explicitly specify parameter and return values.
2023-02-15 00:20:44 +08:00
Jia Tan 421f2f2e16 liblzma: Reword a comment in index.h. 2023-02-15 00:20:44 +08:00
Jia Tan b675394849 liblzma: Omit lzma_index_iter's internal field from Doxygen docs.
Add \private above this field and its sub-fields since it is not meant
to be modified by users.
2023-02-15 00:20:44 +08:00
Jia Tan 0c9e4fc2ad liblzma: Fix documentation for LZMA_MEMLIMIT_ERROR.
LZMA_MEMLIMIT_ERROR was missing the "<" character needed to put
documentation after a member.
2023-02-14 20:41:05 +08:00
Jia Tan 816fec125a liblzma: Improve documentation for base.h.
Standardizing each function to always specify params and return values.
Also fixed a small grammar mistake.
2023-02-14 20:41:05 +08:00
Jia Tan 862dacef1a liblzma: Add one more missing [out] annotation in vli.h 2023-02-14 00:12:34 +08:00