Now runtime detection of CLMUL support can pick between the CLMUL and
the generic assembly implementations. Whatever overhead this has for
builds that omit CLMUL completely isn't important because builds for
any non-ancient system is likely to include the CLMUL code too.
Handle the CRC tables in crcXX_fast.c files because now these files
are built even when assembly code is used.
If 32-bit x86 assembly is enabled then it will always be built even
if compiler flags were such that CLMUL would be allowed unconditionally.
That is, runtime detection will be used anyway. This keeps the build
rules simpler.
In LZ encoder, build and use lzma_lz_hash_table[256] if CLMUL CRC
is used without runtime detection. Previously this wasn't needed
because crc32_table.c included the lzma_crc32_table[][] in the build
unless encoder support had been disabled. Including an 8 KiB table
was silly when only 1 KiB is actually used. So now liblzma is 7 KiB
smaller if CLMUL is enabled without runtime detection.
This simplifies things a little. Building liblzma with VS2013 probably
still worked but building the command line tools was not supported.
Microsoft ended support for VS2013 on 2024-04.
On E2K the function compiles only due to compiler emulation but the
function is never used. It's cleaner to omit the function when it's
not needed even though it's a "static inline" function.
Thanks to Ilya Kurdyukov.
It's faster with both tiny and large buffers and doesn't require
disabling any sanitizers. With large buffers the extra speed is
from folding four 16-byte chunks in parallel.
The 32-bit x86 with MSVC reportedly still needs a workaround.
Now the simpler "__asm mov ebx, ebx" trick is enough but it
needs to be in lzma_crc64() instead of crc64_arch_optimized().
Thanks to Iouri Kharon for testing and the fix.
Thanks to Ilya Kurdyukov for testing the speed with aligned and
unaligned buffers on a few x86 processors and on E2K v6.
Thanks to Sam James for general feedback.
Fixes: https://github.com/tukaani-project/xz/issues/112
Fixes: https://github.com/tukaani-project/xz/issues/122
Now it refers to crc_clmul_consts_gen.c. vfold8 was renamed to mu_p
and the p no longer has the lowest bit set (it makes no difference
as the output bits it affects are ignored).
It's not enough to silence the address sanitizer. Also memory and
thread sanitizers would need to be silenced. They, at least currently,
aren't smart enough to see that the extra bytes are discarded from
the xmm registers by later instructions.
Valgrind is smarter, possibly because this kind of code isn't weird
to write in assembly. Agner Fog's optimizing_assembly.pdf even mentions
this idea of doing an aligned read and then discarding the extra
bytes. The sanitizers don't instrument assembly code but Valgrind
checks all code.
It's better to change the implementation to avoid the sanitization
attributes which also look scary in the code. (Somehow they can look
more scary than __asm__ which is implictly unsanitized.)
See also:
https://github.com/tukaani-project/xz/issues/112https://github.com/tukaani-project/xz/issues/122
GCC 4.2 doesn't have __builtin_bswap16() and friends so tuklib_integer.h
tries to use OS-specific byte swap methods instead. On OpenBSD those
macros are swap16/32/64 instead of bswap16/32/64 like on other *BSDs
and Darwin.
An alternative to "#ifdef __OpenBSD__" could be "#ifdef swap16" as it
is a macro. But since OpenBSD seems to be a special case under this
special case of "*BSDs and Darwin", checking for __OpenBSD__ seems
the more conservative choice now.
Thanks to Christian Weisgerber and Brad Smith who both submitted
the same patch a few hours apart.
Co-authored-by: Christian Weisgerber <naddy@mips.inka.de>
Co-authored-by: Brad Smith <brad@comstyle.com>
Closes: https://github.com/tukaani-project/xz/pull/126
The C code is from Christian Weisgerber, I merely reordered the OSes.
Then I added the build system checks without testing them.
Also thanks to Brad Smith who submitted a similar patch on GitHub
a few hours after Christian had sent his via email.
Co-authored-by: Christian Weisgerber <naddy@mips.inka.de>
Closes: https://github.com/tukaani-project/xz/pull/125
Solaris' GCC can't understand that our use is fine, unlike modern compilers:
```
list.c: In function 'print_totals_basic':
list.c:1191:4: error: format not a string literal, argument types not checked [-Werror=format-nonliteral]
uint64_to_str(totals.files, 0));
^~~~~~~~~~~~~
cc1: all warnings being treated as errors
```
It's presumably because of older gettext missing format attributes.
This is with `gcc (GCC) 7.3.0`.
This is closer to what it was before the --filtersX support was added,
just extended to support for scaling all filter chains. The method
before this commit was an extended version of the original too but
it was done in a more complex way for no clear reason. In case of
an error, the complex version printed fewer informative messages
(a good thing) but it's not a sigificant benefit.
In the limit is too low even for single-threaded mode, the required
amount of memory is now reported like in 5.4.x instead of like in
5.5.1alpha - 5.6.1 which showed the original non-scaled usage. It
had been a FIXME in the old code but it's not clear what message
makes the most sense.
Fixes: 5f0c5a0438
The convention is that
lzma_filter filters[LZMA_FILTERS_MAX + 1];
contains the filters of a single filter chain.
It was so here as well before the commit
d6af7f3470.
It changes "filters" to a ten-element array of filter chains.
It's clearer to call this array-of-arrays "chains".
This also renames "filter_idx" to "chain_idx" which is used
as an index as in chains[chain_idx].
opt_mode == MODE_COMPRESS isn't possible when HAVE_ENCODERS isn't
defined. Thus, when *encoding*, the message about *decoder* memory
usage is possible to show only when both encoder and decoder have
been built.
Since the message is shown only at V_DEBUG, skip the memusage
calculation if verbosity level isn't high enough.
Fixes: 5f0c5a0438