Commit Graph

2603 Commits

Author SHA1 Message Date
Lasse Collin 54eaea5ea4 liblzma: x86 CLMUL CRC: Rewrite
It's faster with both tiny and large buffers and doesn't require
disabling any sanitizers. With large buffers the extra speed is
from folding four 16-byte chunks in parallel.

The 32-bit x86 with MSVC reportedly still needs a workaround.
Now the simpler "__asm mov ebx, ebx" trick is enough but it
needs to be in lzma_crc64() instead of crc64_arch_optimized().
Thanks to Iouri Kharon for testing and the fix.

Thanks to Ilya Kurdyukov for testing the speed with aligned and
unaligned buffers on a few x86 processors and on E2K v6.

Thanks to Sam James for general feedback.

Fixes: https://github.com/tukaani-project/xz/issues/112
Fixes: https://github.com/tukaani-project/xz/issues/122
2024-06-17 15:00:49 +03:00
Lasse Collin c0e7eaae8d sysdefs.h: Add alignas 2024-06-16 12:59:20 +03:00
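For readers unfamiliar with alignas, here is a minimal sketch of how a header could make it available across C standard versions. It is not the contents of sysdefs.h, which this log doesn't show: the fallback chain and the names example_buf and example_buf_is_aligned are assumptions made for illustration.

```c
#include <stdint.h>

// Hypothetical fallback chain; the real sysdefs.h may differ.
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
	// C23: alignas is a keyword, nothing to include.
#elif defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
#	include <stdalign.h>
#elif defined(__GNUC__)
#	define alignas(n) __attribute__((__aligned__(n)))
#else
#	error "No known way to request alignment with this compiler"
#endif

// A 16-byte aligned buffer, for example for aligned 128-bit vector loads.
static alignas(16) uint8_t example_buf[64];

// Returns 1 if example_buf really starts at a 16-byte boundary.
static int example_buf_is_aligned(void)
{
	return ((uintptr_t)example_buf % 16) == 0;
}
```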
Lasse Collin 20014c2614 liblzma: Use a single macro to select CLMUL CRC to build
This way it's clearer that two things cannot be selected
at the same time.
2024-06-16 12:59:17 +03:00
Lasse Collin d8fb098617 liblzma: CRC32 CLMUL: Refactor the constants and simplify
By using modulus scaled constants, the final reduction can
be simplified.
2024-06-16 12:56:54 +03:00
Lasse Collin ef652ac391 liblzma: CRC64 CLMUL: Refactor the constants
Now it refers to crc_clmul_consts_gen.c. vfold8 was renamed to mu_p
and the p no longer has the lowest bit set (it makes no difference
as the output bits it affects are ignored).
2024-06-16 12:56:54 +03:00
Lasse Collin 9f5fc17e32 liblzma: Add crc_clmul_consts_gen.c
It's a standalone program that prints the required constants.
It won't be a part of the normal build of the package.
2024-06-16 12:56:54 +03:00
Lasse Collin 71b147aab7 liblzma: Remove CRC_USE_GENERIC_FOR_SMALL_INPUTS
It was already commented out.
2024-06-16 12:56:54 +03:00
Lasse Collin f99a7be406 liblzma: Remove crc_attr_no_sanitize_address
It's not enough to silence the address sanitizer. Also memory and
thread sanitizers would need to be silenced. They, at least currently,
aren't smart enough to see that the extra bytes are discarded from
the xmm registers by later instructions.

Valgrind is smarter, possibly because this kind of code isn't weird
to write in assembly. Agner Fog's optimizing_assembly.pdf even mentions
this idea of doing an aligned read and then discarding the extra
bytes. The sanitizers don't instrument assembly code but Valgrind
checks all code.

It's better to change the implementation to avoid the sanitization
attributes which also look scary in the code. (Somehow they can look
more scary than __asm__ which is implicitly unsanitized.)

See also:
https://github.com/tukaani-project/xz/issues/112
https://github.com/tukaani-project/xz/issues/122
2024-06-16 12:56:54 +03:00
Lasse Collin ead4d15199 Revert "Build: Temporarily disable CRC CLMUL to silence OSS Fuzz"
This reverts commit 9f1a6d6f9a.
2024-06-16 12:56:54 +03:00
Lasse Collin 2178acf8a4 CMake: Prefer C11 with a fallback to C99
There is no need to make a similar change in configure.ac.
With Autoconf 2.72, the deprecated macro AC_PROG_CC_C99
is an alias for AC_PROG_CC which prefers a C11 compiler.
2024-06-12 14:28:37 +03:00
Lasse Collin c97e9c12fe Update THANKS 2024-06-12 14:20:21 +03:00
Lasse Collin 89e9f12e03 Tests: Improve the CRC32 test
A similar one was already there for CRC64 but nowadays also CRC32
has a CLMUL implementation, so it's good to test it better too.
2024-06-11 22:44:44 +03:00
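One way to test an optimized CRC32 more thoroughly is to compare it against a slow bitwise reference and to check that splitting the input at every position gives the same result. The sketch below shows only the reference side and is not the test added in 89e9f12e03. ref_crc32 and ref_crc32_check_splits are hypothetical names; the argument order (buffer, size, previous CRC) mirrors lzma_crc32().

```c
#include <stddef.h>
#include <stdint.h>

// Bitwise CRC32 (reflected polynomial 0xEDB88320). Pass 0 as crc for
// the first call and the previous return value to continue.
static uint32_t ref_crc32(const uint8_t *buf, size_t size, uint32_t crc)
{
	crc = ~crc;
	for (size_t i = 0; i < size; ++i) {
		crc ^= buf[i];
		for (int bit = 0; bit < 8; ++bit) {
			if (crc & 1)
				crc = (crc >> 1) ^ UINT32_C(0xEDB88320);
			else
				crc >>= 1;
		}
	}
	return ~crc;
}

// Returns 1 if calculating the CRC in two parts gives the same result
// as a single call, for every possible split position. Doing this with
// an optimized implementation exercises its handling of short and
// unaligned buffers.
static int ref_crc32_check_splits(const uint8_t *buf, size_t size)
{
	const uint32_t expected = ref_crc32(buf, size, 0);
	for (size_t split = 0; split <= size; ++split) {
		uint32_t crc = ref_crc32(buf, split, 0);
		crc = ref_crc32(buf + split, size - split, crc);
		if (crc != expected)
			return 0;
	}
	return 1;
}
```

In a real test, the second and third ref_crc32() calls in the loop would be replaced by the implementation under test.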
Lasse Collin c7164b1927 xz: Fix white space 2024-06-11 22:42:26 +03:00
Lasse Collin 0a32d2072c liblzma: Fix a typo in a comment
Thanks to Sam James for spotting it.

Fixes: f644473a21
2024-06-11 22:42:04 +03:00
Lasse Collin afd9b4d282 liblzma: Fix a comment indentation 2024-06-10 23:19:27 +03:00
Lasse Collin 50e6bff274 liblzma: Fix white space 2024-06-10 23:19:27 +03:00
Lasse Collin caea7844d3 tuklib: __STDC_VERSION__ in C23 is 202311 2024-06-10 23:19:27 +03:00
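As a small illustration of the value mentioned in caea7844d3, the sketch below maps __STDC_VERSION__ values to standard names. c_standard_name is a hypothetical helper and isn't part of tuklib.

```c
#include <string.h>

// Maps a __STDC_VERSION__ value to the name of the C standard.
static const char *c_standard_name(long version)
{
	if (version >= 202311L)
		return "C23";
	if (version >= 201710L)
		return "C17";
	if (version >= 201112L)
		return "C11";
	if (version >= 199901L)
		return "C99";
	return "C90 or unknown";
}
```

It could be called as c_standard_name(__STDC_VERSION__) when the macro is defined.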
RainRat 9e73918a4f Fix typos
Closes: https://github.com/tukaani-project/xz/pull/124
2024-06-07 16:01:27 +03:00
Lasse Collin 04b23addf3 tuklib_integer: Fix building on OpenBSD/sparc64 that uses GCC 4.2
GCC 4.2 doesn't have __builtin_bswap16() and friends so tuklib_integer.h
tries to use OS-specific byte swap methods instead. On OpenBSD those
macros are swap16/32/64 instead of bswap16/32/64 like on other *BSDs
and Darwin.

An alternative to "#ifdef __OpenBSD__" could be "#ifdef swap16" as it
is a macro. But since OpenBSD seems to be a special case under this
special case of "*BSDs and Darwin", checking for __OpenBSD__ seems
the more conservative choice now.

Thanks to Christian Weisgerber and Brad Smith who both submitted
the same patch a few hours apart.

Co-authored-by: Christian Weisgerber <naddy@mips.inka.de>
Co-authored-by: Brad Smith <brad@comstyle.com>
Closes: https://github.com/tukaani-project/xz/pull/126
2024-06-07 15:47:20 +03:00
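The selection described in 04b23addf3 can be sketched roughly as follows. This is a simplified illustration, not the code in tuklib_integer.h: the example_bswap* names are hypothetical, the header used for OpenBSD is an assumption, and the other *BSDs and Darwin are folded into a portable fallback for brevity.

```c
#include <stdint.h>

#if defined(__OpenBSD__)
	// Assumption: <sys/endian.h> provides swap16/32/64 on OpenBSD.
#	include <sys/types.h>
#	include <sys/endian.h>
#	define example_bswap16(n) swap16(n)
#	define example_bswap32(n) swap32(n)
#	define example_bswap64(n) swap64(n)
#else
// Portable fallback using shifts and masks.
static inline uint16_t example_bswap16(uint16_t n)
{
	return (uint16_t)((n << 8) | (n >> 8));
}

static inline uint32_t example_bswap32(uint32_t n)
{
	return (n << 24)
			| ((n << 8) & UINT32_C(0x00FF0000))
			| ((n >> 8) & UINT32_C(0x0000FF00))
			| (n >> 24);
}

static inline uint64_t example_bswap64(uint64_t n)
{
	// Swap each 32-bit half and exchange their positions.
	return ((uint64_t)example_bswap32((uint32_t)n) << 32)
			| example_bswap32((uint32_t)(n >> 32));
}
#endif
```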
Lasse Collin dc03f6290f liblzma: Add ARM64 CRC32 instruction support detection on OpenBSD
The C code is from Christian Weisgerber, I merely reordered the OSes.
Then I added the build system checks without testing them.

Also thanks to Brad Smith who submitted a similar patch on GitHub
a few hours after Christian had sent his via email.

Co-authored-by: Christian Weisgerber <naddy@mips.inka.de>
Closes: https://github.com/tukaani-project/xz/pull/125
2024-06-07 15:06:59 +03:00
Lasse Collin f5c2ae58ec Update THANKS 2024-06-05 13:55:43 +03:00
Lasse Collin e5491dfab9 CMake: Include the "alpha" or "beta" suffix in PACKAGE_VERSION
This way the version string gets into xzgrep and other scripts
in full and also into liblzma.pc.

For the project() command, a suffixless string is required though.
2024-06-05 13:42:47 +03:00
Lasse Collin 1d3c61575f CMake: Fix wrong version variable
liblzma_VERSION has never existed in the repository. xz_VERSION from
the project() command was used for liblzma SOVERSION so use xz_VERSION
here too.

The wrong variable did no harm in practice as PROJECT_VERSION
was used as the fallback. It has the same value as xz_VERSION.

Fixes: 7e3493d40e
2024-06-05 13:30:28 +03:00
Lasse Collin 5d1c649ba9 CMake: Set only "prefix" as an absolute path in liblzma.pc
CMake provides variables that are relative to CMAKE_INSTALL_PREFIX
so use them instead of repeating the full path.
2024-06-05 12:59:59 +03:00
Lasse Collin e0d6d05ce0 CMake: Fix liblzma filename in Windows environments
This is a mess because liblzma DLL outside Cygwin and MSYS2
is liblzma.dll instead of lzma.dll to avoid a conflict with
lzma.dll from LZMA SDK.

On Cygwin the name was "liblzma-5.dll" while "cyglzma-5.dll"
would have been correct (and match what Libtool produces).
MSYS2 likely was broken too as it uses the "msys-" prefix.

This change has no effect with MinGW-w64 because with that
the "lib" prefix was correct already.

With MSVC builds this is a small breaking change that requires developers
to adjust the library name when linking against liblzma. The liblzma.dll
name is kept as is but the import library and static library are now
lzma.lib instead of liblzma.lib. This is helpful when using pkgconf
because "pkgconf --msvc-syntax --libs liblzma" outputs "lzma.lib"
(it's converted from "-llzma" in liblzma.pc). It would be easy to
keep the liblzma.lib naming but the pkgconf compatibility seems worth
it in the long run. The lzma.lib name is compatible with MinGW-w64
too as -llzma will also find lzma.lib.

vcpkg had been patching CMakeLists.txt this way since 2022 but I
learned this only recently. The reasoning for the patch makes sense,
and while this is a small breaking change with MSVC, it seems like
a decent compromise as it keeps the DLL name the same.

2022 patch in vcpkg: 0707a17ecf/ports/liblzma/win_output_name.patch
See the discussion: https://github.com/microsoft/vcpkg/pull/39024

Thanks to Vincent Torri for confirming the naming issue on Cygwin.
2024-06-04 23:59:29 +03:00
Lasse Collin e7a42cda7c Fix version.sh compatibility with Solaris
The ancient /bin/tr on Solaris doesn't support '\n'.
With /usr/xpg4/bin/tr it works but it might not be in PATH.

Another problem was that sed was given input that didn't have a newline
at the end. Text files must end with a newline to be portable.

Fix both problems:

  - Handle multiline input within sed itself to avoid one tr invocation.
    The default sed even on Solaris does understand \n.

  - Use octals in tr -d. \012 works for ASCII "line feed", it's even
    used as an example in the Solaris man page. But we must strip
    also ASCII "carriage return" \015 and EBCDIC "next line" \025.
    The EBCDIC case got handled with \n previously. Stripping \012
    and \015 on an EBCDIC system won't matter as those control chars
    won't be present in the string in the first place.

An awk-based solution could be an alternative but it might need
special casing on Solaris to use nawk instead of awk. The changes
in this commit are smaller and should have a smaller risk for
regressions. It's also possible that version.sh will be dropped
entirely at some point.
2024-06-03 23:06:10 +03:00
Lasse Collin a61c9ab475 CI: Don't require po4a on Solaris 2024-06-03 23:05:31 +03:00
Lasse Collin 5229bdf533 CI: Use set -e on Solaris too 2024-06-03 23:04:32 +03:00
Lasse Collin afa938e429 CMake: Install liblzma.pc even with MSVC
I had misunderstood that it wouldn't be useful with MSVC.
vcpkg had been installing liblzma.pc with custom rules since 2020,
years before liblzma.pc support was added to CMakeLists.txt.

See:
eb895b95aa/ports/liblzma/portfile.cmake
https://github.com/microsoft/vcpkg/pull/39024#issuecomment-2145064670
2024-06-03 17:44:50 +03:00
Sam James 35f8649f08 ci: don't pin official GH actions via commit, just tag
There's no real value in doing it via commit for official GH actions. We
can keep using pinned commits for unofficial actions. It's a hassle for
no gain.

Maybe going forward we can limit this further by only being paranoid
for the jobs with any access to tokens.
2024-06-03 12:32:34 +03:00
Christoph Junghans e885dae37f ci: set -e on openbsd
Closes: https://github.com/tukaani-project/xz/pull/116
2024-06-03 12:32:34 +03:00
Christoph Junghans 21b02dd128 ci: set -e on netbsd 2024-06-03 12:32:34 +03:00
Christoph Junghans 8641f0c24c ci: actually fail on FreeBSD
Without "set -e" the job will always be successful.

See vmactions/freebsd-vm#72
2024-06-03 12:32:34 +03:00
Andrew Murray ef616683ef Updated actions
Closes: https://github.com/tukaani-project/xz/pull/115
2024-06-03 12:32:34 +03:00
Sam James 57b440d316 ci: add po4a 2024-06-03 12:32:34 +03:00
Sam James 08cdf4be9a ci: add Solaris
Inspired by 3f2a38b011.

It runs on Solaris 5.11 via a VirtualBox VM.
2024-06-03 12:32:34 +03:00
Sam James b69768c8bd xz: list: suppress -Wformat-nonliteral for Solaris
Solaris' GCC can't understand that our use is fine, unlike modern compilers:
```
list.c: In function 'print_totals_basic':
list.c:1191:4: error: format not a string literal, argument types not checked [-Werror=format-nonliteral]
  uint64_to_str(totals.files, 0));
  ^~~~~~~~~~~~~
cc1: all warnings being treated as errors
```

It's presumably because of older gettext missing format attributes.

This is with `gcc (GCC) 7.3.0`.
2024-06-03 12:32:34 +03:00
Lasse Collin bb90e1f66d license-check.sh: Fix reporting of unclear license info
The main feature was broken because an old variable name hadn't
been updated to match the rest of the script.
2024-06-03 11:44:28 +03:00
Lasse Collin b8d134e61e Update THANKS 2024-05-31 21:36:26 +03:00
Lasse Collin 162587d3fb Translations: Run po4a/update-po
Now the files are in the new formatting without source file
line numbers. Future updates should keep the diffs much smaller.
2024-05-29 23:36:48 +03:00
Lasse Collin 50cd8ed002 Translations: Run "make -C po update-po"
In the past this wasn't done before releases; the Git repository
just contained the files from the Translation Project. But this
way it is clearer when comparing release tarballs against the
Git repository. In future releases this might no longer be necessary
within a stable branch as the .po files won't change so easily anymore
when creating a tarball.
2024-05-29 23:36:48 +03:00
Lasse Collin 16dbd865c8 Add NEWS for 5.6.2 2024-05-29 21:00:30 +03:00
Lasse Collin a0eeb5f936 Add NEWS for 5.4.7 2024-05-29 21:00:30 +03:00
Lasse Collin 9b476fb93a Add NEWS for 5.2.13 2024-05-29 21:00:30 +03:00
Lasse Collin 9284f1aea3 Build: Update po/*.po files only when needed
When po/xz.pot doesn't exist, running "make" or "make dist" will
create it. Then the .po files will be updated but only if they
actually would change more than the POT-Creation-Date line.
Then the .gmo files would be generated from the .po files.
This is the case before and after this commit.

However, "make dist" and thus "make mydist" did a forced update
to the files, updating them even if the only change was the
POT-Creation-Date line. This had pros and cons: It made it clear
that the .po file really is in sync with the recent strings in
the package. On the other hand, it added noise in the form of changed
files in the source tree and distribution tarballs. It can be
ignored with something like "diff -I'^"POT-Creation-Date: '" but
it's still a minor annoyance *if* there's not enough value in
having the most recent timestamp.

Setting DIST_DEPENDS_ON_UPDATE_PO = no means that such a forced
update won't happen in "make dist" anymore. However, the "mydist"
target will use xz.pot-update target which is the same target that
is run when xz.pot doesn't exist at all yet. Thus "mydist" will
ensure that the translations are up to date, without noise from
changes that would affect only the POT-Creation-Date line.

Note that po4a always uses msgmerge with --update, so POT-Creation-Date
in the man page translations is never the only change in .po files.
In that sense this commit makes the message translations behave more
similarly to the man page translations.

Distribution tarballs will still have non-reproducible POT-Creation-Date
in po/xz.pot and po4a/xz-man.pot but those are just two files. Even they
could be made reproducible from a Git timestamp if desired.
2024-05-29 16:33:24 +03:00
Lasse Collin 4beba1cd62 po4a/update-po: Disable wrapping in .pot and .po files
The .po files from the Translation Project come with unwrapped
strings so this matches it.

This may reduce the noise in diffs too. When the beginning of
a paragraph had changed, the rest of the lines got rewrapped
in msgid. Now it's just one very long line that changes when
a paragraph has been edited.

The --add-location=file option was removed as redundant. The line
numbers don't exist in the .pot file due to --porefs file and thus
they cannot get copied to the .po files either.
2024-05-28 21:10:33 +03:00
Lasse Collin b14c130a58 Update contact info in README 2024-05-28 18:36:53 +03:00
Lasse Collin 75f5f2e014 Translations: Use --package-name=xz-man with po4a
This is to match reality. See the added comment.
2024-05-28 13:25:07 +03:00
Lasse Collin eb217d016c Translations: Omit --package-name from po/Makevars
This is closer to the reality in the po/*.po files.
2024-05-28 13:03:40 +03:00
Lasse Collin d28a4b2520 license-check.sh: Use '--' with slightly untrusted filenames
Names from git ls-files should be safe but if one runs it on
a tree without the .git dir and there are extra files, it's
safer to have the end of arguments marked with '--'.
2024-05-28 12:18:09 +03:00