root/xz

mirror of https://git.tukaani.org/xz.git synced 2025-12-28 08:18:43 +00:00

Author	SHA1	Message	Date
Lasse Collin	22a35e64ce	lzmainfo: Use tuklib_mbstr_nonprint	2024-12-18 17:09:32 +02:00
Lasse Collin	03111595ee	xzdec: Use tuklib_mbstr_nonprint	2024-12-18 17:09:32 +02:00
Lasse Collin	d22f96921f	xz: Use tuklib_mbstr_nonprint Call tuklib_mask_nonprint() on filenames and also on a few other strings from the command line too. The filename printed by "xz --robot --list" (in list.c) is also masked. It's good to get rid of tabs and newlines which would desync the output but masking other chars wouldn't be strictly necessary. It might matter with sensible filenames if LC_CTYPE is "C" (when iswprint() might reject non-ASCII chars) and a script wants to read a filename from xz's output. Hopefully it's an unusual enough corner case to not be a real problem.	2024-12-18 17:09:32 +02:00
Lasse Collin	40e5733055	Add tuklib_mbstr_nonprint to mask non-printable characters Malicious filenames or other untrusted strings may affect the state of the terminal when such strings are printed as part of (error) messages. Add functions that mask such characters. It's not enough to handle only single-byte control characters. In multibyte locales, some control characters are multibyte too, for example, terminals interpret C1 control characters (U+0080 to U+009F) that are two bytes as UTF-8. Instead of checking for control characters with iswcntrl(), this uses iswprint() to detect printable characters. This is much stricter. On Windows it's actually too strict as it rejects some characters that definitely are printable. Gnulib's quotearg would do a lot more but I hope this simpler method is good enough here. Thanks to Ryan Colyer for the discussion about the problems of the earlier single-byte-only method. Thanks to Christian Weisgerber for reporting a bug in an earlier version of this code. Thanks to Jeroen Roovers for a typo fix. Closes: https://github.com/tukaani-project/xz/pull/118	2024-12-18 17:09:32 +02:00
Lasse Collin	4a0c4f92b8	xz: Make one string simpler for translators Leading spaces in the string can get miscounted by translators.	2024-12-18 17:09:31 +02:00
Lasse Collin	3fcf547e92	lzmainfo: Sync the translatable strings with xz	2024-12-18 17:09:31 +02:00
Lasse Collin	3e9177fd20	xz: Use automatic word wrapping for help texts --long-help is now one line longer because --lzma1 is now on its own line.	2024-12-18 17:09:31 +02:00
Lasse Collin	ca529c3f41	Add tuklib_mbstr_wrap for automatic word wrapping Automatic word wrapping makes translators' work easier and reduces errors like misaligned columns or overlong lines. Right-to-left languages and languages that don't use spaces between words will still need extra effort. (xz hasn't been translated to any RTL language so far.)	2024-12-18 17:09:31 +02:00
Lasse Collin	314b83ceba	Build: Sort filenames to ASCII order in Makefile.am	2024-12-18 17:09:31 +02:00
Lasse Collin	df399c5255	tuklib_mbstr_width: Add tuklib_mbstr_width_mem() It's a new function split from tuklib_mbstr_width(). It's useful with partial strings that aren't terminated with \0.	2024-12-18 17:09:30 +02:00
Lasse Collin	51081efae4	tuklib_mbstr_width: Update a comment about shift states	2024-12-18 17:09:30 +02:00
Lasse Collin	7ff1b0ac53	tuklib_mbstr_width: Don't mention shift states in the API docs It is assumed that this code won't be used with charsets that use locking shift states.	2024-12-18 17:09:30 +02:00
Lasse Collin	3c16105936	tuklib_mbstr_width: Use stricter return value checking This should make no difference in practice (at least if mbrtowc() isn't broken).	2024-12-18 17:09:30 +02:00
Lasse Collin	b797c44c42	tuklib_mbstr_width: Change the behavior when wcwidth() is not available If wcwidth() isn't available (Windows), previously it was assumed that one byte == one column in the terminal. Now it is assumed that one multibyte character == one column. This works better with UTF-8. Languages that only use single-width characters without any combining characters should work correctly with this. In xz, none of po/*.po contain combining characters and only ko.po, zh_CN.po, and zh_TW.po contain fullwidth characters. Thus, "only" those three translations in xz are broken on Windows with the UTF-8 code page. Broken means that column headings in xz -lvv and (only in the master branch) strings in --long-help are misaligned, so it's not a huge problem. I don't know if those three languages displayed perfectly before the UTF-8 change because I hadn't tested translations with native Windows builds before. Fixes: 46ee0061629fb075d61d83839e14dd193337af59	2024-12-18 17:09:30 +02:00
Lasse Collin	78868b6ed6	xzdec: Use setlocale() via tuklib_gettext_setlocale() xzdec isn't translated and didn't have locale-specific behavior in the past. On Windows with UTF-8 in the application manifest, setting the locale makes a difference though: - Without any setlocale() call, non-ASCII filenames don't display properly in Command Prompt unless one first uses "chcp 65001" to set the console code page to UTF-8. - setlocale(LC_ALL, "") is enough to make non-ASCII filenames print correctly in Command Prompt without using "chcp 65001", assuming that the non-UTF-8 code page (like 850) supports those non-ASCII characters. - setlocale(LC_ALL, ".UTF8") is even better because then mbrtowc() and such functions use an UTF-8 locale instead of a legacy code page. The tuklib_gettext_setlocale() macro takes care of this (without enabling any translations). Fixes: 46ee0061629fb075d61d83839e14dd193337af59	2024-12-18 17:09:30 +02:00
Lasse Collin	0d0b574cc4	Windows: Use UTF-8 locale when active code page is UTF-8 XZ Utils 5.6.3 set the active code page to UTF-8 to fix CVE-2024-47611. This wasn't paired with UCRT-specific setlocale(LC_ALL, ".UTF8"), thus non-ASCII characters from translations became mojibake. Fixes: 46ee0061629fb075d61d83839e14dd193337af59	2024-12-18 17:09:30 +02:00
Lasse Collin	20dfca8171	Windows: Document the need for setlocale(LC_ALL, ".UTF8") Also warn about unpaired surrogates and (somewhat UTF-8-specific) MAX_PATH issue in FindFirstFileA(). Fixes: 46ee0061629fb075d61d83839e14dd193337af59	2024-12-18 17:09:29 +02:00
Lasse Collin	4e936f2340	xzdec: Call tuklib_progname_init() early enough If the early pledge() call on OpenBSD fails, it calls my_errorf() which requires the "progname" variable. Fixes: d74fb5f060b76db709b50f5fd37490394e52f975	2024-12-18 17:09:29 +02:00
Dexter Castor Döpping	bee0c044d3	liblzma: Fix incorrect macro name in a comment Fixes: 33b8a24b6646a9dbfd8358405aec466b13078559 Closes: https://github.com/tukaani-project/xz/pull/155	2024-12-18 17:09:29 +02:00
Lasse Collin	c15115f7ed	liblzma: Optimize the loop conditions in BCJ filters Compilers cannot optimize the addition "i + 4" away since theoretically it could overflow.	2024-11-26 19:17:42 +02:00
Mark Wielaard	48ff3f0652	xz: Landlock: Fix a file descriptor leak	2024-11-25 12:28:44 +02:00
Lasse Collin	46ee006162	Windows: Embed an application manifest in the EXE files IMPORTANT: This includes a security fix to command line tool argument handling. Some toolchains embed an application manifest by default to declare UAC-compliance. Some also declare compatibility with Vista/8/8.1/10/11 to let the app access features newer than those of Vista. We want all the above but also two more things: - Declare that the app is long path aware to support paths longer than 259 characters (this may also require a registry change). - Force the code page to UTF-8. This allows the command line tools to access files whose names contain characters that don't exist in the current legacy code page (except unpaired surrogates). The UTF-8 code page also fixes security issues in command line argument handling which can be exploited with malicious filenames. See the new file w32_application.manifest.comments.txt. Thanks to Orange Tsai and splitline from DEVCORE Research Team for discovering this issue. Thanks to Vijay Sarvepalli for reporting the issue to me. Thanks to Kelvin Lee for testing with MSVC and helping with the required build system fixes.	2024-10-01 12:10:23 +03:00
Lasse Collin	dad1530915	Windows: Set DLL name accurately in StringFileInfo on Cygwin and MSYS2 Now the information in the "Details" tab in the file properties dialog matches the naming convention of Cygwin and MSYS2. This is only a cosmetic change.	2024-09-30 16:55:23 +03:00
Lasse Collin	8940ecb96f	common_w32res.rc: White space edits LANGUAGE and VS_VERSION_INFO begin new statements so put an empty line between them.	2024-09-29 01:27:16 +03:00
Tobias Stoeckmann	76cfd0a9bb	lzmainfo: Avoid integer overflow The MB output can overflow with huge numbers. Most likely these are invalid .lzma files anyway, but let's avoid garbage output. lzmadec was adapted from LZMA Utils. The original code with this bug was written in 2005, over 19 years ago. Co-authored-by: Lasse Collin <lasse.collin@tukaani.org> Closes: https://github.com/tukaani-project/xz/pull/144	2024-09-17 01:26:02 +03:00
Tobias Stoeckmann	78355aebb7	xzdec: Remove unused short option -M "xzdec -M123" exited with exit status 1 without printing any messages. The "M:" entry should have been removed when the memory usage limiter support was removed from xzdec. Fixes: 792331bdee706aa852a78b171040ebf814c6f3ae Closes: https://github.com/tukaani-project/xz/pull/143 [ Lasse: Commit message edits ]	2024-09-16 23:33:29 +03:00
Yifeng Li	6cd7c86078	liblzma: Fix x86-64 movzw compatibility in range_decoder.h Support for instruction "movzw" without suffix in "GNU as" was added in commit [1] and stabilized in binutils 2.27, released in August 2016. Earlier systems don't accept this instruction without a suffix, making range_decoder.h's inline assembly unable to build on old systems such as Ubuntu 16.04, creating error messages like: lzma_decoder.c: Assembler messages: lzma_decoder.c:371: Error: no such instruction: `movzw 2(%r11),%esi' lzma_decoder.c:373: Error: no such instruction: `movzw 4(%r11),%edi' lzma_decoder.c:388: Error: no such instruction: `movzw 6(%r11),%edx' lzma_decoder.c:398: Error: no such instruction: `movzw (%r11,%r14,4),%esi' Change "movzw" to "movzwl" for compatibility. [1] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=c07315e0c610e0e3317b4c02266f81793df253d2 Suggested-by: Lasse Collin <lasse.collin@tukaani.org> Tested-by: Yifeng Li <tomli@tomli.me> Signed-off-by: Yifeng Li <tomli@tomli.me> Fixes: 3182a330c1512cc1f5c87b5c5a272578e60a5158 Fixes: https://github.com/tukaani-project/xz/issues/121 Closes: https://github.com/tukaani-project/xz/pull/136	2024-08-22 10:59:08 +03:00
Lasse Collin	f7103c2c2a	Revert "liblzma: Add ARM64 CRC32 instruction support detection on OpenBSD" This reverts commit dc03f6290f5b9bd3d50c7e12e58dee870889d599. OpenBSD 7.6 will support elf_aux_info(3), and the detection code used on FreeBSD will work on OpenBSD 7.6 too. Keep things simpler and drop the OpenBSD-specific sysctl() method. Thanks to Christian Weisgerber.	2024-07-19 20:06:24 +03:00
Lasse Collin	7c292dd0bf	liblzma: Tweak a comment	2024-07-13 22:10:37 +03:00
Lasse Collin	baecfa1426	xz: Remove the TODO comment about --recursive It won't be implemented. find + xargs is more flexible, for example, it allows compressing small files in parallel. An example for that has been included in the xz man page since 2010.	2024-07-06 14:04:48 +03:00
Xi Ruoyao	7baf6835cf	liblzma: Speed up CRC32 calculation on 64-bit LoongArch The crc.w.{b/h/w/d}.w instructions in LoongArch can calculate the CRC32 result for 1/2/4/8 bytes in a single operation. Using these is much faster compared to the generic method. Optimized CRC32 is enabled unconditionally on 64-bit LoongArch because the LoongArch specification says that CRC32 instructions shall be implemented for 64-bit processors. Optimized CRC32 isn't enabled for 32-bit LoongArch processors because not enough information is available about them. Co-authored-by: Lasse Collin <lasse.collin@tukaani.org> Closes: https://github.com/tukaani-project/xz/pull/86	2024-07-01 17:09:57 +03:00
Lasse Collin	0ed8936685	liblzma: ARM64 CRC32: Align the buffer faster Instead of doing it byte by byte, use the 1/2/4-byte CRC32 instructions.	2024-06-28 14:20:49 +03:00
Lasse Collin	fe77c4e130	liblzma: Tidy up crc_common.h Prefix ARM64_RUNTIME_DETECTION with CRC_ and reorder it to be with the other ARM64-specific lines. That macro isn't used outside this file. ARM64 CLMUL implementation doesn't exist yet and thus CRC64_ARM64_CLMUL isn't used anywhere yet. It's not ideal that the single-letter CRC utility macros are here as they pollute the namespace of the LZ encoder files. Those could be moved to their own crc_macros.h like they were in 5.2.x but in practice this is fine enough already.	2024-06-23 23:09:14 +03:00
Lasse Collin	7484d37538	liblzma: Move lzma_crcXX_table[][] declarations to crc_common.h LZ encoder needs lzma_crc32_table[0] but otherwise those tables are private to the CRC code. In contrast, the other things in check.h are needed in several places.	2024-06-23 15:37:46 +03:00
Lasse Collin	85b081f5d4	liblzma: Make 32-bit x86 CRC assembly co-exist with CLMUL Now runtime detection of CLMUL support can pick between the CLMUL and the generic assembly implementations. Whatever overhead this has for builds that omit CLMUL completely isn't important because builds for any non-ancient system are likely to include the CLMUL code too. Handle the CRC tables in crcXX_fast.c files because now these files are built even when assembly code is used. If 32-bit x86 assembly is enabled then it will always be built even if compiler flags were such that CLMUL would be allowed unconditionally. That is, runtime detection will be used anyway. This keeps the build rules simpler. In LZ encoder, build and use lzma_lz_hash_table[256] if CLMUL CRC is used without runtime detection. Previously this wasn't needed because crc32_table.c included the lzma_crc32_table[][] in the build unless encoder support had been disabled. Including an 8 KiB table was silly when only 1 KiB is actually used. So now liblzma is 7 KiB smaller if CLMUL is enabled without runtime detection.	2024-06-23 14:36:44 +03:00
Lasse Collin	6667d503b5	liblzma: CRC: Rename crcXX_generic to lzma_crcXX_generic This prepares for the possibility that lzma_crc32_generic and lzma_crc64_generic are extern functions.	2024-06-23 14:36:44 +03:00
Lasse Collin	6a3c4aaa43	Windows: Drop Visual Studio 2013 support This simplifies things a little. Building liblzma with VS2013 probably still worked but building the command line tools was not supported. Microsoft ended support for VS2013 on 2024-04.	2024-06-20 21:53:07 +03:00
Lasse Collin	30a2d5d510	liblzma: CRC CLMUL: Omit is_arch_extension_supported() when not needed On E2K the function compiles only due to compiler emulation but the function is never used. It's cleaner to omit the function when it's not needed even though it's a "static inline" function. Thanks to Ilya Kurdyukov.	2024-06-17 15:00:55 +03:00
Lasse Collin	54eaea5ea4	liblzma: x86 CLMUL CRC: Rewrite It's faster with both tiny and large buffers and doesn't require disabling any sanitizers. With large buffers the extra speed is from folding four 16-byte chunks in parallel. The 32-bit x86 with MSVC reportedly still needs a workaround. Now the simpler "__asm mov ebx, ebx" trick is enough but it needs to be in lzma_crc64() instead of crc64_arch_optimized(). Thanks to Iouri Kharon for testing and the fix. Thanks to Ilya Kurdyukov for testing the speed with aligned and unaligned buffers on a few x86 processors and on E2K v6. Thanks to Sam James for general feedback. Fixes: https://github.com/tukaani-project/xz/issues/112 Fixes: https://github.com/tukaani-project/xz/issues/122	2024-06-17 15:00:49 +03:00
Lasse Collin	c0e7eaae8d	sysdefs.h: Add alignas	2024-06-16 12:59:20 +03:00
Lasse Collin	20014c2614	liblzma: Use a single macro to select CLMUL CRC to build This way it's clearer that two things cannot be selected at the same time.	2024-06-16 12:59:17 +03:00
Lasse Collin	d8fb098617	liblzma: CRC32 CLMUL: Refactor the constants and simplify By using modulus scaled constants, the final reduction can be simplified.	2024-06-16 12:56:54 +03:00
Lasse Collin	ef652ac391	liblzma: CRC64 CLMUL: Refactor the constants Now it refers to crc_clmul_consts_gen.c. vfold8 was renamed to mu_p and the p no longer has the lowest bit set (it makes no difference as the output bits it affects are ignored).	2024-06-16 12:56:54 +03:00
Lasse Collin	9f5fc17e32	liblzma: Add crc_clmul_consts_gen.c It's a standalone program that prints the required constants. It won't be a part of the normal build of the package.	2024-06-16 12:56:54 +03:00
Lasse Collin	71b147aab7	liblzma: Remove CRC_USE_GENERIC_FOR_SMALL_INPUTS It was already commented out.	2024-06-16 12:56:54 +03:00
Lasse Collin	f99a7be406	liblzma: Remove crc_attr_no_sanitize_address It's not enough to silence the address sanitizer. Also memory and thread sanitizers would need to be silenced. They, at least currently, aren't smart enough to see that the extra bytes are discarded from the xmm registers by later instructions. Valgrind is smarter, possibly because this kind of code isn't weird to write in assembly. Agner Fog's optimizing_assembly.pdf even mentions this idea of doing an aligned read and then discarding the extra bytes. The sanitizers don't instrument assembly code but Valgrind checks all code. It's better to change the implementation to avoid the sanitization attributes which also look scary in the code. (Somehow they can look more scary than __asm__ which is implicitly unsanitized.) See also: https://github.com/tukaani-project/xz/issues/112 https://github.com/tukaani-project/xz/issues/122	2024-06-16 12:56:54 +03:00
Lasse Collin	c7164b1927	xz: Fix white space	2024-06-11 22:42:26 +03:00
Lasse Collin	0a32d2072c	liblzma: Fix a typo in a comment Thanks to Sam James for spotting it. Fixes: f644473a211394447824ea00518d0a214ff3f7f2	2024-06-11 22:42:04 +03:00
Lasse Collin	afd9b4d282	liblzma: Fix a comment indentation	2024-06-10 23:19:27 +03:00
Lasse Collin	50e6bff274	liblzma: Fix white space	2024-06-10 23:19:27 +03:00
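
Commit 40e5733055 above describes masking non-printable characters by decoding multibyte input and testing each character with iswprint() instead of iswcntrl(). The following is only a minimal sketch of that idea, not the tuklib_mbstr_nonprint code: the function name mask_nonprint, the '?' replacement character, and the caller-supplied output buffer are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper (not the tuklib API): copy `in` to `out`, replacing
 * every character that iswprint() rejects, and every byte that cannot be
 * decoded in the current locale, with '?'. The output is truncated at a
 * character boundary if it doesn't fit and is always \0-terminated.
 */
static void
mask_nonprint(const char *in, char *out, size_t out_size)
{
	assert(out_size > 0);

	mbstate_t state;
	memset(&state, 0, sizeof(state));

	size_t in_left = strlen(in);
	size_t o = 0;

	while (in_left > 0) {
		wchar_t wc;
		size_t n = mbrtowc(&wc, in, in_left, &state);

		if (n == (size_t)-1 || n == (size_t)-2 || n == 0) {
			/* Invalid or incomplete sequence: mask one byte
			 * and restart decoding from the initial state. */
			n = 1;
			memset(&state, 0, sizeof(state));
			if (o + 1 >= out_size)
				break;
			out[o++] = '?';
		} else if (!iswprint((wint_t)wc)) {
			/* Decoded fine but isn't printable: mask the whole
			 * (possibly multibyte) character with one '?'. */
			if (o + 1 >= out_size)
				break;
			out[o++] = '?';
		} else {
			/* Printable: copy the original bytes unchanged. */
			if (o + n >= out_size)
				break;
			memcpy(out + o, in, n);
			o += n;
		}

		in += n;
		in_left -= n;
	}

	out[o] = '\0';
}
```

A caller would typically run setlocale(LC_ALL, "") first so that mbrtowc() and iswprint() follow the user's locale. Without it the "C" locale applies, where iswprint() might reject non-ASCII characters, as the commit message for d22f96921f notes.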

1385 Commits
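
Commit c15115f7ed in the log notes that compilers cannot remove the addition "i + 4" from a loop condition because it could theoretically overflow. The sketch below illustrates one way such a condition can be restructured; it is an assumption about the general technique, not a copy of the liblzma BCJ filter code. The function names and the byte value being counted are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Straightforward form: "i + 4" is evaluated on every iteration.
 * size_t arithmetic wraps around, so the compiler has to keep the
 * addition to preserve the exact semantics.
 */
static size_t
count_marker_simple(const uint8_t *buf, size_t size)
{
	size_t count = 0;
	for (size_t i = 0; i + 4 <= size; i += 4)
		if (buf[i + 3] == 0xEB)
			++count;

	return count;
}

/*
 * Restructured form: round the size down to a multiple of four once,
 * then compare the index against it directly. For any size that fits
 * a real buffer this visits the same 4-byte chunks as the loop above.
 */
static size_t
count_marker_hoisted(const uint8_t *buf, size_t size)
{
	size &= ~(size_t)3;

	size_t count = 0;
	for (size_t i = 0; i < size; i += 4)
		if (buf[i + 3] == 0xEB)
			++count;

	return count;
}
```

Both functions ignore the trailing one to three bytes that don't form a complete 4-byte chunk; the second one just makes that explicit before the loop starts.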