root/xz - xz - Root on GIT

mirror of https://git.tukaani.org/xz.git synced 2026-03-18 07:38:03 +00:00

Author	SHA1	Message	Date
Lasse Collin	4014e2479c	xz: O_SEARCH cannot be used for fsync() Opening a directory with O_SEARCH results in a file descriptor that can be used with functions like openat(). Such a file descriptor cannot be used with fsync(). Use O_RDONLY instead. In musl, O_SEARCH becomes Linux-specific O_PATH. A file descriptor from O_PATH doesn't allow fsync(). Seems that it's not possible to fsync() a directory that has write and search permissions but not read permission. Fixes: 2a9e91d796d091740489d951fa7780525e4275f1	2025-01-05 21:43:11 +02:00
Lasse Collin	c405264c03	tuklib_mbstr_nonprint: Preserve the value of errno A typical use case is like this: printf("%s: %s\n", tuklib_mask_nonprint(filename), strerror(errno)); tuklib_mask_nonprint() may call mbrtowc() and malloc() which may modify errno. If errno isn't preserved, the error message might be wrong if a compiler decides to call tuklib_mask_nonprint() before strerror(). Fixes: 40e573305535960574404d2eae848b248c95ea7e	2025-01-05 20:16:09 +02:00
Lasse Collin	2a9e91d796	xz: Use fsync() before deleting the input file, and add --no-sync xz's default behavior is to delete the input file after successful compression or decompression (unless writing to standard output). If the system crashes soon after the deletion, it is possible that the newly written file has not yet hit the disk while the previous delete operation might have. In that case neither the original file nor the written file is available. Call fsync() on the file. On POSIX systems, sync also the directory where the file was created. Add a new option --no-sync which disables fsync() usage. It can avoid a (possibly significant) performance penalty when processing many small files. It's fine to use --no-sync when one knows that the files are easy to recreate or restore after a system crash. Using fsync() after every flush initiated by --flush-timeout was considered. It wasn't implemented at least for now. - --flush-timeout is typically used when writing to stdout. If stdout is a file, xz cannot (portably) sync the directory of the file. One would need to create the output file first, sync the directory, and then run xz with fsync() enabled. - If xz --flush-timeout output goes to a file, it's possible to use a separate script to sync the file, for example, once per minute while telling xz to flush more frequently. - Not supporting syncing with --flush-timeout was simpler. Portability notes: - On systems that lack O_SEARCH (like Linux), "xz dir/file" will now fail if "dir" cannot be opened for reading. If "dir" still has write and search permissions (like d-wx------ in "ls -l"), previously xz would have been able to compress "dir/file" still. Now it only works if using --no-sync (or --keep or --stdout). - <libgen.h> and dirname() should be available on all POSIX systems, and aren't needed on non-POSIX systems. - fsync() is available on all POSIX systems. The directory syncing could be changed to fdatasync() although at least on ext4 it doesn't seem to make a performance difference in xz's usage. fdatasync() would need a build system check to support (old) special cases, for example, MINIX 3.3.0 doesn't have fdatasync() and Solaris 10 needs -lrt. - On native Windows, _commit() is used to replace fsync(). Directory syncing isn't done and shouldn't be needed. (In Cygwin, fsync() on directories is a no-op.) - DJGPP has fsync() for files. ;-) Using fsync() was considered somewhere around 2009 and again in 2016 but those times the idea was rejected. For comparison, GNU gzip 1.7 (2016) added the option --synchronous which enables fsync(). Co-authored-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Fixes: https://bugs.debian.org/814089 Link: https://www.mail-archive.com/xz-devel@tukaani.org/msg00282.html Closes: https://github.com/tukaani-project/xz/pull/151	2025-01-05 20:16:08 +02:00
Lasse Collin	2e28c71457	xz: Use "goto" for error handling in io_open_dest_real()	2025-01-05 20:16:01 +02:00
Lasse Collin	7510721767	liblzma: Always validate the first digit of a preset string lzma_str_to_filters() may call parse_lzma12_preset() in two ways. The call from str_to_filters() detects the string type from the first character(s) and as a side-effect it validates the first digit of the preset string. So this change makes no difference there. However, the call from parse_options() doesn't pre-validate the string. parse_lzma12_preset() will return an invalid value which is passed to lzma_lzma_preset() which safely rejects it. The bug still affects the error message: $ xz --filters=lzma2:preset=X xz: Error in --filters=FILTERS option: xz: lzma2:preset=X xz: ^ xz: Unsupported preset After the fix: $ xz --filters=lzma2:preset=X xz: Error in --filters=FILTERS option: xz: lzma2:preset=X xz: ^ xz: Unsupported preset The ^ now correctly points to the X and not past it because the X itself is the problematic character. Fixes: cedeeca2ea6ada5b0411b2ae10d7a859e837f203	2025-01-05 12:58:22 +02:00
Lasse Collin	52ff324337	xz: Fix getopt_long argument type in --filters* Forgetting the argument (or not using = to separate the option from the argument) resulted in lzma_str_to_filters() being called with NULL as input string argument. The function handles it fine but xz passes the NULL to printf() too: $ xz --filters xz: Error in --filters=FILTERS option: xz: (null) xz: ^ xz: Unexpected NULL pointer argument(s) to lzma_str_to_filters() Now it's correct: $ xz --filters xz: option '--filters' requires an argument The --filters-help option doesn't take any arguments. Fixes: 9ded880a0221f4d1256845fc4ab957ffd377c760 Fixes: d6af7f347077b22403133239592e478931307759 Fixes: a165d7df1964121eb9df715e6f836a31c865beef	2025-01-05 11:41:40 +02:00
Lasse Collin	2655c81b5e	xzdec: Don't leave Landlock file descriptor open for no reason This fix is similar to 48ff3f06521ca326996ab9a04d1b342098960427. Fixes: d74fb5f060b76db709b50f5fd37490394e52f975	2025-01-04 20:05:03 +02:00
Lasse Collin	35df4c2bc0	xz: Make --single-stream imply --keep Suggested by xx on #tukaani on 2024-04-12.	2025-01-04 20:02:18 +02:00
Lasse Collin	6f412814a8	Update AUTHORS The contributions have been rewritten.	2025-01-04 19:57:17 +02:00
Lasse Collin	5651d15303	xz: Avoid printf formats like %2$s It's a POSIX feature that isn't in standard C. It's not available on Windows. Even MinGW-w64 with __USE_MINGW_ANSI_STDIO doesn't support it even though it supports POSIX %'d for thousand separators. Gettext's <libintl.h> provides overrides for printf and other functions which do support the %2$s formats. Translations use them. But xz should work on Windows without <libintl.h> too. Fixes: 3e9177fd206d20d6d8acc7d203c25a9ae0549229	2025-01-04 17:37:46 +02:00
Lasse Collin	63b246c90e	tuklib_mbstr_wrap: Add printf format attribute It's supported by GCC 3.x already.	2025-01-04 17:37:46 +02:00
Lasse Collin	a7313c01d9	xz: Translate a Windows-specific string Originally I thought that native Windows builds wouldn't be translated but nowadays at least MSYS2 ships such binaries.	2025-01-04 17:37:39 +02:00
Lasse Collin	00eb6073c0	xz: Use my_landlock.h A slightly silly thing is that xz may now query the ABI version up to three times. We could call my_landlock_ruleset_attr_forbid_all() only once and cache the result but it didn't seem worth doing.	2025-01-02 15:43:38 +02:00
Lasse Collin	0fc5a625d7	xzdec: Use my_landlock.h	2025-01-02 15:43:38 +02:00
Lasse Collin	38cb8ec9fd	Add my_landlock.h with helper functions to use Linux Landlock This supports up to Landlock ABI version 6. The current code in xz and xzdec only support up to ABI version 4.	2025-01-02 15:43:38 +02:00
Lasse Collin	672da29bb3	liblzma: Silence warnings from "clang -Wimplicit-fallthrough"	2025-01-02 15:43:38 +02:00
Lasse Collin	94adc996e4	Replace "Fall through" comments with FALLTHROUGH	2025-01-02 15:43:37 +02:00
Lasse Collin	f31c3a6647	sysdefs.h: Add FALLTHROUGH macro	2025-01-02 15:43:37 +02:00
Lasse Collin	e34dbd6a0a	xzdec: Fix language in a comment	2025-01-02 15:43:37 +02:00
Lasse Collin	16821252c5	Windows: Make NLS require UCRT and gettext-runtime >= 0.23.1 Also remove the recently-added workaround from tuklib_gettext.h. Requiring a new enough gettext-runtime is cleaner. I guess it's mostly MSYS2 where xz is built with translation support, so once MSYS2 has Gettext >= 0.23.1, this requirement shouldn't be a problem in practice.	2025-01-02 15:35:25 +02:00
Lasse Collin	653732bd6f	xz man page: Describe the source file deletion in -z and -d options The DESCRIPTION section always explained it, and the OPTIONS section only described the differences to the default behavior. However, new users in a hurry may skip reading DESCRIPTION. The default behavior is a bit dangerous, thus it's good to repeat in --compress and --decompress docs that source file is removed after successful operation. Fixes: https://github.com/tukaani-project/xz/issues/150	2024-12-30 10:51:26 +02:00
Lasse Collin	bb79f79b27	Build: Set libtool -version-info so that it matches with CMake In the past, they haven't been in sync in development versions although they (of course) have been in stable releases.	2024-12-29 10:54:45 +02:00
Lasse Collin	260d5d3620	xz: Fix comments	2024-12-27 09:14:56 +02:00
Lasse Collin	f8c328eed1	Windows: Workaround a UTF-8 issue in Gettext's libintl_setlocale() See the comment. In this package, locale is set at program startup and not changed later, so the point (2) in the comment isn't a problem. Fixes: 46ee0061629fb075d61d83839e14dd193337af59	2024-12-20 16:33:34 +02:00
Lasse Collin	0353390609	Revert "Windows: Use UTF-8 locale when active code page is UTF-8" This reverts commit 0d0b574cc45045d6150d397776340c068df59e2a.	2024-12-20 16:33:34 +02:00
Lasse Collin	4b319e05af	xzdec: Use setlocale() instead of tuklib_gettext_setlocale() xzdec isn't translated and doesn't need libintl on Windows even when NLS is enabled, thus libintl_setlocale() cannot interfere with the locale settings. Thus, standard setlocale() works perfectly. In the commit 78868b6e, the explanation in the commit message is wrong. Fixes: 78868b6ed63fa4c89f73e3dfed27abfb8b0d46db	2024-12-20 16:33:34 +02:00
Lasse Collin	34b80e282e	Windows: Revert the setlocale(LC_ALL, ".UTF8") documentation Only leave the FindFirstFileA() notes from 20dfca81, reverting the incorrect setlocale() notes. On Windows, Gettext's <libintl.h> overrides setlocale() with libintl_setlocale() wrapper. I hadn't noticed this, and thus my conclusions were wrong. Fixes: 20dfca8171dad4c64785ac61d5b68972c444877b	2024-12-20 16:33:28 +02:00
Lasse Collin	5794cda064	tuklib_mbstr_wrap: Silence a warning from Clang Fixes: ca529c3f41a4a19a59e2e252e6dd9255f130c634	2024-12-18 17:50:58 +02:00
Lasse Collin	22a35e64ce	lzmainfo: Use tuklib_mbstr_nonprint	2024-12-18 17:09:32 +02:00
Lasse Collin	03111595ee	xzdec: Use tuklib_mbstr_nonprint	2024-12-18 17:09:32 +02:00
Lasse Collin	d22f96921f	xz: Use tuklib_mbstr_nonprint Call tuklib_mask_nonprint() on filenames and also on a few other strings from the command line too. The filename printed by "xz --robot --list" (in list.c) is also masked. It's good to get rid of tabs and newlines which would desync the output but masking other chars wouldn't be strictly necessary. It might matter with sensible filenames if LC_CTYPE is "C" (when iswprint() might reject non-ASCII chars) and a script wants to read a filename from xz's output. Hopefully it's an unusual enough corner case to not be a real problem.	2024-12-18 17:09:32 +02:00
Lasse Collin	40e5733055	Add tuklib_mbstr_nonprint to mask non-printable characters Malicious filenames or other untrusted strings may affect the state of the terminal when such strings are printed as part of (error) messages. Add functions that mask such characters. It's not enough to handle only single-byte control characters. In multibyte locales, some control characters are multibyte too, for example, terminals interpret C1 control characters (U+0080 to U+009F) that are two bytes as UTF-8. Instead of checking for control characters with iswcntrl(), this uses iswprint() to detect printable characters. This is much stricter. On Windows it's actually too strict as it rejects some characters that definitely are printable. Gnulib's quotearg would do a lot more but I hope this simpler method is good enough here. Thanks to Ryan Colyer for the discussion about the problems of the earlier single-byte-only method. Thanks to Christian Weisgerber for reporting a bug in an earlier version of this code. Thanks to Jeroen Roovers for a typo fix. Closes: https://github.com/tukaani-project/xz/pull/118	2024-12-18 17:09:32 +02:00
Lasse Collin	4a0c4f92b8	xz: Make one string simpler for translators Leading spaces in the string can get miscounted by translators.	2024-12-18 17:09:31 +02:00
Lasse Collin	3fcf547e92	lzmainfo: Sync the translatable strings with xz	2024-12-18 17:09:31 +02:00
Lasse Collin	3e9177fd20	xz: Use automatic word wrapping for help texts --long-help is now one line longer because --lzma1 is now on its own line.	2024-12-18 17:09:31 +02:00
Lasse Collin	ca529c3f41	Add tuklib_mbstr_wrap for automatic word wrapping Automatic word wrapping makes translators' work easier and reduces errors like misaligned columns or overlong lines. Right-to-left languages and languages that don't use spaces between words will still need extra effort. (xz hasn't been translated to any RTL language so far.)	2024-12-18 17:09:31 +02:00
Lasse Collin	314b83ceba	Build: Sort filenames to ASCII order in Makefile.am	2024-12-18 17:09:31 +02:00
Lasse Collin	df399c5255	tuklib_mbstr_width: Add tuklib_mbstr_width_mem() It's a new function split from tuklib_mbstr_width(). It's useful with partial strings that aren't terminated with \0.	2024-12-18 17:09:30 +02:00
Lasse Collin	51081efae4	tuklib_mbstr_width: Update a comment about shift states	2024-12-18 17:09:30 +02:00
Lasse Collin	7ff1b0ac53	tuklib_mbstr_width: Don't mention shift states in the API docs It is assumed that this code won't be used with charsets that use locking shift states.	2024-12-18 17:09:30 +02:00
Lasse Collin	3c16105936	tuklib_mbstr_width: Use stricter return value checking This should make no difference in practice (at least if mbrtowc() isn't broken).	2024-12-18 17:09:30 +02:00
Lasse Collin	b797c44c42	tuklib_mbstr_width: Change the behavior when wcwidth() is not available If wcwidth() isn't available (Windows), previously it was assumed that one byte == one column in the terminal. Now it is assumed that one multibyte character == one column. This works better with UTF-8. Languages that only use single-width characters without any combining characters should work correctly with this. In xz, none of po/*.po contain combining characters and only ko.po, zh_CN.po, and zh_TW.po contain fullwidth characters. Thus, "only" those three translations in xz are broken on Windows with the UTF-8 code page. Broken means that column headings in xz -lvv and (only in the master branch) strings in --long-help are misaligned, so it's not a huge problem. I don't know if those three languages displayed perfectly before the UTF-8 change because I hadn't tested translations with native Windows builds before. Fixes: 46ee0061629fb075d61d83839e14dd193337af59	2024-12-18 17:09:30 +02:00
Lasse Collin	78868b6ed6	xzdec: Use setlocale() via tuklib_gettext_setlocale() xzdec isn't translated and didn't have locale-specific behavior in the past. On Windows with UTF-8 in the application manifest, setting the locale makes a difference though: - Without any setlocale() call, non-ASCII filenames don't display properly in Command Prompt unless one first uses "chcp 65001" to set the console code page to UTF-8. - setlocale(LC_ALL, "") is enough to make non-ASCII filenames print correctly in Command Prompt without using "chcp 65001", assuming that the non-UTF-8 code page (like 850) supports those non-ASCII characters. - setlocale(LC_ALL, ".UTF8") is even better because then mbrtowc() and such functions use an UTF-8 locale instead of a legacy code page. The tuklib_gettext_setlocale() macro takes care of this (without enabling any translations). Fixes: 46ee0061629fb075d61d83839e14dd193337af59	2024-12-18 17:09:30 +02:00
Lasse Collin	0d0b574cc4	Windows: Use UTF-8 locale when active code page is UTF-8 XZ Utils 5.6.3 set the active code page to UTF-8 to fix CVE-2024-47611. This wasn't paired with UCRT-specific setlocale(LC_ALL, ".UTF8"), thus non-ASCII characters from translations became mojibake. Fixes: 46ee0061629fb075d61d83839e14dd193337af59	2024-12-18 17:09:30 +02:00
Lasse Collin	20dfca8171	Windows: Document the need for setlocale(LC_ALL, ".UTF8") Also warn about unpaired surrogates and (somewhat UTF-8-specific) MAX_PATH issue in FindFirstFileA(). Fixes: 46ee0061629fb075d61d83839e14dd193337af59	2024-12-18 17:09:29 +02:00
Lasse Collin	4e936f2340	xzdec: Call tuklib_progname_init() early enough If the early pledge() call on OpenBSD fails, it calls my_errorf() which requires the "progname" variable. Fixes: d74fb5f060b76db709b50f5fd37490394e52f975	2024-12-18 17:09:29 +02:00
Dexter Castor Döpping	bee0c044d3	liblzma: Fix incorrect macro name in a comment Fixes: 33b8a24b6646a9dbfd8358405aec466b13078559 Closes: https://github.com/tukaani-project/xz/pull/155	2024-12-18 17:09:29 +02:00
Lasse Collin	c15115f7ed	liblzma: Optimize the loop conditions in BCJ filters Compilers cannot optimize the addition "i + 4" away since theoretically it could overflow.	2024-11-26 19:17:42 +02:00
Mark Wielaard	48ff3f0652	xz: Landlock: Fix a file descriptor leak	2024-11-25 12:28:44 +02:00
Lasse Collin	46ee006162	Windows: Embed an application manifest in the EXE files IMPORTANT: This includes a security fix to command line tool argument handling. Some toolchains embed an application manifest by default to declare UAC-compliance. Some also declare compatibility with Vista/8/8.1/10/11 to let the app access features newer than those of Vista. We want all the above but also two more things: - Declare that the app is long path aware to support paths longer than 259 characters (this may also require a registry change). - Force the code page to UTF-8. This allows the command line tools to access files whose names contain characters that don't exist in the current legacy code page (except unpaired surrogates). The UTF-8 code page also fixes security issues in command line argument handling which can be exploited with malicious filenames. See the new file w32_application.manifest.comments.txt. Thanks to Orange Tsai and splitline from DEVCORE Research Team for discovering this issue. Thanks to Vijay Sarvepalli for reporting the issue to me. Thanks to Kelvin Lee for testing with MSVC and helping with the required build system fixes.	2024-10-01 12:10:23 +03:00
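
The ordering described in 2a9e91d796 (sync the new file, sync its directory, and only then delete the input) can be sketched in POSIX C as follows. This is an illustrative sketch, not the code from xz: the helper names sync_parent_dir() and sync_then_unlink() are hypothetical. The directory is opened with O_RDONLY because, per 4014e2479c, a descriptor obtained with O_SEARCH cannot be used with fsync().

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <libgen.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: fsync() the directory that contains `path`.
 * Returns 0 on success, -1 on error with errno set. */
static int
sync_parent_dir(const char *path)
{
	/* dirname() may modify its argument, so work on a copy. */
	char *copy = strdup(path);
	if (copy == NULL)
		return -1;

	/* O_RDONLY, not O_SEARCH: the descriptor must be usable with fsync(). */
	int dir_fd = open(dirname(copy), O_RDONLY);
	int saved_errno = errno;
	free(copy);

	if (dir_fd == -1) {
		errno = saved_errno;
		return -1;
	}

	int ret = fsync(dir_fd);
	saved_errno = errno;
	close(dir_fd);
	errno = saved_errno;
	return ret;
}

/* Hypothetical helper: make the output durable, then remove the input.
 * `dest_fd` is an open descriptor for the file named `dest`. */
static int
sync_then_unlink(int dest_fd, const char *dest, const char *src)
{
	assert(dest != NULL && src != NULL);

	/* 1. Flush the file data and metadata of the new file. */
	if (fsync(dest_fd) != 0)
		return -1;

	/* 2. Flush the directory entry that names the new file. */
	if (sync_parent_dir(dest) != 0)
		return -1;

	/* 3. Only now is it safe to delete the original. */
	return unlink(src);
}
```

If either sync step fails, the input file is left in place, so a crash at any point leaves at least one of the two files on disk. Skipping steps 1 and 2 corresponds to what the commit message describes for --no-sync.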

1413 Commits
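
The errno issue described in c405264c03 can be illustrated with a small C sketch. The function mask_nonprint_sketch() below is a hypothetical, single-byte stand-in for tuklib_mask_nonprint(), not the real implementation (which, per 40e5733055, handles multibyte characters via mbrtowc() and iswprint()). The point it shows is only the save-and-restore of errno around calls that may modify it.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for tuklib_mask_nonprint(): replaces every
 * non-printable byte with '?'. The result lives in a static buffer
 * that is reused by the next call. */
static const char *
mask_nonprint_sketch(const char *str)
{
	static char *buf = NULL;

	assert(str != NULL);

	/* Remember errno before calling anything that may change it. */
	const int saved_errno = errno;

	free(buf);
	buf = malloc(strlen(str) + 1); /* malloc() may modify errno */
	if (buf == NULL) {
		errno = saved_errno;
		return "???";
	}

	size_t i;
	for (i = 0; str[i] != '\0'; ++i)
		buf[i] = isprint((unsigned char)str[i]) ? str[i] : '?';
	buf[i] = '\0';

	/* Restore errno so that a caller's strerror(errno) still reports
	 * the original error, whatever the argument evaluation order. */
	errno = saved_errno;
	return buf;
}
```

With this pattern, a call such as printf("%s: %s\n", mask_nonprint_sketch(filename), strerror(errno)) reports the intended error even if the compiler evaluates the masking call before strerror().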