Opening a directory with O_SEARCH results in a file descriptor that can
be used with functions like openat(). Such a file descriptor cannot be
used with fsync(). Use O_RDONLY instead.
In musl, O_SEARCH maps to the Linux-specific O_PATH. A file descriptor
from O_PATH doesn't allow fsync().
It seems that it's not possible to fsync() a directory that has write
and search permissions but not read permission.
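Below is a minimal sketch of syncing a directory this way, assuming a POSIX system. The helper name sync_directory() is hypothetical and this is not xz's actual code. Adding O_DIRECTORY and O_CLOEXEC when they are defined is this sketch's own choice.

```c
#define _XOPEN_SOURCE 700

#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: fsync() a directory so that newly created
// directory entries become durable. Returns 0 on success, -1 on error.
static int
sync_directory(const char *dir)
{
	// An O_SEARCH descriptor (O_PATH in musl) is rejected by fsync(),
	// so open the directory for reading instead. This fails if the
	// directory lacks read permission (like d-wx------).
	int flags = O_RDONLY;
#ifdef O_DIRECTORY
	flags |= O_DIRECTORY;
#endif
#ifdef O_CLOEXEC
	flags |= O_CLOEXEC;
#endif

	const int fd = open(dir, flags);
	if (fd == -1)
		return -1;

	if (fsync(fd) != 0) {
		// Keep fsync()'s errno even if close() would change it.
		const int saved_errno = errno;
		(void)close(fd);
		errno = saved_errno;
		return -1;
	}

	return close(fd);
}
```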
Fixes: 2a9e91d796
A typical use case is like this:
printf("%s: %s\n", tuklib_mask_nonprint(filename), strerror(errno));
tuklib_mask_nonprint() may call mbrtowc() and malloc() which may modify
errno. If errno isn't preserved, the error message might be wrong if
a compiler decides to call tuklib_mask_nonprint() before strerror().
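The errno-preserving pattern can be sketched with a hypothetical helper, quote_for_message(), which stands in for tuklib_mask_nonprint(). It only wraps the string in single quotes. Like the real function, it may call the memory allocator, so it saves errno on entry and restores it before every return.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: return str wrapped in single quotes. The result
// lives in a static buffer that is reused by the next call.
static const char *
quote_for_message(const char *str)
{
	// Save errno first: realloc() below may modify it, and the caller
	// may evaluate strerror(errno) only after this function returns.
	const int saved_errno = errno;

	static char *buf = NULL;
	const size_t len = strlen(str);

	char *p = realloc(buf, len + 3);
	if (p == NULL) {
		// Fall back to the unmodified string; the old buffer is
		// still valid and will be reused or replaced later.
		errno = saved_errno;
		return str;
	}

	buf = p;
	buf[0] = '\'';
	memcpy(buf + 1, str, len);
	buf[len + 1] = '\'';
	buf[len + 2] = '\0';

	errno = saved_errno;
	return buf;
}
```

With this, printf("%s: %s\n", quote_for_message(filename), strerror(errno)) prints the intended error message no matter which argument the compiler evaluates first.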
Fixes: 40e5733055
xz's default behavior is to delete the input file after successful
compression or decompression (unless writing to standard output).
If the system crashes soon after the deletion, it is possible that
the newly written file has not yet hit the disk while the previous
delete operation might have. In that case neither the original file
nor the written file is available.
Call fsync() on the output file. On POSIX systems, also sync the
directory where the file was created.
Add a new option --no-sync which disables fsync() usage. It can avoid
a (possibly significant) performance penalty when processing many
small files. It's fine to use --no-sync when one knows that the files
are easy to recreate or restore after a system crash.
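The following rough sketch shows the ordering described above. It assumes POSIX and uses a hypothetical function, finish_and_remove_input(), which is not xz's actual code. The output file is synced first, then its directory, and only after that is the input file removed. Passing do_sync = false models --no-sync.

```c
#define _XOPEN_SOURCE 700

#include <errno.h>
#include <fcntl.h>
#include <libgen.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

// Hypothetical sketch: make the output durable before deleting the
// input. Returns 0 on success and -1 on error (errno is set).
static int
finish_and_remove_input(int out_fd, const char *out_path,
		const char *in_path, bool do_sync)
{
	if (do_sync) {
		// 1. Flush the output file's data and metadata.
		if (fsync(out_fd) != 0)
			return -1;

		// 2. Sync the directory so that the new directory entry
		//    is durable too. dirname() may modify its argument,
		//    so give it a copy.
		char *copy = strdup(out_path);
		if (copy == NULL)
			return -1;

		// O_RDONLY, not O_SEARCH, so that fsync() is allowed.
		const int dir_fd = open(dirname(copy), O_RDONLY);
		free(copy);
		if (dir_fd == -1)
			return -1;

		const int ret = fsync(dir_fd);
		const int saved_errno = errno;
		(void)close(dir_fd);
		if (ret != 0) {
			errno = saved_errno;
			return -1;
		}
	}

	// 3. Only now is it reasonably safe to delete the input file.
	return unlink(in_path);
}
```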
Using fsync() after every flush initiated by --flush-timeout was
considered. It wasn't implemented, at least for now, for these reasons:
- --flush-timeout is typically used when writing to stdout. If stdout
is a file, xz cannot (portably) sync the directory of the file.
One would need to create the output file first, sync the directory,
and then run xz with fsync() enabled.
- If xz --flush-timeout output goes to a file, it's possible to use
a separate script to sync the file, for example, once per minute,
while telling xz to flush more frequently.
- Not supporting syncing with --flush-timeout was simpler.
Portability notes:
- On systems that lack O_SEARCH (like Linux), "xz dir/file" will now
fail if "dir" cannot be opened for reading. If "dir" still has
write and search permissions (like d-wx------ in "ls -l"),
xz was previously still able to compress "dir/file".
Now it only works with --no-sync (or --keep or --stdout).
- <libgen.h> and dirname() should be available on all POSIX systems,
and aren't needed on non-POSIX systems.
- fsync() is available on all POSIX systems. The directory syncing
could be changed to fdatasync(), although at least on ext4 it
doesn't seem to make a performance difference in xz's usage.
fdatasync() would need a build system check to support (old)
special cases, for example, MINIX 3.3.0 doesn't have fdatasync()
and Solaris 10 needs -lrt.
- On native Windows, _commit() is used to replace fsync(). Directory
syncing isn't done and shouldn't be needed. (In Cygwin, fsync() on
directories is a no-op.)
- DJGPP has fsync() for files. ;-)
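The file-syncing part of these notes can be sketched as a small wrapper. The name sync_fd() is hypothetical. It uses _commit() on native Windows and fsync() elsewhere, and for the reasons above it doesn't try fdatasync().

```c
#define _XOPEN_SOURCE 700

#ifdef _WIN32
#	include <io.h>
#else
#	include <unistd.h>
#endif

// Hypothetical wrapper: flush a file descriptor's data to disk.
// Returns 0 on success and -1 on error, like fsync().
static int
sync_fd(int fd)
{
#ifdef _WIN32
	// Native Windows has no fsync(); _commit() is the CRT equivalent.
	return _commit(fd);
#else
	return fsync(fd);
#endif
}
```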
Using fsync() was considered around 2009 and again in 2016, but both
times the idea was rejected. For comparison, GNU gzip 1.7 (2016)
added the option --synchronous, which enables fsync().
Co-authored-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Fixes: https://bugs.debian.org/814089
Link: https://www.mail-archive.com/xz-devel@tukaani.org/msg00282.html
Closes: https://github.com/tukaani-project/xz/pull/151
lzma_str_to_filters() may call parse_lzma12_preset() in two ways. The
call from str_to_filters() detects the string type from the first
character(s) and as a side-effect it validates the first digit of
the preset string. So this change makes no difference there.
However, the call from parse_options() doesn't pre-validate the string.
parse_lzma12_preset() returns an invalid value that is passed to
lzma_lzma_preset(), which safely rejects it. The bug still affects
the error message:
$ xz --filters=lzma2:preset=X
xz: Error in --filters=FILTERS option:
xz: lzma2:preset=X
xz:               ^
xz: Unsupported preset
After the fix:
$ xz --filters=lzma2:preset=X
xz: Error in --filters=FILTERS option:
xz: lzma2:preset=X
xz:              ^
xz: Unsupported preset
The ^ now correctly points to the X and not past it because the X itself
is the problematic character.
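Here is a simplified sketch of the fixed parsing logic. It was written for this note and not copied from liblzma. PRESET_EXTREME_FLAG is a hypothetical stand-in for liblzma's extreme flag, and parse_preset_sketch() is a hypothetical name. The first character is validated before the pointer is advanced, so on error the pointer (and thus the ^) still refers to the X.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical flag bit standing in for liblzma's extreme preset flag.
#define PRESET_EXTREME_FLAG (UINT32_C(1) << 31)

// Parse a preset like "6" or "6e" from [*str, str_end). Returns NULL on
// success or an error message. On error, *str points at the offending
// character so that the caller can put the ^ under it.
static const char *
parse_preset_sketch(const char **str, const char *str_end,
		uint32_t *preset)
{
	// The fix: reject a non-digit before using it or advancing *str.
	if (*str >= str_end || !(**str >= '0' && **str <= '9'))
		return "Unsupported preset";

	*preset = (uint32_t)(**str - '0');

	// Optional flag characters may follow the digit.
	while (++*str < str_end) {
		switch (**str) {
		case 'e':
			*preset |= PRESET_EXTREME_FLAG;
			break;

		default:
			return "Unsupported preset flag";
		}
	}

	return NULL;
}
```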
Fixes: cedeeca2ea
Forgetting the argument (or not using = to separate the option from
the argument) resulted in lzma_str_to_filters() being called with NULL
as the input string argument. The function handles it fine, but xz
passes the NULL to printf() too:
$ xz --filters
xz: Error in --filters=FILTERS option:
xz: (null)
xz: ^
xz: Unexpected NULL pointer argument(s) to lzma_str_to_filters()
Now it's correct:
$ xz --filters
xz: option '--filters' requires an argument
The --filters-help option doesn't take any arguments.
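This sketch shows the getopt_long() side of the change, using a hypothetical option table and a hypothetical parse_args() function. It assumes that the bug came from declaring --filters with optional_argument. With optional_argument, optarg is NULL unless the argument is attached with =, which matches the symptom above, but the text doesn't say this outright.

```c
#include <getopt.h>
#include <stddef.h>

enum { OPT_FILTERS = 256, OPT_FILTERS_HELP };

// Hypothetical option table. With required_argument, getopt_long()
// itself reports a missing argument, so optarg is never NULL here.
static const struct option long_opts[] = {
	{ "filters",      required_argument, NULL, OPT_FILTERS },
	{ "filters-help", no_argument,       NULL, OPT_FILTERS_HELP },
	{ NULL,           0,                 NULL, 0 }
};

// Returns 0 on success and -1 on a command line error.
static int
parse_args(int argc, char **argv, const char **filters, int *want_help)
{
	*filters = NULL;
	*want_help = 0;

	// Restart scanning so that the function can be called repeatedly
	// (a simplification; resetting getopt is implementation-specific).
	optind = 1;

	int c;
	while ((c = getopt_long(argc, argv, "", long_opts, NULL)) != -1) {
		switch (c) {
		case OPT_FILTERS:
			*filters = optarg;
			break;

		case OPT_FILTERS_HELP:
			*want_help = 1;
			break;

		default:
			// getopt_long() has already printed an error message
			// such as "option '--filters' requires an argument".
			return -1;
		}
	}

	return 0;
}
```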
Fixes: 9ded880a02
Fixes: d6af7f3470
Fixes: a165d7df19
Positional format specifiers like %2$s are a POSIX feature that isn't in
standard C. They aren't available on Windows. Even MinGW-w64 with
__USE_MINGW_ANSI_STDIO doesn't support them, even though it supports
the POSIX %'d for thousands separators.
Gettext's <libintl.h> provides overrides for printf and other functions
which do support the %2$s formats. Translations use them. But xz should
work on Windows without <libintl.h> too.
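The following example illustrates the difference. A positional format string such as "%2$s: %1$s" picks its arguments by number. POSIX systems support this, but the native Windows C runtime doesn't. For untranslated strings, the portable alternative is to pass the arguments in output order. format_error() is a hypothetical helper.

```c
#include <stdio.h>

// Hypothetical helper: write "FILENAME: MESSAGE" into buf using only
// plain %s conversions, with the arguments passed in output order.
// A positional "%2$s: %1$s" with (msg, filename) would print the same
// text, but only where POSIX positional specifiers are supported.
static int
format_error(char *buf, size_t size, const char *msg, const char *filename)
{
	return snprintf(buf, size, "%s: %s", filename, msg);
}
```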
Fixes: 3e9177fd20
A slightly silly thing is that xz may now query the ABI version up to
three times. We could call my_landlock_ruleset_attr_forbid_all() only
once and cache the result but it didn't seem worth doing.
Now that we have the FALLTHROUGH macro, use the strictest mode with
GCC so that comment-based fallthrough markings are no longer accepted.
In GCC, -Wextra includes -Wimplicit-fallthrough=3 and
-Wimplicit-fallthrough is the same as -Wimplicit-fallthrough=3.
Thus, the strict mode requires specifying -Wimplicit-fallthrough=5.
Clang has -Wimplicit-fallthrough which is *not* enabled by -Wextra.
Clang doesn't have a variant that takes an argument. Thus we need
to check for -Wimplicit-fallthrough. Do it before checking for
-Wimplicit-fallthrough=5 so that the latter overrides the former
when using GCC.
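Below is a sketch of an attribute-based FALLTHROUGH macro. The definition is an assumption for illustration and may differ from xz's actual macro. When compiling with gcc -Wextra -Wimplicit-fallthrough=5, a /* fall through */ comment no longer silences the warning, but the attribute does. Compilers without the attribute get a no-op statement instead.

```c
// Hypothetical FALLTHROUGH definition using __has_attribute.
#if defined(__has_attribute)
#	if __has_attribute(__fallthrough__)
#		define FALLTHROUGH __attribute__((__fallthrough__))
#	endif
#endif
#ifndef FALLTHROUGH
#	define FALLTHROUGH ((void)0)
#endif

// Each case adds its value and falls through to the next lower level,
// so level 3 gives 111, level 2 gives 11, level 1 gives 1, else 0.
static int
level_bonus(int level)
{
	int bonus = 0;

	switch (level) {
	case 3:
		bonus += 100;
		FALLTHROUGH;

	case 2:
		bonus += 10;
		FALLTHROUGH;

	case 1:
		bonus += 1;
		break;

	default:
		break;
	}

	return bonus;
}
```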
Also remove the recently added workaround from tuklib_gettext.h.
Requiring a new enough gettext-runtime is cleaner. I guess it's
mostly MSYS2 where xz is built with translation support, so once
MSYS2 has Gettext >= 0.23.1, this requirement shouldn't be a problem
in practice.
The DESCRIPTION section always explained it, and the OPTIONS section
only described the differences to the default behavior. However, new
users in a hurry may skip reading DESCRIPTION. The default behavior
is a bit dangerous, so it's good to repeat in the --compress and
--decompress docs that the source file is removed after a successful
operation.
Fixes: https://github.com/tukaani-project/xz/issues/150
Because this increases the Mach-O compatibility_version, this commit
shouldn't cause any ABI compatibility trouble for existing CMake users
on macOS. This is assuming that they won't later downgrade to an older
liblzma version that was built with CMake before this commit.
Meson allows customising the Mach-O versioning too. So the three
build systems can be configured to be compatible.
liblzma and xz can't be compiled as a unity/jumbo build because of
redeclarations and type name reuse. The CMake documentation recommends
setting UNITY_BUILD to false in this case.
This is especially important if we're compiled as a subproject and the
consumer wants to use CMAKE_UNITY_BUILD=ON for the rest of their code
base.
Closes: https://github.com/tukaani-project/xz/pull/158
See the comment. In this package, locale is set at program startup and
not changed later, so point (2) in the comment isn't a problem.
Fixes: 46ee006162
xzdec isn't translated and doesn't need libintl on Windows even
when NLS is enabled, so libintl_setlocale() cannot interfere with
the locale settings, and the standard setlocale() works perfectly.
The explanation in the commit message of 78868b6e is wrong.
Fixes: 78868b6ed6
Keep only the FindFirstFileA() notes from 20dfca81, reverting
the incorrect setlocale() notes. On Windows, Gettext's <libintl.h>
overrides setlocale() with libintl_setlocale() wrapper. I hadn't
noticed this, and thus my conclusions were wrong.
Fixes: 20dfca8171
Call tuklib_mask_nonprint() on filenames and on a few other strings
from the command line too.
The filename printed by "xz --robot --list" (in list.c) is also masked.
It's good to get rid of tabs and newlines, which would desync the output,
but masking other chars isn't strictly necessary. It might matter
with sensible filenames if LC_CTYPE is "C" (when iswprint() might reject
non-ASCII chars) and a script wants to read a filename from xz's output.
Hopefully it's an unusual enough corner case to not be a real problem.
Malicious filenames or other untrusted strings may affect the state of
the terminal when such strings are printed as part of (error) messages.
Add functions that mask such characters.
It's not enough to handle only single-byte control characters.
In multibyte locales, some control characters are multibyte too. For
example, terminals interpret the C1 control characters (U+0080 to
U+009F), which are two bytes each in UTF-8.
Instead of checking for control characters with iswcntrl(), this
uses iswprint() to detect printable characters. This is much stricter.
On Windows it's actually too strict as it rejects some characters that
definitely are printable.
Gnulib's quotearg would do a lot more but I hope this simpler method
is good enough here.
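Here is a simplified sketch of masking with mbrtowc() and iswprint(). It assumes that each non-printable character or invalid byte is replaced with a single '?'. The replacement character, the reused static buffer, and the name mask_nonprint_sketch() are all assumptions. This isn't tuklib's actual implementation.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

// Hypothetical sketch: return a copy of str in which every character
// rejected by iswprint(), and every invalid or incomplete multibyte
// sequence, is replaced with '?'. The result lives in a static buffer
// reused by the next call. Returns NULL if memory allocation fails.
static const char *
mask_nonprint_sketch(const char *str)
{
	static char *buf = NULL;
	const size_t len = strlen(str);

	// The output is never longer than the input.
	char *p = realloc(buf, len + 1);
	if (p == NULL)
		return NULL;
	buf = p;

	mbstate_t state;
	memset(&state, 0, sizeof(state));

	size_t in = 0;
	size_t out = 0;
	while (in < len) {
		wchar_t wc;
		const size_t n = mbrtowc(&wc, str + in, len - in, &state);

		if (n == 0) {
			break; // A null character; can't happen before len.
		} else if (n == (size_t)-1 || n == (size_t)-2) {
			// Invalid or incomplete sequence: mask one byte
			// and reset the conversion state.
			buf[out++] = '?';
			++in;
			memset(&state, 0, sizeof(state));
		} else if (iswprint((wint_t)wc)) {
			// Printable: copy the original bytes unchanged.
			memcpy(buf + out, str + in, n);
			out += n;
			in += n;
		} else {
			// Non-printable, possibly multibyte (like a C1
			// control character in UTF-8): one '?' in total.
			buf[out++] = '?';
			in += n;
		}
	}

	buf[out] = '\0';
	return buf;
}
```

This decodes whole multibyte characters. In a UTF-8 locale (after setlocale(LC_ALL, "")), a two-byte C1 control character should therefore become a single '?' rather than two.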
Thanks to Ryan Colyer for the discussion about the problems of
the earlier single-byte-only method.
Thanks to Christian Weisgerber for reporting a bug in an earlier
version of this code.
Thanks to Jeroen Roovers for a typo fix.
Closes: https://github.com/tukaani-project/xz/pull/118
Most of the auto-wrapped strings are translated already. A few
strings have changed since this was created though. This file
isn't in the Translation Project *yet* because these strings
are still very new.
Closes: https://github.com/tukaani-project/xz/pull/145