mirror of https://git.tukaani.org/xz.git synced 2025-02-23 17:08:13 +00:00

134 Commits

Author SHA1 Message Date
Lasse Collin
df399c5255
tuklib_mbstr_width: Add tuklib_mbstr_width_mem()
It's a new function split from tuklib_mbstr_width().
It's useful with partial strings that aren't terminated with \0.
2024-12-18 17:09:30 +02:00
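Note: the split described in df399c5255 can be sketched roughly as below. This is not the tuklib code. The names sketch_width_mem() and sketch_width() are hypothetical, every decoded character is assumed to take one column (real code would consult wcwidth() where it exists), and treating an embedded \0 as an error is a choice made only for this sketch.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical name. Counts columns in the first len bytes of str, which
 * does not need to be terminated with \0. Returns (size_t)-1 on error. */
size_t sketch_width_mem(const char *str, size_t len)
{
    mbstate_t state;
    memset(&state, 0, sizeof(state));

    size_t width = 0;
    size_t i = 0;

    while (i < len) {
        wchar_t wc;
        const size_t ret = mbrtowc(&wc, str + i, len - i, &state);

        /* Strict checking: 0 means an embedded \0, and (size_t)-1 and
         * (size_t)-2 (invalid or incomplete sequence) are both larger
         * than the number of bytes that remain. */
        if (ret == 0 || ret > len - i)
            return (size_t)-1;

        i += ret;
        ++width; /* Assumption: one character == one column. */
    }

    return width;
}

/* The \0-terminated variant becomes a thin wrapper. */
size_t sketch_width(const char *str)
{
    return sketch_width_mem(str, strlen(str));
}
```

With this shape a caller can measure a prefix of a longer string, for example sketch_width_mem(buf, 5), without copying it into a terminated buffer first.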
Lasse Collin
51081efae4
tuklib_mbstr_width: Update a comment about shift states 2024-12-18 17:09:30 +02:00
Lasse Collin
7ff1b0ac53
tuklib_mbstr_width: Don't mention shift states in the API docs
It is assumed that this code won't be used with charsets that use
locking shift states.
2024-12-18 17:09:30 +02:00
Lasse Collin
3c16105936
tuklib_mbstr_width: Use stricter return value checking
This should make no difference in practice (at least if mbrtowc()
isn't broken).
2024-12-18 17:09:30 +02:00
Lasse Collin
b797c44c42
tuklib_mbstr_width: Change the behavior when wcwidth() is not available
If wcwidth() isn't available (Windows), previously it was assumed
that one byte == one column in the terminal. Now it is assumed that
one multibyte character == one column. This works better with UTF-8.
Languages that only use single-width characters without any combining
characters should work correctly with this.

In xz, none of po/*.po contain combining characters and only ko.po,
zh_CN.po, and zh_TW.po contain fullwidth characters. Thus, "only"
those three translations in xz are broken on Windows with the
UTF-8 code page. Broken means that column headings in xz -lvv and
(only in the master branch) strings in --long-help are misaligned,
so it's not a huge problem. I don't know if those three languages
displayed perfectly before the UTF-8 change because I hadn't tested
translations with native Windows builds before.

Fixes: 46ee0061629fb075d61d83839e14dd193337af59
2024-12-18 17:09:30 +02:00
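Note: the difference between the old and new fallback in b797c44c42 can be illustrated with a small sketch. It is an illustration only: the function names are hypothetical, the input is assumed to be valid UTF-8, and the real code decodes with mbrtowc() instead of inspecting bytes.

```c
#include <stddef.h>

/* Old assumption: one byte == one column. */
size_t sketch_columns_by_bytes(const char *s, size_t len)
{
    (void)s;
    return len;
}

/* New assumption: one multibyte character == one column.
 * In valid UTF-8 every character has exactly one byte that is not
 * a continuation byte (10xxxxxx), so counting those bytes counts
 * characters. */
size_t sketch_columns_by_chars(const char *s, size_t len)
{
    size_t columns = 0;

    for (size_t i = 0; i < len; ++i)
        if (((unsigned char)s[i] & 0xC0) != 0x80)
            ++columns;

    return columns;
}
```

For "\xC3\xA4" "b" (an a-umlaut followed by b, three bytes) the byte count says 3 columns while the character count says 2, which matches the terminal. A fullwidth character such as "\xE4\xB8\xAD" still counts as 1 although it occupies 2 columns, which is why translations with fullwidth characters remain misaligned.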
Lasse Collin
0d0b574cc4
Windows: Use UTF-8 locale when active code page is UTF-8
XZ Utils 5.6.3 set the active code page to UTF-8 to fix CVE-2024-47611.
This wasn't paired with UCRT-specific setlocale(LC_ALL, ".UTF8"), thus
non-ASCII characters from translations became mojibake.

Fixes: 46ee0061629fb075d61d83839e14dd193337af59
2024-12-18 17:09:30 +02:00
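Note: a minimal sketch of the pairing described in 0d0b574cc4, assuming a UCRT-based toolchain on Windows. The helper names are hypothetical and this is not the code from the commit; GetACP() and setlocale() are the only platform calls used. On other systems the sketch simply falls back to setlocale(LC_ALL, "").

```c
#include <locale.h>
#include <stddef.h>

#ifdef _WIN32
#   include <windows.h>
#endif

/* 65001 is the Windows code page identifier for UTF-8 (CP_UTF8). */
#define SKETCH_CP_UTF8 65001U

/* Hypothetical helper: pick the setlocale() argument that matches
 * the active code page. */
const char *sketch_locale_for_code_page(unsigned int code_page)
{
    return code_page == SKETCH_CP_UTF8 ? ".UTF8" : "";
}

/* Returns the setlocale() result, which is NULL on failure. */
const char *sketch_init_locale(void)
{
#ifdef _WIN32
    return setlocale(LC_ALL, sketch_locale_for_code_page(GetACP()));
#else
    return setlocale(LC_ALL, "");
#endif
}
```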
Lasse Collin
20dfca8171
Windows: Document the need for setlocale(LC_ALL, ".UTF8")
Also warn about unpaired surrogates and (somewhat UTF-8-specific)
MAX_PATH issue in FindFirstFileA().

Fixes: 46ee0061629fb075d61d83839e14dd193337af59
2024-12-18 17:09:29 +02:00
Lasse Collin
46ee006162 Windows: Embed an application manifest in the EXE files
IMPORTANT: This includes a security fix to command line tool
           argument handling.

Some toolchains embed an application manifest by default to declare
UAC-compliance. Some also declare compatibility with Vista/8/8.1/10/11
to let the app access features newer than those of Vista.

We want all the above but also two more things:

  - Declare that the app is long path aware to support paths longer
    than 259 characters (this may also require a registry change).

  - Force the code page to UTF-8. This allows the command line tools
    to access files whose names contain characters that don't exist
    in the current legacy code page (except unpaired surrogates).
    The UTF-8 code page also fixes security issues in command line
    argument handling which can be exploited with malicious filenames.
    See the new file w32_application.manifest.comments.txt.

Thanks to Orange Tsai and splitline from DEVCORE Research Team
for discovering this issue.

Thanks to Vijay Sarvepalli for reporting the issue to me.

Thanks to Kelvin Lee for testing with MSVC and helping with
the required build system fixes.
2024-10-01 12:10:23 +03:00
Lasse Collin
8940ecb96f common_w32res.rc: White space edits
LANGUAGE and VS_VERSION_INFO begin new statements so put an empty line
between them.
2024-09-29 01:27:16 +03:00
Lasse Collin
6a3c4aaa43 Windows: Drop Visual Studio 2013 support
This simplifies things a little. Building liblzma with VS2013 probably
still worked but building the command line tools was not supported.

Microsoft ended support for VS2013 in 2024-04.
2024-06-20 21:53:07 +03:00
Lasse Collin
c0e7eaae8d sysdefs.h: Add alignas 2024-06-16 12:59:20 +03:00
Lasse Collin
caea7844d3 tuklib: __STDC_VERSION__ in C23 is 202311 2024-06-10 23:19:27 +03:00
RainRat
9e73918a4f Fix typos
Closes: https://github.com/tukaani-project/xz/pull/124
2024-06-07 16:01:27 +03:00
Lasse Collin
04b23addf3 tuklib_integer: Fix building on OpenBSD/sparc64 that uses GCC 4.2
GCC 4.2 doesn't have __builtin_bswap16() and friends so tuklib_integer.h
tries to use OS-specific byte swap methods instead. On OpenBSD those
macros are swap16/32/64 instead of bswap16/32/64 like on other *BSDs
and Darwin.

An alternative to "#ifdef __OpenBSD__" could be "#ifdef swap16" as it
is a macro. But since OpenBSD seems to be a special case under this
special case of "*BSDs and Darwin", checking for __OpenBSD__ seems
the more conservative choice now.

Thanks to Christian Weisgerber and Brad Smith who both submitted
the same patch a few hours apart.

Co-authored-by: Christian Weisgerber <naddy@mips.inka.de>
Co-authored-by: Brad Smith <brad@comstyle.com>
Closes: https://github.com/tukaani-project/xz/pull/126
2024-06-07 15:47:20 +03:00
Lasse Collin
4e9023857d Fix typos
Thanks to xx on #tukaani.
2024-05-18 00:34:07 +03:00
Lasse Collin
4ffc60f323 tuklib_integer: Rename bswapXX to byteswapXX
The __builtin_bswapXX from GCC and Clang are preferred when
they are available. This can allow compilers to emit the x86 MOVBE
instruction instead of doing a load + byteswap as two instructions
(which would happen if the byteswapping is done in inline asm).

bswap16, bswap32, and bswap64 exist in system headers on *BSDs
and Darwin. #defining bswap16 on NetBSD results in a warning about
macro redefinition. It's safest to avoid this namespace conflict
completely.

No OS supported by tuklib_integer.h uses byteswapXX names and
a web search doesn't immediately find any obvious danger of
namespace conflicts. So let's try these still-pretty-short names
for the macros.

Thanks to Sam James for pointing out the compiler warning on
NetBSD 10.0.
2024-04-25 14:00:57 +03:00
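Note: the naming scheme from 4ffc60f323 might look roughly like the sketch below. The sketch_ prefix and the __has_builtin-based detection are assumptions made for this example; the detection logic actually used by tuklib_integer.h is not shown in this log.

```c
#include <stdint.h>

/* Ask the compiler whether the builtin exists. Compilers that lack
 * __has_builtin use the portable fallback. */
#if defined(__has_builtin)
#   if __has_builtin(__builtin_bswap32)
#       define SKETCH_HAVE_BUILTIN_BSWAP32 1
#   endif
#endif

#ifdef SKETCH_HAVE_BUILTIN_BSWAP32
    /* Preferred: the compiler can fuse this with a load or store. */
#   define sketch_byteswap32(n) __builtin_bswap32(n)
#else
    /* Portable fallback using shifts and masks. */
#   define sketch_byteswap32(n) ( \
          (((uint32_t)(n) & UINT32_C(0x000000FF)) << 24) \
        | (((uint32_t)(n) & UINT32_C(0x0000FF00)) << 8) \
        | (((uint32_t)(n) & UINT32_C(0x00FF0000)) >> 8) \
        | (((uint32_t)(n) & UINT32_C(0xFF000000)) >> 24))
#endif

uint32_t sketch_swap_u32(uint32_t n)
{
    return sketch_byteswap32(n);
}
```

Because the macro is not called bswap32, it cannot collide with the bswap32 that *BSD and Darwin system headers may already define.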
Lasse Collin
22af94128b Add SPDX license identifier into 0BSD source code files. 2024-02-14 18:31:16 +02:00
Lasse Collin
689e0228ba Change most public domain parts to 0BSD.
Translations and doc/xz-file-format.txt and doc/lzma-file-format.txt
were not touched.

COPYING.0BSD was added.
2024-02-14 18:31:12 +02:00
Jia Tan
6b05f827f5 tuklib_integer: Fix typo discovered by codespell.
Based on internet dictionary searches, 'choise' is an outdated spelling
of 'choice'.
2023-11-22 20:39:41 +08:00
Lasse Collin
dd32f628bb mythread.h: Make MYTHREAD_POSIX compatible with MinGW-w64's winpthreads.
This might be almost useless but it doesn't need much extra code either.
2023-10-22 18:59:45 +03:00
Lasse Collin
c8f715f1bc tuklib_integer: Revise unaligned reads and writes on strict-align archs.
In XZ Utils context this doesn't matter much because
unaligned reads and writes aren't used in hot code
when TUKLIB_FAST_UNALIGNED_ACCESS isn't #defined.
2023-10-18 19:02:45 +03:00
Lasse Collin
6828242735 tuklib_integer: Add missing write64be and write64le fallback functions. 2023-10-18 19:02:45 +03:00
Lasse Collin
e582f8e0fe tuklib_physmem: Comment out support for Windows versions older than 2000. 2023-09-24 17:48:13 +03:00
Lasse Collin
7d73d1f0e0 sysdefs.h: Update the comment about __USE_MINGW_ANSI_STDIO. 2023-09-24 16:32:32 +03:00
Lasse Collin
8c2d197c94 MSVC: #define inline and restrict only when needed.
This also drops the check for _WIN32 as that shouldn't be needed.
2023-09-22 20:06:27 +03:00
Lasse Collin
90c94dddfd tuklib: Update tuklib_attr_noreturn for C11/C17 and C23.
This makes no difference for GCC or Clang as they support
GNU C's __attribute__((__noreturn__)) but this helps with MSVC:

  - VS 2019 version 16.7 and later support _Noreturn if the
    options /std:c11 or /std:c17 are used. This gets handled
    with the check for __STDC_VERSION__ >= 201112.

  - When MSVC isn't in C11/C17 mode, __declspec(noreturn) is used.

C23 will deprecate _Noreturn (and <stdnoreturn.h>)
for [[noreturn]]. This commit anticipates that but
the final __STDC_VERSION__ value isn't known yet.
2023-09-22 20:06:21 +03:00
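Note: a sketch of the selection order described in 90c94dddfd. The macro name sketch_noreturn and the two example functions are hypothetical. The C23 check uses 202311, the value recorded by the later commit caea7844d3 in this log.

```c
#include <stdio.h>
#include <stdlib.h>

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
    /* C23: attribute syntax replaces _Noreturn. */
#   define sketch_noreturn [[noreturn]]
#elif defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
    /* C11/C17, including MSVC with /std:c11 or /std:c17. */
#   define sketch_noreturn _Noreturn
#elif defined(__GNUC__)
    /* GNU C in older language modes. */
#   define sketch_noreturn __attribute__((__noreturn__))
#elif defined(_MSC_VER)
    /* MSVC when it is not in C11/C17 mode. */
#   define sketch_noreturn __declspec(noreturn)
#else
#   define sketch_noreturn
#endif

sketch_noreturn static void sketch_fatal(const char *msg)
{
    fprintf(stderr, "%s\n", msg);
    exit(EXIT_FAILURE);
}

int sketch_checked_div(int a, int b)
{
    if (b == 0)
        sketch_fatal("division by zero");

    /* No "missing return" warning: the compiler knows that
     * sketch_fatal() never returns. */
    return a / b;
}
```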
Jamaika1
6bf33b704c
mythread.h: Fix typo error in Vista threads mythread_once().
The "once_" variable was accidentally referred to as just "once". This
prevented building with Vista threads when
HAVE_FUNC_ATTRIBUTE_CONSTRUCTOR was not defined.
2023-08-08 20:07:59 +08:00
ChanTsune
81db3b8898 mythread.h: Disable signal functions in builds targeting Wasm + WASI.
signal.h in WASI SDK doesn't currently provide sigprocmask()
or sigset_t. liblzma doesn't need them so this change makes
liblzma and xzdec build against WASI SDK. xz doesn't build yet
and the tests don't either as tuktest needs setjmp() which
isn't (yet?) implemented in WASI SDK.

Closes: https://github.com/tukaani-project/xz/pull/57
See also: https://github.com/tukaani-project/xz/pull/56

(The original commit was edited a little by Lasse Collin.)
2023-08-01 18:18:05 +03:00
Jia Tan
9ad64bdf30 tuklib_integer.h: Reverts previous commit.
Previous commit 6be460dde07113fe3f08f814b61ddc3264125a96 would cause an
error if the integer size was 32 bit.
2023-05-04 20:30:25 +08:00
Jia Tan
6be460dde0 tuklib_integer.h: Changes two other UINT_MAX == UINT32_MAX to >=. 2023-05-04 19:25:20 +08:00
Lasse Collin
44c0c5eae9 tuklib_integer.h: Fix a recent copypaste error in Clang detection.
Wrong line was changed in 7062348bf35c1e4cbfee00ad9fffb4a21aa6eff7.
Also, this has >= instead of == since ints larger than 32 bits would
work too even if not relevant in practice.
2023-05-03 22:55:16 +03:00
Jia Tan
f41df2ac2f Windows: Include <intrin.h> when needed.
Legacy Windows did not need to #include <intrin.h> to use the MSVC
intrinsics. Newer versions likely just issue a warning, but the MSVC
documentation says to include the header file for the intrinsics we use.

GCC and Clang can "pretend" to be MSVC on Windows, so extra checks are
needed in tuklib_integer.h to only include <intrin.h> when it is
actually needed.
2023-04-19 22:22:16 +08:00
Jia Tan
7062348bf3 tuklib_integer: Use __builtin_clz() with Clang.
Clang has support for __builtin_clz(), but previously Clang would
fall back to either the MSVC intrinsic or the regular C code. This was
discovered due to a bug where a new version of Clang required the
<intrin.h> header file in order to use the MSVC intrinsics.

Thanks to Anton Kochkov for notifying us about the bug.
2023-04-19 21:59:03 +08:00
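Note: a rough sketch of preferring __builtin_clz() with both GCC and Clang, as 7062348bf3 describes. The function name is hypothetical, and the sketch only uses the builtin when unsigned int is exactly 32 bits wide so that no width adjustment is needed.

```c
#include <limits.h>
#include <stdint.h>

/* Count leading zero bits. The argument must be nonzero because
 * __builtin_clz(0) is undefined. */
uint32_t sketch_clz32(uint32_t n)
{
#if (defined(__GNUC__) || defined(__clang__)) && UINT_MAX == UINT32_MAX
    /* Clang takes this branch too, even when it also pretends to be
     * MSVC, so <intrin.h> is not needed here. */
    return (uint32_t)__builtin_clz(n);
#else
    /* Regular C fallback: shift left until the top bit is set. */
    uint32_t count = 0;

    while ((n & UINT32_C(0x80000000)) == 0) {
        n <<= 1;
        ++count;
    }

    return count;
#endif
}
```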
Lasse Collin
af5a4bd5af tuklib_physmem: Check for __has_warning before GCC version.
Clang can be configured to fake a too high GCC version so
this way it's more robust.
2023-01-26 17:39:46 +02:00
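Note: the check order from af5a4bd5af can be sketched as follows. All names are hypothetical, and sketch_farproc is only a stand-in for the generic function pointer type that GetProcAddress() returns on Windows. The GCC version threshold of 8 is an assumption of this sketch.

```c
/* Ask via __has_warning first. Only when the compiler has no
 * __has_warning is the GCC version number consulted, so a Clang that
 * reports a high GCC version cannot mislead the check. */
#if defined(__has_warning)
#   if __has_warning("-Wcast-function-type")
#       define SKETCH_CAN_DISABLE_WCAST_FUNCTION_TYPE 1
#   endif
#elif defined(__GNUC__) && __GNUC__ >= 8
#   define SKETCH_CAN_DISABLE_WCAST_FUNCTION_TYPE 1
#endif

typedef long (*sketch_farproc)(void);   /* generic pointer type */
typedef int (*sketch_add_fn)(int, int); /* the real function type */

static int sketch_add(int a, int b)
{
    return a + b;
}

#ifdef SKETCH_CAN_DISABLE_WCAST_FUNCTION_TYPE
#   pragma GCC diagnostic push
#   pragma GCC diagnostic ignored "-Wcast-function-type"
#endif

/* Plays the role of GetProcAddress(): hands out a generic pointer. */
static sketch_farproc sketch_lookup(void)
{
    return (sketch_farproc)sketch_add;
}

int sketch_call_add(int a, int b)
{
    /* Cast back to the real type before calling. */
    sketch_add_fn fn = (sketch_add_fn)sketch_lookup();
    return fn(a, b);
}

#ifdef SKETCH_CAN_DISABLE_WCAST_FUNCTION_TYPE
#   pragma GCC diagnostic pop
#endif
```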
Lasse Collin
2f78ecc593 Revert "tuklib_common: Define __has_warning if it is not defined."
This reverts commit 82e3c968bfa10e3ff13333bd9cbbadb5988d6766.

Macros in the reserved namespace (_foo or __foo) shouldn't be #defined
without a very good reason. Here the alternative would have been
to #define tuklib_has_warning(str) to an appropriate value.

Also the tuklib_* files should stay namespace clean if possible.
2023-01-24 20:20:51 +08:00
Lasse Collin
8366cf8738 tuklib_physmem: Clean up the way -Wcast-function-type is silenced on Windows.
__has_warning and other __has_foo macros are meant to become
compiler-agnostic so it's not good to check for __clang__ with it.

This also relied on tuklib_common.h for #defining __has_warning
which was confusing as #defining reserved macros is generally
not a good idea.
2023-01-24 20:20:40 +08:00
Jia Tan
b43ff180fb tuklib_physmem: Silence warning from -Wcast-function-type on MinGW-w64.
tuklib_physmem depends on GetProcAddress() for both MSVC and MinGW-w64
to retrieve a function address. The proper way to do this is to cast the
return value to the type of function pointer retrieved. Unfortunately,
this causes a cast-function-type warning, so the best solution is to
simply ignore the warning.
2023-01-19 20:35:09 +08:00
Jia Tan
82e3c968bf tuklib_common: Define __has_warning if it is not defined.
clang supports the __has_warning macro to determine if the version of
clang compiling the code supports a given warning. If we do not define
it for other compilers, it may cause a preprocessor error.
2023-01-19 20:32:40 +08:00
Lasse Collin
b1a6d180a3 xz: Silence warnings from -Wsign-conversion in a 32-bit build. 2023-01-12 06:01:12 +02:00
Lasse Collin
37fbdfb726 liblzma: Silence a warning from -Wsign-conversion in a 32-bit build. 2023-01-12 04:46:45 +02:00
Lasse Collin
0b64215170 sysdefs.h: Don't include strings.h anymore.
On some platforms src/xz/suffix.c may need <strings.h> for
strcasecmp() but suffix.c includes the header when it needs it.

Unless there is an old system that otherwise supports enough C99
to build XZ Utils but doesn't have C89/C90-compatible <string.h>,
there should be no need to include <strings.h> in sysdefs.h.
2023-01-10 11:56:11 +02:00
Lasse Collin
7049c4a76c sysdefs.h: Fix a comment. 2023-01-10 10:05:13 +02:00
Lasse Collin
194a5fab69 sysdefs.h: Don't include memory.h anymore even if it were available.
It quite probably was never needed, that is, any system where memory.h
was required likely couldn't compile XZ Utils for other reasons anyway.

XZ Utils 5.2.6 and later source packages were generated using
Autoconf 2.71 which no longer defines HAVE_MEMORY_H. So the code
being removed is no longer used anyway.
2023-01-10 10:04:06 +02:00
Jia Tan
78e0561dfe Style: Change #if !defined() to #ifndef in mythread.h. 2023-01-06 20:43:31 +08:00
Jia Tan
bb740e3b11
Build: Only define HAVE_PROGRAM_INVOCATION_NAME if it is set to 1.
HAVE_DECL_PROGRAM_INVOCATION_NAME is renamed to
HAVE_PROGRAM_INVOCATION_NAME. Previously,
HAVE_DECL_PROGRAM_INVOCATION_NAME was always set when
building with autotools. CMake would only set this when it was 1, and the
dos/config.h did not define it. The new macro definition is consistent
across build systems.
2023-01-02 22:33:48 +08:00
Jia Tan
f82294c831 liblzma: Includes sys/time.h conditionally in mythread
Previously, <sys/time.h> was always included, even if mythread only used
clock_gettime. <time.h> is still needed even if clock_gettime is not used
though because struct timespec is needed for mythread_condtime.
2022-12-30 23:34:31 +08:00
Jia Tan
74dae7d300 Build: No longer require HAVE_DECL_CLOCK_MONOTONIC to always be set.
Previously, if threading was enabled HAVE_DECL_CLOCK_MONOTONIC would always
be set to 0 or 1. However, this macro was needed in xz so if xz was not
built with threading and HAVE_DECL_CLOCK_MONOTONIC was not defined but
HAVE_CLOCK_GETTIME was, it caused a warning during build. Now,
HAVE_DECL_CLOCK_MONOTONIC has been renamed to HAVE_CLOCK_MONOTONIC and
will only be set if it is 1.
2022-12-30 23:34:31 +08:00
Lasse Collin
e53e0e2186 Windows: Fix mythread_once() macro with Vista threads.
Don't call InitOnceComplete() if initialization was already done.

So far mythread_once() has been needed only when building
with --enable-small. windows/build.bash does this together
with --disable-threads so the Vista-specific mythread_once()
is never needed by those builds. VS project files or
CMake-builds don't support HAVE_SMALL builds at all.
2022-10-31 13:31:58 +02:00
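Note: the fix in e53e0e2186 corresponds roughly to the sketch below. It is written as a function instead of the macro used in mythread.h, all names are hypothetical, and the non-Windows branch is a single-threaded stand-in that exists only so the sketch compiles elsewhere.

```c
#include <stdlib.h>

#ifdef _WIN32
#   include <windows.h>

static INIT_ONCE sketch_once_ctl = INIT_ONCE_STATIC_INIT;

static void sketch_once(void (*func)(void))
{
    BOOL pending;

    if (!InitOnceBeginInitialize(&sketch_once_ctl, 0, &pending, NULL))
        abort();

    /* Only the caller that actually runs the initialization may
     * call InitOnceComplete(). */
    if (pending) {
        func();

        if (!InitOnceComplete(&sketch_once_ctl, 0, NULL))
            abort();
    }
}
#else
/* Stand-in for other systems. NOT thread-safe. */
static int sketch_once_done = 0;

static void sketch_once(void (*func)(void))
{
    if (!sketch_once_done) {
        func();
        sketch_once_done = 1;
    }
}
#endif

int sketch_init_calls = 0;

static void sketch_init(void)
{
    ++sketch_init_calls;
}

/* Every user calls this; sketch_init() runs only the first time. */
void sketch_use(void)
{
    sketch_once(sketch_init);
}
```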
Lasse Collin
2611c4d905 tuklib_cpucores: Use HW_NCPUONLINE on OpenBSD.
On OpenBSD the number of cores online is often less
than what HW_NCPU would return because OpenBSD disables
simultaneous multi-threading (SMT) by default.

Thanks to Christian Weisgerber.
2022-10-20 20:22:50 +03:00
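Note: a sketch of the idea in 2611c4d905, not the tuklib_cpucores code. The function name is hypothetical. On systems with the BSD sysctl() interface it asks for HW_NCPUONLINE when the header provides it and for HW_NCPU otherwise; elsewhere it assumes sysconf(_SC_NPROCESSORS_ONLN) is available.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__OpenBSD__) || defined(__NetBSD__) || defined(__FreeBSD__) \
        || defined(__APPLE__)
#   include <sys/types.h>
#   include <sys/sysctl.h>
#   define SKETCH_USE_SYSCTL 1
#else
#   include <unistd.h>
#endif

/* Returns the number of usable cores, or 0 if it cannot be determined. */
uint32_t sketch_cpucores(void)
{
#ifdef SKETCH_USE_SYSCTL
    int name[2] = { CTL_HW,
#   ifdef HW_NCPUONLINE
        /* Cores that are online, which excludes SMT siblings
         * that OpenBSD disables by default. */
        HW_NCPUONLINE
#   else
        HW_NCPU
#   endif
    };
    int cpus;
    size_t cpus_size = sizeof(cpus);

    if (sysctl(name, 2, &cpus, &cpus_size, NULL, 0) != -1
            && cpus_size == sizeof(cpus) && cpus > 0)
        return (uint32_t)cpus;

    return 0;
#elif defined(_SC_NPROCESSORS_ONLN)
    const long cpus = sysconf(_SC_NPROCESSORS_ONLN);
    return cpus > 0 ? (uint32_t)cpus : 0;
#else
    return 0;
#endif
}
```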
Lasse Collin
fae37ad2af tuklib_integer: Add 64-bit endianness-converting reads and writes.
Also update the comment in liblzma's memcmplen.h.

Thanks to Michał Górny for the original patch for the reads.
2022-10-05 14:26:00 +03:00
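Note: endianness-converting 64-bit reads and writes of the kind added in fae37ad2af can be written portably one byte at a time. The sketch below uses hypothetical names and makes no claim about how tuklib_integer.h implements them; it only shows the byte order each variant implies.

```c
#include <stdint.h>

/* Big endian: the most significant byte is stored first. */
uint64_t sketch_read64be(const uint8_t *buf)
{
    uint64_t num = 0;

    for (int i = 0; i < 8; ++i)
        num = (num << 8) | buf[i];

    return num;
}

void sketch_write64be(uint8_t *buf, uint64_t num)
{
    for (int i = 7; i >= 0; --i) {
        buf[i] = (uint8_t)(num & 0xFF);
        num >>= 8;
    }
}

/* Little endian: the least significant byte is stored first. */
uint64_t sketch_read64le(const uint8_t *buf)
{
    uint64_t num = 0;

    for (int i = 7; i >= 0; --i)
        num = (num << 8) | buf[i];

    return num;
}

void sketch_write64le(uint8_t *buf, uint64_t num)
{
    for (int i = 0; i < 8; ++i) {
        buf[i] = (uint8_t)(num & 0xFF);
        num >>= 8;
    }
}
```

Byte-wise access also works with unaligned buffers on strict-align architectures, at the cost of speed.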