Commit Graph

1360 Commits

Author SHA1 Message Date
Lasse Collin 8a7d922fb8
Windows: Workaround a UTF-8 issue in Gettext's libintl_setlocale()
See the comment. In this package, locale is set at program startup and
not changed later, so point (2) in the comment isn't a problem.

Fixes: 46ee006162
(cherry picked from commit b40e3321a7fb9dfdf8ffb30e7e0788c2f0abc941)
2024-12-20 16:35:13 +02:00
Lasse Collin dcdd40cacc
Revert "Windows: Use UTF-8 locale when active code page is UTF-8"
This reverts commit 0d0b574cc4.

(cherry picked from commit bc4165da92b56668ddd1b7014b3488a0fad1733a)
2024-12-20 16:35:13 +02:00
Lasse Collin f8e42ed44d
xzdec: Use setlocale() instead of tuklib_gettext_setlocale()
xzdec isn't translated and doesn't need libintl on Windows even
when NLS is enabled, thus libintl_setlocale() cannot interfere
with the locale settings. Thus, standard setlocale() works perfectly.

The explanation in the commit message of 78868b6e is wrong.

Fixes: 78868b6ed6
(cherry picked from commit d6796f9ce5359faaaed82926c1735aee3694430f)
2024-12-20 16:35:13 +02:00
Lasse Collin 3ed40b9f87
Windows: Revert the setlocale(LC_ALL, ".UTF8") documentation
Only leave the FindFirstFileA() notes from 20dfca81, reverting
the incorrect setlocale() notes. On Windows, Gettext's <libintl.h>
overrides setlocale() with libintl_setlocale() wrapper. I hadn't
noticed this, and thus my conclusions were wrong.

Fixes: 20dfca8171
(cherry picked from commit e607329a615759f1519016595dd38df7c89208f2)
2024-12-20 16:35:12 +02:00
Lasse Collin 4e0ebbabe4
tuklib_mbstr_width: Change the behavior when wcwidth() is not available
If wcwidth() isn't available (Windows), previously it was assumed
that one byte == one column in the terminal. Now it is assumed that
one multibyte character == one column. This works better with UTF-8.
Languages that only use single-width characters without any combining
characters should work correctly with this.

In xz, none of po/*.po contain combining characters and only ko.po,
zh_CN.po, and zh_TW.po contain fullwidth characters. Thus, "only"
those three translations in xz are broken on Windows with the
UTF-8 code page. Broken means that column headings in xz -lvv and
(only in the master branch) strings in --long-help are misaligned,
so it's not a huge problem. I don't know if those three languages
displayed perfectly before the UTF-8 change because I hadn't tested
translations with native Windows builds before.

Fixes: 46ee006162
(cherry picked from commit b797c44c42)
2024-12-18 19:22:01 +02:00
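The fallback described in commit 4e0ebbabe4 (one multibyte character == one column when wcwidth() is not available) could look roughly like the sketch below. This is not the actual tuklib_mbstr_width code: the function name `columns_without_wcwidth` and its error convention are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

// Hypothetical helper (not from tuklib): count the multibyte characters
// in str and treat each one as exactly one terminal column.
// Returns (size_t)-1 if str contains an invalid or incomplete sequence.
static size_t
columns_without_wcwidth(const char *str)
{
	assert(str != NULL);

	mbstate_t state;
	memset(&state, 0, sizeof(state));

	const size_t len = strlen(str);
	size_t i = 0;
	size_t columns = 0;

	while (i < len) {
		wchar_t wc;
		const size_t ret = mbrtowc(&wc, str + i, len - i, &state);

		// (size_t)-1 and (size_t)-2 are both larger than len - i,
		// so this one check covers invalid and incomplete input.
		if (ret == 0 || ret > len - i)
			return (size_t)-1;

		i += ret;
		++columns; // One multibyte character == one column.
	}

	return columns;
}
```

The result depends on the current locale, so a caller would need to have set the locale at startup for UTF-8 input to be decoded as UTF-8. As the commit message notes, this approach miscounts fullwidth and combining characters.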
Lasse Collin 4ff609adb0
xzdec: Use setlocale() via tuklib_gettext_setlocale()
xzdec isn't translated and didn't have locale-specific behavior
in the past. On Windows with UTF-8 in the application manifest,
setting the locale makes a difference though:

  - Without any setlocale() call, non-ASCII filenames don't display
    properly in Command Prompt unless one first uses "chcp 65001"
    to set the console code page to UTF-8.

  - setlocale(LC_ALL, "") is enough to make non-ASCII filenames
    print correctly in Command Prompt without using "chcp 65001",
    assuming that the non-UTF-8 code page (like 850) supports
    those non-ASCII characters.

  - setlocale(LC_ALL, ".UTF8") is even better because then mbrtowc() and
    such functions use a UTF-8 locale instead of a legacy code page.
    The tuklib_gettext_setlocale() macro takes care of this (without
    enabling any translations).

Fixes: 46ee006162
(cherry picked from commit 78868b6ed6)
2024-12-18 19:22:00 +02:00
Lasse Collin 4e7a48bf15
Windows: Use UTF-8 locale when active code page is UTF-8
XZ Utils 5.6.3 set the active code page to UTF-8 to fix CVE-2024-47611.
This wasn't paired with UCRT-specific setlocale(LC_ALL, ".UTF8"), thus
non-ASCII characters from translations became mojibake.

Fixes: 46ee006162
(cherry picked from commit 0d0b574cc4)
2024-12-18 19:22:00 +02:00
Lasse Collin d20e4115e1
Windows: Document the need for setlocale(LC_ALL, ".UTF8")
Also warn about unpaired surrogates and the (somewhat UTF-8-specific)
MAX_PATH issue in FindFirstFileA().

Fixes: 46ee006162
(cherry picked from commit 20dfca8171)
2024-12-18 19:22:00 +02:00
Lasse Collin f9f0cdae8a
xzdec: Call tuklib_progname_init() early enough
If the early pledge() call on OpenBSD fails, it calls my_errorf()
which requires the "progname" variable.

Fixes: d74fb5f060
(cherry picked from commit 4e936f2340)
2024-12-18 19:22:00 +02:00
Dexter Castor Döpping d86fa15b72
liblzma: Fix incorrect macro name in a comment
Fixes: 33b8a24b66
Closes: https://github.com/tukaani-project/xz/pull/155
(cherry picked from commit bee0c044d3)
2024-12-18 19:22:00 +02:00
Mark Wielaard d9c2e7572b
xz: Landlock: Fix a file descriptor leak
(cherry picked from commit 48ff3f0652)
2024-12-18 19:21:59 +02:00
Lasse Collin 9331ce4009
Bump version and soname for 5.6.3
2024-10-01 12:50:28 +03:00
Lasse Collin bf518b9ba4
Windows: Embed an application manifest in the EXE files
IMPORTANT: This includes a security fix to command line tool
           argument handling.

Some toolchains embed an application manifest by default to declare
UAC-compliance. Some also declare compatibility with Vista/8/8.1/10/11
to let the app access features newer than those of Vista.

We want all the above but also two more things:

  - Declare that the app is long path aware to support paths longer
    than 259 characters (this may also require a registry change).

  - Force the code page to UTF-8. This allows the command line tools
    to access files whose names contain characters that don't exist
    in the current legacy code page (except unpaired surrogates).
    The UTF-8 code page also fixes security issues in command line
    argument handling which can be exploited with malicious filenames.
    See the new file w32_application.manifest.comments.txt.

Thanks to Orange Tsai and splitline from DEVCORE Research Team
for discovering this issue.

Thanks to Vijay Sarvepalli for reporting the issue to me.

Thanks to Kelvin Lee for testing with MSVC and helping with
the required build system fixes.

(cherry picked from commit 46ee006162)
2024-10-01 12:16:29 +03:00
Lasse Collin 5718ce932e
Windows: Set DLL name accurately in StringFileInfo on Cygwin and MSYS2
Now the information in the "Details" tab in the file properties
dialog matches the naming convention of Cygwin and MSYS2. This
is only a cosmetic change.

(cherry picked from commit dad1530915)
2024-10-01 12:16:28 +03:00
Lasse Collin e77c0ca61d
common_w32res.rc: White space edits
LANGUAGE and VS_VERSION_INFO begin new statements so put an empty line
between them.

(cherry picked from commit 8940ecb96f)
2024-10-01 12:16:28 +03:00
Tobias Stoeckmann aef9a25b32
lzmainfo: Avoid integer overflow
The MB output can overflow with huge numbers. Most likely these are
invalid .lzma files anyway, but let's avoid garbage output.

lzmadec was adapted from LZMA Utils. The original code with this bug
was written in 2005, over 19 years ago.

Co-authored-by: Lasse Collin <lasse.collin@tukaani.org>
Closes: https://github.com/tukaani-project/xz/pull/144
(cherry picked from commit 76cfd0a9bb)
2024-09-18 20:53:11 +03:00
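The kind of overflow mentioned in commit aef9a25b32 can be illustrated with a rounding-to-MiB calculation. This is only a sketch under the assumption that the MB value is computed by adding half a MiB before dividing; the function names are hypothetical and the code is not taken from lzmainfo.

```c
#include <stdint.h>

#define SKETCH_MIB (UINT64_C(1) << 20)

// Naive rounding: the addition wraps around when bytes is close to
// UINT64_MAX, producing a tiny, meaningless result.
static uint64_t
round_to_mib_naive(uint64_t bytes)
{
	return (bytes + SKETCH_MIB / 2) / SKETCH_MIB;
}

// Overflow-free variant: divide first, then round up based on the
// remainder. No intermediate value can exceed the input.
static uint64_t
round_to_mib_safe(uint64_t bytes)
{
	return bytes / SKETCH_MIB
			+ (bytes % SKETCH_MIB >= SKETCH_MIB / 2 ? 1 : 0);
}
```

For ordinary sizes both functions agree. They differ only for inputs within half a MiB of UINT64_MAX, which, as the commit message says, most likely come from invalid .lzma files.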
Tobias Stoeckmann 40a7f163f5
xzdec: Remove unused short option -M
"xzdec -M123" exited with exit status 1 without printing
any messages. The "M:" entry should have been removed when
the memory usage limiter support was removed from xzdec.

Fixes: 792331bdee
Closes: https://github.com/tukaani-project/xz/pull/143
[ Lasse: Commit message edits ]

(cherry picked from commit 78355aebb7)
2024-09-18 20:53:11 +03:00
Yifeng Li 3a4a05d75e
liblzma: Fix x86-64 movzw compatibility in range_decoder.h
Support for instruction "movzw" without suffix in "GNU as" was
added in commit [1] and stabilized in binutils 2.27, released
in August 2016. Earlier systems don't accept this instruction
without a suffix, making range_decoder.h's inline assembly
unable to build on old systems such as Ubuntu 16.04, creating
error messages like:

    lzma_decoder.c: Assembler messages:
    lzma_decoder.c:371: Error: no such instruction: `movzw 2(%r11),%esi'
    lzma_decoder.c:373: Error: no such instruction: `movzw 4(%r11),%edi'
    lzma_decoder.c:388: Error: no such instruction: `movzw 6(%r11),%edx'
    lzma_decoder.c:398: Error: no such instruction: `movzw (%r11,%r14,4),%esi'

Change "movzw" to "movzwl" for compatibility.

[1] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=c07315e0c610e0e3317b4c02266f81793df253d2

Suggested-by: Lasse Collin <lasse.collin@tukaani.org>
Tested-by: Yifeng Li <tomli@tomli.me>
Signed-off-by: Yifeng Li <tomli@tomli.me>
Fixes: 3182a330c1
Fixes: https://github.com/tukaani-project/xz/issues/121
Closes: https://github.com/tukaani-project/xz/pull/136
(cherry picked from commit 6cd7c86078)
2024-09-06 19:33:20 +03:00
Lasse Collin 9edddda563
liblzma: Tweak a comment
(cherry picked from commit 7c292dd0bf)
2024-09-06 19:33:20 +03:00
Lasse Collin 0f47db18d0
xz: Remove the TODO comment about --recursive
It won't be implemented. find + xargs is more flexible, for example,
it allows compressing small files in parallel. An example for that
has been included in the xz man page since 2010.

(cherry picked from commit baecfa1426)
2024-09-06 19:31:12 +03:00
Lasse Collin ff697eb154
liblzma: CRC CLMUL: Omit is_arch_extension_supported() when not needed
On E2K the function compiles only due to compiler emulation but the
function is never used. It's cleaner to omit the function when it's
not needed even though it's a "static inline" function.

Thanks to Ilya Kurdyukov.

(cherry picked from commit 30a2d5d510)
2024-09-06 19:00:30 +03:00
Lasse Collin a44493ec41
xz: Fix white space
(cherry picked from commit c7164b1927)
2024-09-06 18:56:17 +03:00
Lasse Collin 5e74a6a813
liblzma: Fix a typo in a comment
Thanks to Sam James for spotting it.

Fixes: f644473a21
(cherry picked from commit 0a32d2072c)
2024-09-06 18:56:17 +03:00
Lasse Collin 3f7edc673c
liblzma: Fix a comment indentation
(cherry picked from commit afd9b4d282)
2024-09-06 18:56:16 +03:00
Lasse Collin 8a9cc7ca08
liblzma: Fix white space
(cherry picked from commit 50e6bff274)
2024-09-06 18:56:16 +03:00
RainRat b29b13082f
Fix typos
Closes: https://github.com/tukaani-project/xz/pull/124
(cherry picked from commit 9e73918a4f)
2024-09-06 18:51:59 +03:00
Lasse Collin 6f66155e01
tuklib_integer: Fix building on OpenBSD/sparc64 that uses GCC 4.2
GCC 4.2 doesn't have __builtin_bswap16() and friends so tuklib_integer.h
tries to use OS-specific byte swap methods instead. On OpenBSD those
macros are swap16/32/64 instead of bswap16/32/64 like on other *BSDs
and Darwin.

An alternative to "#ifdef __OpenBSD__" could be "#ifdef swap16" as it
is a macro. But since OpenBSD seems to be a special case under this
special case of "*BSDs and Darwin", checking for __OpenBSD__ seems
the more conservative choice now.

Thanks to Christian Weisgerber and Brad Smith who both submitted
the same patch a few hours apart.

Co-authored-by: Christian Weisgerber <naddy@mips.inka.de>
Co-authored-by: Brad Smith <brad@comstyle.com>
Closes: https://github.com/tukaani-project/xz/pull/126
(cherry picked from commit 04b23addf3)
2024-09-06 18:51:59 +03:00
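The platform check described in commit 6f66155e01 might be sketched as follows. The `sketch_bswap16`/`sketch_bswap32` names are hypothetical, the branch for other *BSDs and Darwin is left out, and the portable fallback is an addition for illustration; none of this is copied from tuklib_integer.h.

```c
#include <stdint.h>

#if defined(__OpenBSD__)
	// OpenBSD names its byte swap macros swap16/32/64.
#	include <sys/types.h>
#	include <sys/endian.h>
#	define sketch_bswap16(n) swap16(n)
#	define sketch_bswap32(n) swap32(n)
#else
	// Portable fallback (assumption for this sketch) for compilers
	// without __builtin_bswap16() and friends.
static inline uint16_t
sketch_bswap16(uint16_t n)
{
	return (uint16_t)((n << 8) | (n >> 8));
}

static inline uint32_t
sketch_bswap32(uint32_t n)
{
	return ((n & UINT32_C(0x000000FF)) << 24)
			| ((n & UINT32_C(0x0000FF00)) << 8)
			| ((n >> 8) & UINT32_C(0x0000FF00))
			| (n >> 24);
}
#endif
```

Checking `__OpenBSD__` rather than `#ifdef swap16` mirrors the conservative choice explained in the commit message: it limits the special case to the one system known to need it.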
Sam James dc6b6011b4
xz: list: suppress -Wformat-nonliteral for Solaris
Solaris' GCC can't understand that our use is fine, unlike modern compilers:
```
list.c: In function 'print_totals_basic':
list.c:1191:4: error: format not a string literal, argument types not checked [-Werror=format-nonliteral]
  uint64_to_str(totals.files, 0));
  ^~~~~~~~~~~~~
cc1: all warnings being treated as errors
```

It's presumably because of older gettext missing format attributes.

This is with `gcc (GCC) 7.3.0`.

(cherry picked from commit b69768c8bd)
2024-09-06 18:51:56 +03:00
Lasse Collin 3ec664d3f6 Bump version and soname for 5.6.2
2024-05-29 18:03:51 +03:00
Lasse Collin 8fda5ce872 Fix typos
Thanks to xx on #tukaani.

(cherry picked from commit 4e9023857d)
2024-05-23 11:36:05 +03:00
Lasse Collin 2729079bcb liblzma: Fix white space
Thanks to xx on #tukaani.

(cherry picked from commit b14d08fbbc)
2024-05-23 11:36:05 +03:00
Lasse Collin a289c4dfeb xz: Document the static function get_chains_memusage()
(cherry picked from commit 142e670a41)
2024-05-23 11:28:20 +03:00
Lasse Collin 6f0db31713 xz: Rename filters_memusage_max() to get_chains_memusage()
(cherry picked from commit 78e984399a)
2024-05-23 11:28:20 +03:00
Lasse Collin d7e2bf7e2d xz: Rename filter_memusages to chains_memusages
(cherry picked from commit 54c3db0a83)
2024-05-23 11:28:20 +03:00
Lasse Collin 58f200b6d1 xz: Simplify the memory usage scaling code
This is closer to what it was before the --filtersX support was added,
just extended to support scaling all filter chains. The method
before this commit was an extended version of the original too but
it was done in a more complex way for no clear reason. In case of
an error, the complex version printed fewer informative messages
(a good thing) but it's not a significant benefit.

If the limit is too low even for single-threaded mode, the required
amount of memory is now reported like in 5.4.x instead of like in
5.5.1alpha - 5.6.1 which showed the original non-scaled usage. It
had been a FIXME in the old code but it's not clear what message
makes the most sense.

Fixes: 5f0c5a0438
(cherry picked from commit d9e1ae79ec)
2024-05-23 11:28:20 +03:00
Lasse Collin 41bdc9fa5c xz: Edit comments
(cherry picked from commit 0ee56983d1)
2024-05-23 11:28:20 +03:00
Lasse Collin 52e40c1912 xz: Rename chain_idx to chain_num
(cherry picked from commit ec82a49c35)
2024-05-23 11:28:20 +03:00
Lasse Collin 8a01963331 xz: Edit coding style
(cherry picked from commit a731a6993c)
2024-05-23 11:28:20 +03:00
Lasse Collin e3ad7eda74 xz: Edit comments
Fixes: 5f0c5a0438
(cherry picked from commit 32eb176b89)
2024-05-23 11:28:20 +03:00
Lasse Collin 09cabae2ab xz: Fix grammar in a comment
Fixes: cb3111e3ed
(cherry picked from commit b90339f4da)
2024-05-23 11:28:20 +03:00
Lasse Collin c10b66fbf9 xz: Rename filter_memusages to encoder_memusages
(cherry picked from commit 4c0bdaf13d)
2024-05-23 11:28:20 +03:00
Lasse Collin 9132ce3564 xz: Edit coding style
(cherry picked from commit b54aa023e0)
2024-05-23 11:28:20 +03:00
Lasse Collin d642e13874 xz: Rename filters_index to chain_num
The reason is the same as in bd0782c1f13e52cd0fd8415208e30e47004a4c68.

(cherry picked from commit 49f67d3d3f)
2024-05-23 11:28:20 +03:00
Lasse Collin 47599f3b73 xz: Replace a few uint32_t with "unsigned" to reduce the number of casts
These hold only tiny values.

(cherry picked from commit ff9e8b3d06)
2024-05-23 11:28:20 +03:00
Lasse Collin 8f5ab75c45 xz: Rename filters_used_mask to chains_used_mask
The reason is the same as in bd0782c1f13e52cd0fd8415208e30e47004a4c68.

(cherry picked from commit b5e6c1113b)
2024-05-23 11:28:20 +03:00
Lasse Collin 3eb7cf9dd5 xz: Move the setting of "check" in coder_set_compression_settings()
It's more logical to do it in the beginning instead of in the middle
of the filter chain handling.

Fixes: d6af7f3470
(cherry picked from commit 32500dfaad)
2024-05-23 11:28:20 +03:00
Lasse Collin 067961ee0e xz: Rename "filters" to "chains"
The convention is that

    lzma_filter filters[LZMA_FILTERS_MAX + 1];

contains the filters of a single filter chain.
It was so here as well before the commit
d6af7f3470.
It changes "filters" to a ten-element array of filter chains.
It's clearer to call this array-of-arrays "chains".

This also renames "filter_idx" to "chain_idx" which is used
as an index as in chains[chain_idx].

(cherry picked from commit ad146b1f42)
2024-05-23 11:28:20 +03:00
Lasse Collin 6822f6f891 xz: Clean up a comment
(cherry picked from commit 5a4ae4e4d0)
2024-05-23 11:28:20 +03:00
Lasse Collin 0e5e3e7bdc xz: Add clarifying assertions
(cherry picked from commit 2de80494ed)
2024-05-23 11:28:20 +03:00
Lasse Collin 77bcf6b76a xz: Add a clarifying assertion
Fixes: 5f0c5a0438
(cherry picked from commit 1eaad004bf)
2024-05-23 11:28:20 +03:00