mirror of https://git.tukaani.org/xz.git synced 2025-10-25 10:32:52 +00:00

2724 Commits

Author SHA1 Message Date
Lasse Collin
e34dbd6a0a
xzdec: Fix language in a comment 2025-01-02 15:43:37 +02:00
Lasse Collin
16821252c5
Windows: Make NLS require UCRT and gettext-runtime >= 0.23.1
Also remove the recently-added workaround from tuklib_gettext.h.
Requiring a new enough gettext-runtime is cleaner. I guess it's
mostly MSYS2 where xz is built with translation support, so once
MSYS2 has Gettext >= 0.23.1, this requirement shouldn't be a problem
in practice.
2025-01-02 15:35:25 +02:00
Lasse Collin
aa1807ed94
windows/build-with-cmake.bat: Fix ENABLE_NLS to XZ_NLS
Fixes: 29f77c7b707f2458fb047e77497354b195e05b14
2025-01-02 15:35:16 +02:00
Lasse Collin
ea21c76aa2
Build: Use git log --pretty=medium when creating ChangeLog
It's the default in git-log. Specifying it explicitly is good in case
a user has set format.pretty to a different value.
2024-12-30 11:21:57 +02:00
Lasse Collin
08050c0788
Windows: Update MinGW-w64 + CMake instructions to recommend UCRT 2024-12-30 10:51:33 +02:00
Lasse Collin
653732bd6f
xz man page: Describe the source file deletion in -z and -d options
The DESCRIPTION section always explained it, and the OPTIONS section
only described the differences to the default behavior. However, new
users in a hurry may skip reading DESCRIPTION. The default behavior
is a bit dangerous, thus it's good to repeat in --compress and
--decompress docs that the source file is removed after successful operation.

Fixes: https://github.com/tukaani-project/xz/issues/150
2024-12-30 10:51:26 +02:00
Lasse Collin
bb79f79b27
Build: Set libtool -version-info so that it matches with CMake
In the past, they haven't been in sync in development versions
although they (of course) have been in stable releases.
2024-12-29 10:54:45 +02:00
Lasse Collin
cf54f70e14
CMake/macOS: Use GNU Libtool compatible shared library versioning
Because this increases the Mach-O compatibility_version, this commit
shouldn't cause any ABI compatibility trouble for existing CMake users
on macOS. This is assuming that they won't later downgrade to an older
liblzma version that was built with CMake before this commit.

Meson allows customising the Mach-O versioning too. So the three
build systems can be configured to be compatible.
2024-12-29 10:51:53 +02:00
Lasse Collin
94e1791668
CMake: Edit a comment 2024-12-29 10:51:53 +02:00
Lasse Collin
6b50590725
version.sh: Omit an unwanted dot from development versions
It printed 5.7.0.alpha instead of 5.7.0alpha.

Fixes: e7a42cda7c827e016619e8cab15e2faf5d4181ae
2024-12-29 10:51:47 +02:00
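The formatting fix above can be illustrated with a small C sketch. This is not version.sh (which is a shell script); the function name format_version and its parameters are hypothetical and exist only to show the stability suffix being appended without a separating dot.

```c
#include <assert.h>
#include <stdio.h>

// Hypothetical helper: writes "MAJOR.MINOR.PATCH" followed directly by the
// stability suffix ("alpha", "beta", or "" for a stable release).
// No dot is placed between the patch number and the suffix.
// Returns the number of characters written, or -1 if buf is too small.
int
format_version(char *buf, size_t buf_size, unsigned major, unsigned minor,
        unsigned patch, const char *stability)
{
    assert(buf != NULL && stability != NULL);

    const int n = snprintf(buf, buf_size, "%u.%u.%u%s",
            major, minor, patch, stability);

    return (n < 0 || (size_t)n >= buf_size) ? -1 : n;
}
```

With the arguments 5, 7, 0, and "alpha" the buffer holds 5.7.0alpha, matching the corrected output described in the commit message.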
Lasse Collin
f7a248f56e
CMake: Remove a duplicate word from a comment 2024-12-27 21:39:28 +02:00
Lasse Collin
8b7c55d148
INSTALL: Document CMAKE_DLL_NAME_WITH_SOVERSION 2024-12-27 21:39:22 +02:00
Lasse Collin
260d5d3620
xz: Fix comments 2024-12-27 09:14:56 +02:00
Dexter Castor Döpping
bf6da9a573
CMake: Disable unity builds project-wide
liblzma and xz can't be compiled as a unity/jumbo build because of
redeclarations and type name reuse. The CMake documentation recommends
setting UNITY_BUILD to false in this case.

This is especially important if we're compiled as a subproject and the
consumer wants to use CMAKE_UNITY_BUILD=ON for the rest of their code
base.

Closes: https://github.com/tukaani-project/xz/pull/158
2024-12-22 20:06:24 +02:00
Lasse Collin
f8c328eed1
Windows: Work around a UTF-8 issue in Gettext's libintl_setlocale()
See the comment. In this package, locale is set at program startup and
not changed later, so the point (2) in the comment isn't a problem.

Fixes: 46ee0061629fb075d61d83839e14dd193337af59
2024-12-20 16:33:34 +02:00
Lasse Collin
0353390609
Revert "Windows: Use UTF-8 locale when active code page is UTF-8"
This reverts commit 0d0b574cc45045d6150d397776340c068df59e2a.
2024-12-20 16:33:34 +02:00
Lasse Collin
4b319e05af
xzdec: Use setlocale() instead of tuklib_gettext_setlocale()
xzdec isn't translated and doesn't need libintl on Windows even
when NLS is enabled, thus libintl_setlocale() cannot interfere
with the locale settings. Thus, standard setlocale() works perfectly.

In the commit 78868b6e, the explanation in the commit message is wrong.

Fixes: 78868b6ed63fa4c89f73e3dfed27abfb8b0d46db
2024-12-20 16:33:34 +02:00
Lasse Collin
34b80e282e
Windows: Revert the setlocale(LC_ALL, ".UTF8") documentation
Only leave the FindFirstFileA() notes from 20dfca81, reverting
the incorrect setlocale() notes. On Windows, Gettext's <libintl.h>
overrides setlocale() with libintl_setlocale() wrapper. I hadn't
noticed this, and thus my conclusions were wrong.

Fixes: 20dfca8171dad4c64785ac61d5b68972c444877b
2024-12-20 16:33:28 +02:00
Lasse Collin
5794cda064
tuklib_mbstr_wrap: Silence a warning from Clang
Fixes: ca529c3f41a4a19a59e2e252e6dd9255f130c634
2024-12-18 17:50:58 +02:00
Lasse Collin
16c9796ef9
Update THANKS 2024-12-18 17:09:32 +02:00
Lasse Collin
3b5c8a1fca
Update TODO
Fixes: 5f6dddc6c911df02ba660564e78e6de80947c947
2024-12-18 17:09:32 +02:00
Lasse Collin
22a35e64ce
lzmainfo: Use tuklib_mbstr_nonprint 2024-12-18 17:09:32 +02:00
Lasse Collin
03111595ee
xzdec: Use tuklib_mbstr_nonprint 2024-12-18 17:09:32 +02:00
Lasse Collin
d22f96921f
xz: Use tuklib_mbstr_nonprint
Call tuklib_mask_nonprint() on filenames and also on a few other
strings from the command line too.

The filename printed by "xz --robot --list" (in list.c) is also masked.
It's good to get rid of tabs and newlines which would desync the output
but masking other chars wouldn't be strictly necessary. It might matter
with sensible filenames if LC_CTYPE is "C" (when iswprint() might reject
non-ASCII chars) and a script wants to read a filename from xz's output.
Hopefully it's an unusual enough corner case to not be a real problem.
2024-12-18 17:09:32 +02:00
Lasse Collin
40e5733055
Add tuklib_mbstr_nonprint to mask non-printable characters
Malicious filenames or other untrusted strings may affect the state of
the terminal when such strings are printed as part of (error) messages.
Add functions that mask such characters.

It's not enough to handle only single-byte control characters.
In multibyte locales, some control characters are multibyte too, for
example, terminals interpret C1 control characters (U+0080 to U+009F)
that are two bytes as UTF-8.

Instead of checking for control characters with iswcntrl(), this
uses iswprint() to detect printable characters. This is much stricter.
On Windows it's actually too strict as it rejects some characters that
definitely are printable.

Gnulib's quotearg would do a lot more but I hope this simpler method
is good enough here.

Thanks to Ryan Colyer for the discussion about the problems of
the earlier single-byte-only method.

Thanks to Christian Weisgerber for reporting a bug in an earlier
version of this code.

Thanks to Jeroen Roovers for a typo fix.

Closes: https://github.com/tukaani-project/xz/pull/118
2024-12-18 17:09:32 +02:00
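The masking approach described in 40e5733055 can be sketched roughly as follows. This is a simplified illustration, not the tuklib_mbstr_nonprint code: the name mask_nonprint_sketch, the caller-provided output buffer, and the choice of '?' as the replacement character are assumptions made for the example. It decodes with mbrtowc() in the current locale and keeps only what iswprint() accepts.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

// Copies src to dst, replacing with '?' every character that iswprint()
// rejects and every byte that doesn't decode in the current locale.
// dst must have room for strlen(src) + 1 bytes. Returns dst.
char *
mask_nonprint_sketch(char *dst, const char *src)
{
    assert(dst != NULL && src != NULL);

    mbstate_t state;
    memset(&state, 0, sizeof(state));

    size_t len = strlen(src);
    char *out = dst;

    while (len > 0) {
        wchar_t wc;
        const size_t n = mbrtowc(&wc, src, len, &state);

        if (n == 0 || n == (size_t)-1 || n == (size_t)-2) {
            // Invalid, truncated, or otherwise unusable input:
            // mask one byte and restart from a clean state.
            *out++ = '?';
            ++src;
            --len;
            memset(&state, 0, sizeof(state));
        } else if (iswprint((wint_t)wc)) {
            // Printable: copy the whole multibyte character.
            memcpy(out, src, n);
            out += n;
            src += n;
            len -= n;
        } else {
            // Non-printable: one '?' regardless of how many
            // bytes the character occupied.
            *out++ = '?';
            src += n;
            len -= n;
        }
    }

    *out = '\0';
    return dst;
}
```

Because each branch writes at most as many bytes as it consumes, the output never grows beyond the input length. In a UTF-8 locale a two-byte C1 control character would be decoded as a single character and masked as a whole, which is the case a single-byte check would miss.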
Lasse Collin
36190c8c4b
Translations: Add preliminary Georgian translation
Most of the auto-wrapped strings are translated already. A few
strings have changed since this was created though. This file
isn't in the Translation Project *yet* because these strings
are still very new.

Closes: https://github.com/tukaani-project/xz/pull/145
2024-12-18 17:09:31 +02:00
Lasse Collin
4a0c4f92b8
xz: Make one string simpler for translators
Leading spaces in the string can get miscounted by translators.
2024-12-18 17:09:31 +02:00
Lasse Collin
3fcf547e92
lzmainfo: Sync the translatable strings with xz 2024-12-18 17:09:31 +02:00
Lasse Collin
3e9177fd20
xz: Use automatic word wrapping for help texts
--long-help is now one line longer because --lzma1 is now on its
own line.
2024-12-18 17:09:31 +02:00
Lasse Collin
a0eecc9eb2
po/Makevars: Add --keyword=W_:... to XGETTEXT_OPTIONS
The text was copied from tuklib_gettext.h.

Also rearrange the --keyword options to be last on the line.
2024-12-18 17:09:31 +02:00
Lasse Collin
ca529c3f41
Add tuklib_mbstr_wrap for automatic word wrapping
Automatic word wrapping makes translators' work easier and reduces
errors like misaligned columns or overlong lines. Right-to-left
languages and languages that don't use spaces between words will
still need extra effort. (xz hasn't been translated to any RTL
language so far.)
2024-12-18 17:09:31 +02:00
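As a rough illustration of what automatic word wrapping involves, here is a minimal greedy wrapper in C. It is not tuklib_mbstr_wrap: the name wrap_sketch is hypothetical, and the sketch assumes plain ASCII text where every byte is one column wide and words are separated by spaces.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Greedy word wrapping: copies text to out, emitting '\n' instead of a
// space whenever the next word would not fit within width columns.
// Assumptions for this sketch: one byte == one column (ASCII only) and
// words are separated by spaces. A word longer than width is left overlong.
// out must have room for strlen(text) + 1 bytes.
void
wrap_sketch(char *out, const char *text, size_t width)
{
    assert(out != NULL && text != NULL && width > 0);

    size_t col = 0; // Columns used on the current output line

    while (*text != '\0') {
        // Length of the next word (up to a space or the end).
        const size_t word_len = strcspn(text, " ");

        if (col > 0) {
            // A separator is needed before this word.
            if (col + 1 + word_len > width) {
                *out++ = '\n';
                col = 0;
            } else {
                *out++ = ' ';
                ++col;
            }
        }

        memcpy(out, text, word_len);
        out += word_len;
        col += word_len;
        text += word_len;

        // Skip the space that ended the word, if any.
        if (*text == ' ')
            ++text;
    }

    *out = '\0';
}
```

With this scheme a translator supplies one unbroken string and the line breaks are computed at run time. Languages without spaces between words have no break opportunities for such a loop to find, which is one reason they need the extra effort mentioned in the commit message.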
Lasse Collin
314b83ceba
Build: Sort filenames to ASCII order in Makefile.am 2024-12-18 17:09:31 +02:00
Lasse Collin
df399c5255
tuklib_mbstr_width: Add tuklib_mbstr_width_mem()
It's a new function split from tuklib_mbstr_width().
It's useful with partial strings that aren't terminated with \0.
2024-12-18 17:09:30 +02:00
Lasse Collin
51081efae4
tuklib_mbstr_width: Update a comment about shift states 2024-12-18 17:09:30 +02:00
Lasse Collin
7ff1b0ac53
tuklib_mbstr_width: Don't mention shift states in the API docs
It is assumed that this code won't be used with charsets that use
locking shift states.
2024-12-18 17:09:30 +02:00
Lasse Collin
3c16105936
tuklib_mbstr_width: Use stricter return value checking
This should make no difference in practice (at least if mbrtowc()
isn't broken).
2024-12-18 17:09:30 +02:00
Lasse Collin
b797c44c42
tuklib_mbstr_width: Change the behavior when wcwidth() is not available
If wcwidth() isn't available (Windows), previously it was assumed
that one byte == one column in the terminal. Now it is assumed that
one multibyte character == one column. This works better with UTF-8.
Languages that only use single-width characters without any combining
characters should work correctly with this.

In xz, none of po/*.po contain combining characters and only ko.po,
zh_CN.po, and zh_TW.po contain fullwidth characters. Thus, "only"
those three translations in xz are broken on Windows with the
UTF-8 code page. Broken means that column headings in xz -lvv and
(only in the master branch) strings in --long-help are misaligned,
so it's not a huge problem. I don't know if those three languages
displayed perfectly before the UTF-8 change because I hadn't tested
translations with native Windows builds before.

Fixes: 46ee0061629fb075d61d83839e14dd193337af59
2024-12-18 17:09:30 +02:00
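The difference between the old and new fallback in b797c44c42 can be sketched in C. This is an illustration only: the name fallback_width_mem is hypothetical, and instead of decoding with mbrtowc() in the current locale the sketch assumes valid UTF-8 input and counts the bytes that start a character.

```c
#include <assert.h>
#include <stddef.h>

// Counts columns under the assumption "one multibyte character == one
// column". Assumption for this sketch: the input is valid UTF-8, so a new
// character starts at every byte that is not a continuation byte
// (bit pattern 10xxxxxx). The length is passed explicitly, so the input
// doesn't need to be terminated with \0.
size_t
fallback_width_mem(const char *str, size_t len)
{
    assert(str != NULL || len == 0);

    size_t columns = 0;

    for (size_t i = 0; i < len; ++i) {
        const unsigned char c = (unsigned char)str[i];
        if ((c & 0xC0) != 0x80)
            ++columns;
    }

    return columns;
}
```

For the three bytes of "aä" in UTF-8 this gives 2 columns, where the old one-byte-per-column assumption gave 3. A fullwidth character such as U+4E2D (three bytes) is counted as 1 column although terminals draw it 2 columns wide, which is why the ko, zh_CN, and zh_TW translations remain misaligned.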
Lasse Collin
78868b6ed6
xzdec: Use setlocale() via tuklib_gettext_setlocale()
xzdec isn't translated and didn't have locale-specific behavior
in the past. On Windows with UTF-8 in the application manifest,
setting the locale makes a difference though:

  - Without any setlocale() call, non-ASCII filenames don't display
    properly in Command Prompt unless one first uses "chcp 65001"
    to set the console code page to UTF-8.

  - setlocale(LC_ALL, "") is enough to make non-ASCII filenames
    print correctly in Command Prompt without using "chcp 65001",
    assuming that the non-UTF-8 code page (like 850) supports
    those non-ASCII characters.

  - setlocale(LC_ALL, ".UTF8") is even better because then mbrtowc() and
    such functions use a UTF-8 locale instead of a legacy code page.
    The tuklib_gettext_setlocale() macro takes care of this (without
    enabling any translations).

Fixes: 46ee0061629fb075d61d83839e14dd193337af59
2024-12-18 17:09:30 +02:00
Lasse Collin
0d0b574cc4
Windows: Use UTF-8 locale when active code page is UTF-8
XZ Utils 5.6.3 set the active code page to UTF-8 to fix CVE-2024-47611.
This wasn't paired with UCRT-specific setlocale(LC_ALL, ".UTF8"), thus
non-ASCII characters from translations became mojibake.

Fixes: 46ee0061629fb075d61d83839e14dd193337af59
2024-12-18 17:09:30 +02:00
Lasse Collin
20dfca8171
Windows: Document the need for setlocale(LC_ALL, ".UTF8")
Also warn about unpaired surrogates and (somewhat UTF-8-specific)
MAX_PATH issue in FindFirstFileA().

Fixes: 46ee0061629fb075d61d83839e14dd193337af59
2024-12-18 17:09:29 +02:00
Lasse Collin
4e936f2340
xzdec: Call tuklib_progname_init() early enough
If the early pledge() call on OpenBSD fails, it calls my_errorf()
which requires the "progname" variable.

Fixes: d74fb5f060b76db709b50f5fd37490394e52f975
2024-12-18 17:09:29 +02:00
Lasse Collin
61feaf681b
CMake: Bump maximum policy version to 3.31
With CMake 3.31, there were a few warnings from
CMP0177 "install() DESTINATION paths are normalized".
These occurred because the install(FILES) command in
my_install_man_lang() is called with a DESTINATION path
that contains two consecutive slashes, for example,
"share/man//man1". Such a path is for the English man pages.
With translated man pages, the language code goes between
the slashes. The warning was probably triggered because the
extra slash gets removed by the normalization.
2024-12-18 17:09:29 +02:00
Lasse Collin
b0bb84dd7b
Update THANKS 2024-12-18 17:09:29 +02:00
Dexter Castor Döpping
bee0c044d3
liblzma: Fix incorrect macro name in a comment
Fixes: 33b8a24b6646a9dbfd8358405aec466b13078559
Closes: https://github.com/tukaani-project/xz/pull/155
2024-12-18 17:09:29 +02:00
Lasse Collin
2cfa1ad0a9
license-check.sh: Add an exception for doc/SHA256SUMS
Fixes: 36b531022f24a2ab57a2dfb9e5052f1c176e9d9a
2024-12-18 17:09:21 +02:00
Lasse Collin
36b531022f
doc/SHA256SUMS: Add the list of SHA-256 hashes of release files
The release files are signed but verifying the signatures cannot
catch certain types of attacks:

1. A malicious maintainer could make more than one variant of
   a package. One could be for general distribution. Another
   with malicious content could be targeted to specific users,
   for example, distributing the malicious version on a mirror
   controlled by the attacker.

2. If the signing key of an honest maintainer was compromised
   without being detected, a similar situation as described
   above could occur.

SHA256SUMS could be put on the project website but having it in
the Git repository makes it obvious that old lines aren't modified
when the file is updated.

Hashes of uncompressed files are included too. This way tarballs
can be recompressed and the hashes can still be verified.
2024-12-01 21:38:17 +02:00
Lasse Collin
fe9e66993f
Docs: Remove .github/SECURITY.md
One of the reasons to have this file in the xz repository was to
show vulnerability reporting info in the Security section on GitHub.
On 2024-11-25, I added SECURITY.md to the tukaani-project organization
on GitHub:

    https://github.com/tukaani-project/.github/blob/main/SECURITY.md

GitHub shows that file in all projects in the organization unless
overridden by a project-specific SECURITY.md. Thus, removing
the file from the xz repo makes GitHub show the organization-wide
text instead.

Maintaining a single copy for the whole GitHub organization makes
things simpler. It's also nicer to have fewer GitHub-specific files
in the xz repo. Information on how to report bugs (including security
issues) is available in README and on the home page too.

The OpenSSF Scorecard tool didn't find .github/SECURITY.md from the
xz repository. There was a suggestion to move the file to the top-level
directory where Scorecard should find it. However, Scorecard does find
the organization-wide SECURITY.md. Thus, the file isn't needed in the
xz repository to score points in the Scorecard game:

    https://scorecard.dev/viewer/?uri=github.com/tukaani-project/xz

Closes: https://github.com/tukaani-project/xz/issues/148
Closes: https://github.com/tukaani-project/xz/pull/149
2024-11-30 12:05:59 +02:00
Lasse Collin
b361772736
Translations: Update the Chinese (traditional) translation 2024-11-30 10:27:14 +02:00
Lasse Collin
c15115f7ed
liblzma: Optimize the loop conditions in BCJ filters
Compilers cannot optimize the addition "i + 4" away since theoretically
it could overflow.
2024-11-26 19:17:42 +02:00
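The kind of loop-condition change described in c15115f7ed can be sketched as below. This is an assumption about the general shape of such an optimization, not the liblzma source: the function names are hypothetical, and the loop bodies only count iterations where a real BCJ filter would read and rewrite a 4-byte instruction.

```c
#include <assert.h>
#include <stddef.h>

// Before: "i + 4" has to be computed for every comparison because the
// compiler must allow for the unsigned addition wrapping around.
size_t
count_units_before(size_t size)
{
    size_t units = 0;
    for (size_t i = 0; i + 4 <= size; i += 4)
        ++units;
    return units;
}

// After: round size down to a multiple of four once, then compare
// i against it directly.
size_t
count_units_after(size_t size)
{
    size &= ~(size_t)3;

    size_t units = 0;
    for (size_t i = 0; i < size; i += 4)
        ++units;
    return units;
}

// Checks that both loop forms visit the same number of 4-byte units
// for every size from 0 to max_size.
void
check_loops_agree(size_t max_size)
{
    for (size_t size = 0; size <= max_size; ++size)
        assert(count_units_before(size) == count_units_after(size));
}
```

Both forms skip a trailing partial unit of one to three bytes. The second form moves the arithmetic out of the loop condition, so the per-iteration comparison no longer depends on an addition that might overflow.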
Lasse Collin
9f69e71e78
Update THANKS 2024-11-25 16:26:54 +02:00