Reported-by: Diego Elio Pettenò <flameeyes@gentoo.org>
Signed-off-by: Martin Väth <vaeth@mathematik.uni-wuerzburg.de>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Spot candidates by running these commands:
git ls-files |xargs perl -0777 -n \
-e 'while (/\b(then?|[iao]n|i[fst]|but|f?or|at|and|[dt]o)\s+\1\b/gims)' \
-e '{$n=($` =~ tr/\n/\n/ + 1); ($v=$&)=~s/\n/\\n/g; print "$ARGV:$n:$v\n"}'
Thanks to Jim Meyering for the original patch.
Empty Block was created if the input buffer was empty.
Empty Block wastes a few bytes of space, but more importantly
it triggers a bug in XZ Utils 5.0.1 and older when trying
to decompress such a file. 5.0.1 and older consider such
files to be corrupt. I thought that no encoder creates empty
Blocks when releasing 5.0.2 but I was wrong.
The biggest problem was that the integrity check type
wasn't validated, and e.g. lzma_easy_buffer_encode()
would create a corrupt .xz Stream if given an unsupported
Check ID. Luckily applications don't usually try to use
an unsupport Check ID, so this bug is unlikely to cause
many real-world problems.
It leaks old filter options structures (hundred bytes or so)
every time the lzma_stream is reinitialized. With the xz tool,
this happens when compressing multiple files.
The decoder considered empty LZMA2 streams to be corrupt.
This shouldn't matter much with .xz files, because no encoder
creates empty LZMA2 streams in .xz. This bug is more likely
to cause problems in applications that use raw LZMA2 streams.
It didn't work at all. It tried to use the -q option
for grep, but it appended it after "--". This works
around it by redirecting to /dev/null. The downside
is that this can be slower with big files compared
to proper use of "grep -q".
Thanks to Gregory Margo.
xz didn't compress setuid/setgid/sticky files and files
with multiple hard links even with --force. This bug was
introduced in 23ac2c44c3.
Thanks to Charles Wilson.
If any of the reserved members in lzma_stream are non-zero
or non-NULL, LZMA_OPTIONS_ERROR is returned. It is possible
that a new feature in the future is indicated by just setting
a reserved member to some other value, so the old liblzma
version need to catch it as an unsupported feature.
The non-standard ones from msvcrt.dll appear to work
most of the time with XZ Utils, but there are some
corner cases where things may go very wrong. So it's
good to use the better replacements provided by
MinGW(-w64) runtime.
Calling raise() to kill xz when user has pressed C-c
is a bit verbose on OS/2 and DOS/DJGPP. Instead of
calling raise(), set only the exit status to 1.
Those are the same thing, and the former makes it a bit
easier to build the code with other build systems, because
one doesn't need to update the version number into custom
config.h.
This change affects only lzmainfo. Other tools were already
using LZMA_VERSION_STRING.
Most distros want xz linked against shared liblzma, so
it doesn't help much to require --enable-dynamic for that.
Those who want to avoid PIC on x86-32 to get better
performance, can still do it e.g. by using --disable-shared
to compile xz and then another pass to compile shared liblzma.
Part of these static/dynamic tricks were needed for Windows
in the past. Nowadays we rely on GCC and binutils to do the
right thing with auto-import. If the Autotooled build system
needs to support some other toolchain on Windows in the future,
this may need some rethinking.
Lots of content was updated on the xz man page.
Technical improvements:
- Start a new sentence on a new line.
- Use fairly short lines.
- Use constant-width font for examples (where supported).
- Some minor cleanups.
Thanks to Jonathan Nieder for some language fixes.
The code assumed that printing numbers with thousand separators
and decimal points would always produce only US-ASCII characters.
This was used for buffer sizes (with snprintf(), no overflows)
and aligning columns of the progress indicator and --list. That
assumption was wrong (e.g. LC_ALL=fi_FI.UTF-8 with glibc), so
multibyte character support was added in this commit. The old
way is used if the operating system doesn't have enough multibyte
support (e.g. lacks wcwidth()).
The sizes of buffers were increased to accomodate multibyte
characters. I don't know how big they should be exactly, but
they aren't used for anything critical, so it's not too bad.
If they still aren't big enough, I hopefully get a bug report.
snprintf() takes care of avoiding buffer overflows.
Some static buffers were replaced with buffers allocated on
stack. double_to_str() was removed. uint64_to_str() and
uint64_to_nicestr() now share the static buffer and test
for thousand separator support.
Integrity check names "None" and "Unknown-N" (2 <= N <= 15)
were marked to be translated. I had forgot these, plus they
wouldn't have worked correctly anyway before this commit,
because printing tables with multibyte strings didn't work.
Thanks to Marek Černocký for reporting the bug about
misaligned table columns in --list output.
This should reduce the cases where --extreme makes
compression worse. On the other hand, some other
files may now benefit slightly less from --extreme.
It was 8 + nice_len / 4, now it is 4 + nice_len / 4.
This allows faster settings at lower nice_len values,
even though it seems that I won't use automatic depth
calcuation with HC3 and HC4 in the presets.
For several people, the limiter causes bigger problems that
it solves, so it is better to have it disabled by default.
Those who want to have a limiter by default need to enable
it via the environment variable XZ_DEFAULTS.
Support for environment variable XZ_DEFAULTS was added. It is
parsed before XZ_OPT and technically identical with it. The
intended uses differ quite a bit though; see the man page.
The memory usage limit can now be set separately for
compression and decompression using --memlimit-compress and
--memlimit-decompress. To set both at once, -M or --memlimit
can be used. --memory was retained as a legacy alias for
--memlimit for backwards compatibility.
The semantics of --info-memory were changed in backwards
incompatible way. Compatibility wasn't meaningful due to
changes in the memory usage limiter functionality.
The memory usage limiter info is no longer shown at the
bottom of xz --long -help.
The memory usage limiter support for removed completely from xzdec.
xz's man page was updated to match the above changes. Various
unrelated fixes were also made to the man page.
When using -O2 with GCC, it liked to swap two comparisons
in one "if" statement. It's otherwise fine except that
the latter part, which is seemingly never executed, got
executed (nothing wrong with that) and then triggered
warning in Valgrind about conditional jump depending on
uninitialized variable. A few people find this annoying
so do things a bit differently to avoid the warning.
message_filters_to_str() converts the filter chain to
a string. message_filters_show() replaces the original
message_filters().
uint32_to_optstr() was also added to show the dictionary
size in nicer format when possible.
Don't use #error to generate compile error, because some
compilers actually don't take it as an error. This fixes
tuklib_physmem on IRIX.
Fix incorrect error check for sysconf() return values.
Add AIX, HP-UX, and Tru64 specific code to detect the
amount RAM.
Add HP-UX specific code to detect the number of CPU cores.
Thanks a lot to Peter O'Gorman for initial patches,
testing, and debugging these fixes.
The extra space for showing both has been taken from the
sizes field. If the sizes grow big, bigger units than MiB
will be used. It makes it slightly difficult to see that
progress is still happening with huge files, but it should
be OK in practice.
Thanks to Trent W. Buck for <http://bugs.debian.org/574583>
and Jonathan Nieder for suggestions how to fix it.
Originally both base-2 and base-10 were supported, but since
there seems to be little need for base-10 in XZ Utils, treat
everything as base-2 and also be more relaxed about the case
of the first letter of the suffix. Now xz will accept e.g.
KiB, Ki, k, K, kB, and KB, and interpret them all as 1024. The
recommended spelling of the suffixes are still KiB, MiB, and GiB.
It still feels a bit wrong to round 1 byte to 1 MiB but
at least it is now done consistently so that the same
byte value is always rounded the same way to MiB.
Previously the default limit was always 40 % of RAM. The
new limit is a little bit more complex:
- If 40 % of RAM is at least 80 MiB, 40 % of RAM is used
as the limit.
- If 80 % of RAM is over 80 MiB, 80 MiB is used as the limit.
- Otherwise 80 % of RAM is used as the limit.
This should make it possible to decompress files created with
"xz -9" on more systems. Swapping is generally more expected
on systems with less RAM, so higher default limit on them
shouldn't cause too bad surprises in terms of heavy swapping.
Instead, the higher default limit should reduce the number of
bad surprises when it used to prevent decompression of files
created with "xz -9". The DoS prevention system shouldn't be
a DoS itself.
Note that even with the new default limit, a system with 64 MiB
RAM cannot decompress files created with "xz -9" without user
overriding the limit. This should be OK, because if xz is going
to need more memory than the system has RAM, it will run very
very slowly and thus it's good that user has to override the limit
in that case.