There was somewhat illogical behavior when --extreme was
specified and mixed with custom filter chains.
Before this commit, "xz -9 --lzma2 -e" was equivalent
to "xz --lzma2". After it is equivalent to "xz -6e"
(all earlier preset options get forgotten when a custom
filter chain is specified and the default preset is 6
to which -e is applied). I find this less illogical.
This also affects the meaning of "xz -9e --lzma2 -7".
Earlier it was equivalent to "xz -7e" (the -e specified
before a custom filter chain wasn't forgotten). Now it
is "xz -7". Note that "xz -7e" still is the same as "xz -e7".
Hopefully very few cared about this in the first place,
so pretty much no one should even notice this change.
Thanks to Conley Moorhous.
The options are now ordered in the same order as in xz's help
message.
Descriptions were added to the options that are ignored.
I left them in parenthesis even if it looks a bit weird
because I find it easier to spot the ignored vs. non-ignored
options from the list that way.
To avoid false positives when detecting .lzma files,
rare values in dictionary size and uncompressed size fields
were rejected. They will still be rejected if .lzma files
are decoded with lzma_auto_decoder(), but when using
lzma_alone_decoder() directly, such files will now be accepted.
Hopefully this is an OK compromise.
This doesn't affect xz because xz still has its own file
format detection code. This does affect lzmadec though.
So after this commit lzmadec will accept files that xz or
xz-emulating-lzma doesn't.
NOTE: lzma_alone_decoder() still won't decode all .lzma files
because liblzma's LZMA decoder doesn't support lc + lp > 4.
Reported here:
http://sourceforge.net/projects/lzmautils/forums/forum/708858/topic/7068827
This race condition could cause a deadlock if lzma_end() was
called before finishing the encoding. This can happen with
xz with debugging enabled (non-debugging version doesn't
call lzma_end() before exiting).
Use "read" instead of "awk" in xzless to get the version
number of "less". The need for awk was introduced in
the commit db5c1817fa.
Thanks to Ariel P for the patch.
This adds lzma_get_progress() to liblzma and takes advantage
of it in xz.
lzma_get_progress() collects progress information from
the thread-specific structures so that fairly accurate
progress information is available to applications. Adding
a new function seemed to be a better way than making the
information directly available in lzma_stream (like total_in
and total_out are) because collecting the information requires
locking mutexes. It's waste of time to do it more often than
the up to date information is actually needed by an application.
In v4.999.9beta~30 (xzless: Support compressed standard input,
2009-08-09), xzless learned to parse ‘less -V’ output to figure out
whether less is new enough to handle $LESSOPEN settings starting
with “|-”. That worked well for a while, but the version string from
‘less’ versions 448 (June, 2012) is misparsed, producing a warning:
$ xzless /tmp/test.xz; echo $?
/usr/bin/xzless: line 49: test: 456 (GNU regular expressions): \
integer expression expected
0
More precisely, modern ‘less’ lists the regexp implementation along
with its version number, and xzless passes the entire version number
with attached parenthetical phrase as a number to "test $a -gt $b",
producing the above confusing message.
$ less-444 -V | head -1
less 444
$ less -V | head -1
less 456 (no regular expressions)
So relax the pattern matched --- instead of expecting "less <number>",
look for a line of the form "less <number>[ (extra parenthetical)]".
While at it, improve the behavior when no matching line is found ---
instead of producing a cryptic message, we can fall back on a LESSPIPE
setting that is supported by all versions of ‘less’.
The implementation uses "awk" for simplicity. Hopefully that’s
portable enough.
Reported-by: Jörg-Volker Peetz <jvpeetz@web.de>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
There is a tiny risk of causing breakage: If an application
assigns lzma_stream.allocator to a non-const pointer, such
code won't compile anymore. I don't know why anyone would do
such a thing though, so in practice this shouldn't cause trouble.
Thanks to Jan Kratochvil for the patch.
lzma_code() could incorrectly return LZMA_BUF_ERROR if
all of the following was true:
- The caller knows how many bytes of output to expect
and only provides that much output space.
- When the last output bytes are decoded, the
caller-provided input buffer ends right before
the LZMA2 end of payload marker. So LZMA2 won't
provide more output anymore, but it won't know it
yet and thus won't return LZMA_STREAM_END yet.
- A BCJ filter is in use and it hasn't left any
unfiltered bytes in the temp buffer. This can happen
with any BCJ filter, but in practice it's more likely
with filters other than the x86 BCJ.
Another situation where the bug can be triggered happens
if the uncompressed size is zero bytes and no output space
is provided. In this case the decompression can fail even
if the whole input file is given to lzma_code().
A similar bug was fixed in XZ Embedded on 2011-09-19.
When grepping binary files, grep may exit before it has
read all the input. In this case, gzip -q returns 2 (eating
SIGPIPE), but xz and bzip2 show SIGPIPE as the exit status
(e.g. 141). This causes wrong exit status when grepping
xz- or bzip2-compressed binary files.
The fix checks for the special exit status that indicates SIGPIPE.
It uses kill -l which should be supported everywhere since it
is in both SUSv2 (1997) and POSIX.1-2008.
Thanks to James Buren for the bug report.
xzdiff was clobbering the exit status from diff in a case
statement used to analyze the exit statuses from "xz" when
its operands were two compressed files. Save and restore
diff's exit status to fix this.
The bug is inherited from zdiff in GNU gzip and was fixed
there on 2009-10-09.
Thanks to Jonathan Nieder for the patch and
to Peter Pallinger for reporting the bug.
Symbol versioning is enabled by default on GNU/Linux,
other GNU-based systems, and FreeBSD.
I'm not sure how stable this is, so it may need
backward-incompatible changes before the next release.
The idea is that alpha and beta symbols are considered
unstable and require recompiling the applications that
use those symbols. Once a symbol is stable, it may get
extended with new features in ways that don't break
compatibility with older ABI & API.
The mydist target runs validate_map.sh which should
catch some probable problems in liblzma.map. Otherwise
I would forget to update the map file for new releases.
If the operating system libc or other base libraries
provide SHA-256, use that instead of our own copy.
Note that this doesn't use OpenSSL or libgcrypt or
such libraries to avoid creating dependencies to
other packages.
This supports at least FreeBSD, NetBSD, OpenBSD, Solaris,
MINIX, and Darwin. They all provide similar but not
identical SHA-256 APIs; everyone is a little different.
Thanks to Wim Lewis for the original patch, improvements,
and testing.
Now the following works as you would expect:
echo foo | xz > foo.xz
echo bar | xz >> foo.xz
( xz -dc --single-stream ; xz -dc --single-stream ) < foo.xz
Note that it doesn't work if the input is not seekable
or if there is Stream Padding between the concatenated
.xz Streams.
Use gettimeofday() if clock_gettime() isn't available
(e.g. Darwin).
The test for availability of pthread_condattr_setclock()
and CLOCK_MONOTONIC was incorrect. Instead of fixing the
#ifdefs, use an Autoconf test. That way if there exists a
system that supports them but doesn't specify the matching
POSIX #defines, the features will still get detected.
Don't try to use pthread_sigmask() on OpenVMS. It doesn't
have that function.
Guard mythread.h against being #included multiple times.
Reported-by: Diego Elio Pettenò <flameeyes@gentoo.org>
Signed-off-by: Martin Väth <vaeth@mathematik.uni-wuerzburg.de>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Spot candidates by running these commands:
git ls-files |xargs perl -0777 -n \
-e 'while (/\b(then?|[iao]n|i[fst]|but|f?or|at|and|[dt]o)\s+\1\b/gims)' \
-e '{$n=($` =~ tr/\n/\n/ + 1); ($v=$&)=~s/\n/\\n/g; print "$ARGV:$n:$v\n"}'
Thanks to Jim Meyering for the original patch.
This is the simplest method to do threading, which splits
the uncompressed data into blocks and compresses them
independently from each other. There's room for improvement
especially to reduce the memory usage, but nevertheless,
this is a good start.
Empty Block was created if the input buffer was empty.
Empty Block wastes a few bytes of space, but more importantly
it triggers a bug in XZ Utils 5.0.1 and older when trying
to decompress such a file. 5.0.1 and older consider such
files to be corrupt. I thought that no encoder creates empty
Blocks when releasing 5.0.2 but I was wrong.
The biggest problem was that the integrity check type
wasn't validated, and e.g. lzma_easy_buffer_encode()
would create a corrupt .xz Stream if given an unsupported
Check ID. Luckily applications don't usually try to use
an unsupport Check ID, so this bug is unlikely to cause
many real-world problems.
This adds:
- mythread_sync() macro to create synchronized blocks
- mythread_cond structure and related functions
and macros for condition variables with timed
waiting using a relative timeout
- mythread_create() to create a thread with all
signals blocked
Some of these wouldn't need to be inline functions,
but I'll keep them this way for now for simplicity.
For timed waiting on a condition variable, librt is
now required on some systems to use clock_gettime().
configure.ac was updated to handle this.
This is incompatible with the 8.3 support patch made by
Juan Manuel Guerrero. I think this one is nicer, but
I need to get feedback from DOS users before saying
that this is the final version of 8.3 filename support.
Try to avoid overwriting the source file if --force is
used and the generated destination filename refers to
the source file. This can happen with 8.3 filenames where
extra characters are ignored.
If the generated output file refers to a special file
like "con" or "prn", refuse to write to it even if --force
is used.
It leaks old filter options structures (hundred bytes or so)
every time the lzma_stream is reinitialized. With the xz tool,
this happens when compressing multiple files.
The decoder considered empty LZMA2 streams to be corrupt.
This shouldn't matter much with .xz files, because no encoder
creates empty LZMA2 streams in .xz. This bug is more likely
to cause problems in applications that use raw LZMA2 streams.
It didn't work at all. It tried to use the -q option
for grep, but it appended it after "--". This works
around it by redirecting to /dev/null. The downside
is that this can be slower with big files compared
to proper use of "grep -q".
Thanks to Gregory Margo.
xz didn't compress setuid/setgid/sticky files and files
with multiple hard links even with --force. This bug was
introduced in 23ac2c44c3.
Thanks to Charles Wilson.
This has no semantic changes. I find the new names slightly
more logical and they match the names that are already used
in XZ Embedded.
The name fastpos wasn't changed (not worth the hassle).
If any of the reserved members in lzma_stream are non-zero
or non-NULL, LZMA_OPTIONS_ERROR is returned. It is possible
that a new feature in the future is indicated by just setting
a reserved member to some other value, so the old liblzma
version need to catch it as an unsupported feature.
The non-standard ones from msvcrt.dll appear to work
most of the time with XZ Utils, but there are some
corner cases where things may go very wrong. So it's
good to use the better replacements provided by
MinGW(-w64) runtime.
Calling raise() to kill xz when user has pressed C-c
is a bit verbose on OS/2 and DOS/DJGPP. Instead of
calling raise(), set only the exit status to 1.
Those are the same thing, and the former makes it a bit
easier to build the code with other build systems, because
one doesn't need to update the version number into custom
config.h.
This change affects only lzmainfo. Other tools were already
using LZMA_VERSION_STRING.
Most distros want xz linked against shared liblzma, so
it doesn't help much to require --enable-dynamic for that.
Those who want to avoid PIC on x86-32 to get better
performance, can still do it e.g. by using --disable-shared
to compile xz and then another pass to compile shared liblzma.
Part of these static/dynamic tricks were needed for Windows
in the past. Nowadays we rely on GCC and binutils to do the
right thing with auto-import. If the Autotooled build system
needs to support some other toolchain on Windows in the future,
this may need some rethinking.
Lots of content was updated on the xz man page.
Technical improvements:
- Start a new sentence on a new line.
- Use fairly short lines.
- Use constant-width font for examples (where supported).
- Some minor cleanups.
Thanks to Jonathan Nieder for some language fixes.
The code assumed that printing numbers with thousand separators
and decimal points would always produce only US-ASCII characters.
This was used for buffer sizes (with snprintf(), no overflows)
and aligning columns of the progress indicator and --list. That
assumption was wrong (e.g. LC_ALL=fi_FI.UTF-8 with glibc), so
multibyte character support was added in this commit. The old
way is used if the operating system doesn't have enough multibyte
support (e.g. lacks wcwidth()).
The sizes of buffers were increased to accomodate multibyte
characters. I don't know how big they should be exactly, but
they aren't used for anything critical, so it's not too bad.
If they still aren't big enough, I hopefully get a bug report.
snprintf() takes care of avoiding buffer overflows.
Some static buffers were replaced with buffers allocated on
stack. double_to_str() was removed. uint64_to_str() and
uint64_to_nicestr() now share the static buffer and test
for thousand separator support.
Integrity check names "None" and "Unknown-N" (2 <= N <= 15)
were marked to be translated. I had forgot these, plus they
wouldn't have worked correctly anyway before this commit,
because printing tables with multibyte strings didn't work.
Thanks to Marek Černocký for reporting the bug about
misaligned table columns in --list output.
This should reduce the cases where --extreme makes
compression worse. On the other hand, some other
files may now benefit slightly less from --extreme.
It was 8 + nice_len / 4, now it is 4 + nice_len / 4.
This allows faster settings at lower nice_len values,
even though it seems that I won't use automatic depth
calcuation with HC3 and HC4 in the presets.
For several people, the limiter causes bigger problems that
it solves, so it is better to have it disabled by default.
Those who want to have a limiter by default need to enable
it via the environment variable XZ_DEFAULTS.
Support for environment variable XZ_DEFAULTS was added. It is
parsed before XZ_OPT and technically identical with it. The
intended uses differ quite a bit though; see the man page.
The memory usage limit can now be set separately for
compression and decompression using --memlimit-compress and
--memlimit-decompress. To set both at once, -M or --memlimit
can be used. --memory was retained as a legacy alias for
--memlimit for backwards compatibility.
The semantics of --info-memory were changed in backwards
incompatible way. Compatibility wasn't meaningful due to
changes in the memory usage limiter functionality.
The memory usage limiter info is no longer shown at the
bottom of xz --long -help.
The memory usage limiter support for removed completely from xzdec.
xz's man page was updated to match the above changes. Various
unrelated fixes were also made to the man page.
When using -O2 with GCC, it liked to swap two comparisons
in one "if" statement. It's otherwise fine except that
the latter part, which is seemingly never executed, got
executed (nothing wrong with that) and then triggered
warning in Valgrind about conditional jump depending on
uninitialized variable. A few people find this annoying
so do things a bit differently to avoid the warning.
message_filters_to_str() converts the filter chain to
a string. message_filters_show() replaces the original
message_filters().
uint32_to_optstr() was also added to show the dictionary
size in nicer format when possible.
Don't use #error to generate compile error, because some
compilers actually don't take it as an error. This fixes
tuklib_physmem on IRIX.
Fix incorrect error check for sysconf() return values.
Add AIX, HP-UX, and Tru64 specific code to detect the
amount RAM.
Add HP-UX specific code to detect the number of CPU cores.
Thanks a lot to Peter O'Gorman for initial patches,
testing, and debugging these fixes.
The extra space for showing both has been taken from the
sizes field. If the sizes grow big, bigger units than MiB
will be used. It makes it slightly difficult to see that
progress is still happening with huge files, but it should
be OK in practice.
Thanks to Trent W. Buck for <http://bugs.debian.org/574583>
and Jonathan Nieder for suggestions how to fix it.
Originally both base-2 and base-10 were supported, but since
there seems to be little need for base-10 in XZ Utils, treat
everything as base-2 and also be more relaxed about the case
of the first letter of the suffix. Now xz will accept e.g.
KiB, Ki, k, K, kB, and KB, and interpret them all as 1024. The
recommended spelling of the suffixes are still KiB, MiB, and GiB.
It still feels a bit wrong to round 1 byte to 1 MiB but
at least it is now done consistently so that the same
byte value is always rounded the same way to MiB.
Previously the default limit was always 40 % of RAM. The
new limit is a little bit more complex:
- If 40 % of RAM is at least 80 MiB, 40 % of RAM is used
as the limit.
- If 80 % of RAM is over 80 MiB, 80 MiB is used as the limit.
- Otherwise 80 % of RAM is used as the limit.
This should make it possible to decompress files created with
"xz -9" on more systems. Swapping is generally more expected
on systems with less RAM, so higher default limit on them
shouldn't cause too bad surprises in terms of heavy swapping.
Instead, the higher default limit should reduce the number of
bad surprises when it used to prevent decompression of files
created with "xz -9". The DoS prevention system shouldn't be
a DoS itself.
Note that even with the new default limit, a system with 64 MiB
RAM cannot decompress files created with "xz -9" without user
overriding the limit. This should be OK, because if xz is going
to need more memory than the system has RAM, it will run very
very slowly and thus it's good that user has to override the limit
in that case.
With bad luck, lzma_code() could return LZMA_BUF_ERROR
when it shouldn't.
This has been here since the early days of liblzma.
It got triggered by the modifications made to the xz
tool in commit 18c10c30d2
but only when decompressing .lzma files. Somehow I managed
to miss testing that with Valgrind earlier.
This fixes <http://bugs.gentoo.org/show_bug.cgi?id=305591>.
Thanks to Rafał Mużyło for helping to debug it on IRC.
lzma_block.version has to be initialized even for
lzma_block_header_decode(). This way a future version
of liblzma won't allocate memory in a way that an old
application doesn't know how to free it.
The subtlety of this change is that all current apps
using lzma_block_header_decode() will keep working for
now, because the only possible version value is zero,
and lzma_block_header_decode() unconditionally sets the
version to zero even now. Unless fixed, these apps will
break in the future if a new version of the Block options
is ever needed.
If signal handlers haven't been established, then it's
useless to try to block them, especially since the sigset_t
used for blocking hasn't been initialized yet.
The opening of the destination file is now delayed a little.
The coder is initialized, and if decompressing, the memory
usage of the first Block compared against the memory
usage limit before the destination file is opened. This
means that if --force was used, the old "target" file won't
be deleted so easily when something goes wrong very early.
Thanks to Mark K for the bug report.
The above fix required some changes to progress message
handling. Now there is a separate function for setting and
printing the filename. It is used also in list.c.
list_file() now handles stdin correctly (gives an error).
A useless check for user_abort was removed from file_io.c.
This is a bit rough but should be useful for basic things.
Ideas (with detailed examples) about the output format are
welcome.
The output of --robot --list is not necessarily stable yet,
although I don't currently have any plans about changing it.
The man page hasn't been updated yet.
to stdout even if --force is used.
--force will still enable compression of symlinks, but only
in case they point to a regular file.
The new way simply seems more reasonable. It matches gzip's
behavior while the old one matched bzip2's behavior.
This breaks API and ABI but most apps are not affected
since most apps don't use this part of the API. You will
get a compile error if you are using anything that got
broken.
Summary of changes:
- Ability to store Stream Flags, which are needed
for random-access reading in multi-Stream files.
- Separate function to set size of Stream Padding.
- Iterator structure makes it possible to read the same
lzma_index from multiple threads at the same time.
- A lot faster code to locate Blocks.
- Removed lzma_index_equal() without adding anything
to replace it. I don't know what it should do exactly
with the new features and what actually needs this
function in the first place other than test_index.c,
which now has its own code to compare lzma_indexes.
lzma_index_read() didn't skip over Stream Padding
if it was the first record in the Index.
lzma_index_cat() didn't combine small Indexes correctly.
The test suite was updated to check for these bugs.
These bugs didn't affect the xz command line tool or
most users of liblzma in any way.
The Index decoder code didn't perfectly match the API docs,
which said that *i will be set to point to the decoded Index
only after decoding has succeeded. The docs were a bit unclear
too.
Now the decoder will initially set *i to NULL. *i will be set
to point to the decoded Index once decoding has succeeded.
This simplifies applications too, since it avoids dangling
pointers.
a regular file.
Sparse file creation can be disabled with --no-sparse.
I don't promise yet that the name of this option won't
change before 5.0.0. It's possible that the code, that
checks when it is safe to use sparse output on stdout,
is not good enough, and a more flexible command line
option is needed to configure sparse file handling.
Currently --robot works only with --info-memory and
--version. --help and --long-help work too, but --robot
has no effect on them.
Thanks to Jonathan Nieder for the original patches.
I had hoped to keep liblzma as purely a compression
library as possible (e.g. file I/O will go into
a different library), but it seems that applications
linking agaisnt liblzma need some way to determine
the memory usage limit, and knowing the amount of RAM
is one reasonable way to help making such decisions.
Thanks to Jonathan Nieder for the original patch.
Originally the idea was that using LZMA_FULL_FLUSH
with Stream encoder would read the filter chain
from the same array that was used to intialize the
Stream encoder. Since most apps wouldn't use
LZMA_FULL_FLUSH, most apps wouldn't need to keep
the filter chain available after initializing the
Stream encoder. However, due to my mistake, it
actually required keeping the array always available.
Since setting the new filter chain via the array
used at initialization time is not a nice way to do
it for a couple of reasons, this commit ditches it
and introduces lzma_filters_update(). This new function
replaces also the "persistent" flag used by LZMA2
(and to-be-designed Subblock filter), which was also
an ugly thing to do.
Thanks to Alexey Tourbin for reminding me about the problem
that Stream encoder used to require keeping the filter
chain allocated.
This will be needed internally by liblzma once I fix
a design mistake in the encoder API. This function may
be useful to applications too so it's good to export it.
A minus sign is larger, easier to see in a printout, and more
likely to use the same glyph as ASCII hyphen-minus in a terminal
than a hyphen. Since broken manual pagers do not find hyphens
when the user searches for a hyphen-minus, minus signs are also
easier to search for. So use minus signs instead of hyphens to
render sample terminal output.
This replaces bswap.h and integer.h.
The tuklib module uses <byteswap.h> on GNU,
<sys/endian.h> on *BSDs and <sys/byteorder.h>
on Solaris, which may contain optimized code
like inline assembly.
Seems that it is a problem in some cases if the same
version of XZ Utils produces different output on different
endiannesses, so this commit fixes that problem. The output
will still vary between different XZ Utils versions, but I
cannot avoid that for now.
This commit bloatens the code on big endian systems by 1 KiB,
which should be OK since liblzma is bloated already. ;-)
Separate a few reusable components from XZ Utils specific
code. The reusable code is now in "tuklib" modules. A few
more could be separated still, e.g. bswap.h.
Fix some bugs in lzmainfo.
Fix physmem and cpucores code on OS/2. Thanks to Elbert Pol
for help.
Add OpenVMS support into physmem. Add a few #ifdefs to ease
building XZ Utils on OpenVMS. Thanks to Jouk Jansen for the
original patch.
This fixes "make install" on operating systems using
a suffix for executables.
Cygwin is treated specially. The symlink names won't have
.exe suffix even though the executables themselves have.
Thanks to Charles Wilson.