This is a bit rough but should be useful for basic things.
Ideas (with detailed examples) about the output format are
welcome.
The output of --robot --list is not necessarily stable yet,
although I don't currently have any plans to change it.
The man page hasn't been updated yet.
to stdout even if --force is used.
--force will still enable compression of symlinks, but only
in case they point to a regular file.
The new way simply seems more reasonable. It matches gzip's
behavior while the old one matched bzip2's behavior.
This breaks API and ABI but most apps are not affected
since most apps don't use this part of the API. You will
get a compile error if you are using anything that got
broken.
Summary of changes:
- Ability to store Stream Flags, which are needed
for random-access reading in multi-Stream files.
- Separate function to set size of Stream Padding.
- Iterator structure makes it possible to read the same
lzma_index from multiple threads at the same time.
- A lot faster code to locate Blocks.
- Removed lzma_index_equal() without adding anything
to replace it. I don't know what it should do exactly
with the new features and what actually needs this
function in the first place other than test_index.c,
which now has its own code to compare lzma_indexes.
lzma_index_read() didn't skip over Stream Padding
if it was the first record in the Index.
lzma_index_cat() didn't combine small Indexes correctly.
The test suite was updated to check for these bugs.
These bugs didn't affect the xz command line tool or
most users of liblzma in any way.
The Index decoder code didn't perfectly match the API docs,
which said that *i will be set to point to the decoded Index
only after decoding has succeeded. The docs were a bit unclear
too.
Now the decoder will initially set *i to NULL. *i will be set
to point to the decoded Index once decoding has succeeded.
This simplifies applications too, since it avoids dangling
pointers.
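The out-pointer convention described above can be sketched in plain C.
The names example_index and example_index_decode() are hypothetical
stand-ins, not liblzma's API; only the pattern of clearing *i first and
assigning it after success is taken from the text.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a decoded Index; not liblzma's lzma_index. */
typedef struct {
    size_t record_count;
} example_index;

/*
 * Hypothetical decoder following the convention described above:
 * *i is set to NULL up front and is pointed at the result only
 * after decoding has succeeded.
 */
static bool
example_index_decode(example_index **i, const unsigned char *in,
        size_t in_size)
{
    *i = NULL;

    /* Assumption for this sketch: empty input is a decoding failure. */
    if (in == NULL || in_size == 0)
        return false;

    example_index *tmp = malloc(sizeof(*tmp));
    if (tmp == NULL)
        return false;

    /* Placeholder for real decoding work. */
    tmp->record_count = in_size;

    /* Publish the result only now that everything has succeeded. */
    *i = tmp;
    return true;
}
```

With this pattern the caller can test or free the pointer after a failed
call without first checking whether the decoder had touched it.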
a regular file.
Sparse file creation can be disabled with --no-sparse.
I don't promise yet that the name of this option won't
change before 5.0.0. It's possible that the code that
checks when it is safe to use sparse output on stdout
is not good enough, and a more flexible command line
option is needed to configure sparse file handling.
Currently --robot works only with --info-memory and
--version. --help and --long-help work too, but --robot
has no effect on them.
Thanks to Jonathan Nieder for the original patches.
I had hoped to keep liblzma as purely a compression
library as possible (e.g. file I/O will go into
a different library), but it seems that applications
linking against liblzma need some way to determine
the memory usage limit, and knowing the amount of RAM
is one reasonable way to help make such decisions.
Thanks to Jonathan Nieder for the original patch.
Originally the idea was that using LZMA_FULL_FLUSH
with Stream encoder would read the filter chain
from the same array that was used to initialize the
Stream encoder. Since most apps wouldn't use
LZMA_FULL_FLUSH, most apps wouldn't need to keep
the filter chain available after initializing the
Stream encoder. However, due to my mistake, it
actually required keeping the array always available.
Since setting the new filter chain via the array
used at initialization time is not a nice way to do
it for a couple of reasons, this commit ditches it
and introduces lzma_filters_update(). This new function
also replaces the "persistent" flag used by LZMA2
(and the to-be-designed Subblock filter), which was also
an ugly thing to do.
Thanks to Alexey Tourbin for reminding me about the problem
that Stream encoder used to require keeping the filter
chain allocated.
This will be needed internally by liblzma once I fix
a design mistake in the encoder API. This function may
be useful to applications too so it's good to export it.
A minus sign is larger, easier to see in a printout, and more
likely to use the same glyph as ASCII hyphen-minus in a terminal
than a hyphen. Since broken manual pagers do not find hyphens
when the user searches for a hyphen-minus, minus signs are also
easier to search for. So use minus signs instead of hyphens to
render sample terminal output.
This replaces bswap.h and integer.h.
The tuklib module uses <byteswap.h> on GNU,
<sys/endian.h> on *BSDs and <sys/byteorder.h>
on Solaris, which may contain optimized code
like inline assembly.
It seems to be a problem in some cases if the same
version of XZ Utils produces different output on different
endiannesses, so this commit fixes that problem. The output
will still vary between different XZ Utils versions, but I
cannot avoid that for now.
This commit bloats the code on big endian systems by 1 KiB,
which should be OK since liblzma is bloated already. ;-)
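One generic way to get identical byte-level results on both
endiannesses is to read and write multibyte integers one byte at a
time. The sketch below only illustrates that idea; it is not the tuklib
code, and the function names are hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper: decode a 32-bit little endian value byte by
 * byte, so the result does not depend on the host's endianness.
 */
static uint32_t
example_read32le(const uint8_t *buf)
{
    return (uint32_t)buf[0]
            | ((uint32_t)buf[1] << 8)
            | ((uint32_t)buf[2] << 16)
            | ((uint32_t)buf[3] << 24);
}

/* Hypothetical helper: the matching encoder. */
static void
example_write32le(uint8_t *buf, uint32_t num)
{
    buf[0] = (uint8_t)(num & 0xFF);
    buf[1] = (uint8_t)((num >> 8) & 0xFF);
    buf[2] = (uint8_t)((num >> 16) & 0xFF);
    buf[3] = (uint8_t)((num >> 24) & 0xFF);
}
```

Optimized system headers such as <byteswap.h> can do the same job
faster; the byte-wise form is just the simplest portable fallback.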
Separate a few reusable components from XZ Utils specific
code. The reusable code is now in "tuklib" modules. A few
more could be separated still, e.g. bswap.h.
Fix some bugs in lzmainfo.
Fix physmem and cpucores code on OS/2. Thanks to Elbert Pol
for help.
Add OpenVMS support into physmem. Add a few #ifdefs to ease
building XZ Utils on OpenVMS. Thanks to Jouk Jansen for the
original patch.
This fixes "make install" on operating systems using
a suffix for executables.
Cygwin is treated specially. The symlink names won't have
.exe suffix even though the executables themselves have.
Thanks to Charles Wilson.
the function call succeeded.
NetBSD 4.0 returns positive values on success, but
NetBSD Current and FreeBSD return zero. OpenBSD's
man page doesn't say what sysctl() returns on
success. All these BSDs return -1 on error.
Thanks to Robert Elz and Thomas Klausner.
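Given the return values listed above, a portable success check compares
against -1 only, rather than testing for zero. A minimal sketch, with a
hypothetical helper name:

```c
#include <stdbool.h>

/*
 * Hypothetical helper: interpret the return value of sysctl().
 * Only -1 is treated as an error, because some systems return
 * a positive value on success and others return zero.
 */
static bool
example_sysctl_succeeded(int ret)
{
    return ret != -1;
}
```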
and use a fix that works on all systems using
GNU assembler.
Maybe the assembler code is used e.g. on Solaris x86
but let's worry about it if this doesn't work on it.
It seems that, in addition to Windows and DOS, OpenBSD also
lacks support for %'d style printf() format strings.
So far that is the only modern POSIX-like system I know
with this problem, but after this hack, the thousand
separator shouldn't be a problem on any system.
Maybe testing if a format string like %'d produces
reasonable output is invoking undefined behavior on some
systems, but so far all the problematic systems I've tried
just print the raw format string (e.g. %'d prints 'd).
Maybe an Autoconf test would have been better, but this
hack works also for cross-compilation, and avoids
recompilation in case the system libc starts to support
the thousand separator.
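The runtime check described above might look roughly like the sketch
below. The function names are hypothetical, and the approach carries
the same caveat as the text: on a libc without the ' flag the behavior
is not guaranteed, although the systems mentioned just print the raw
format string.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format the number 1 with the ' flag and see
 * if the result is sane. A libc without support for the flag is
 * expected to produce something other than "1".
 */
static bool
example_thousand_sep_works(void)
{
    char buf[16];
    const char *fmt = "%'u";

    snprintf(buf, sizeof(buf), fmt, 1U);
    return strcmp(buf, "1") == 0;
}

/* Hypothetical helper: pick a format string that is safe to use. */
static const char *
example_uint_format(void)
{
    return example_thousand_sep_works() ? "%'u" : "%u";
}
```

Because the check runs in the final program instead of at configure
time, it needs no test program on the build machine.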
Added lzma_nothrow for every function. It adds
throw() when the header is used in C++ code.
Some lzma_attrs were added or removed.
Lots of comments were improved.
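A macro of this kind could be defined along the following lines. This
is a sketch with hypothetical names (example_nothrow, example_add); the
real definition in the liblzma headers may differ.

```c
/*
 * Hypothetical macro: expands to throw() when the header is
 * included from C++ code and to nothing in C.
 */
#ifdef __cplusplus
#   define example_nothrow throw()
#else
#   define example_nothrow
#endif

/* Declaration as it might appear in a public header. */
extern int example_add(int a, int b) example_nothrow;

/* Matching definition. */
int
example_add(int a, int b) example_nothrow
{
    return a + b;
}
```

Note that throw() as an exception specification was later deprecated
and then removed from the C++ standard, so newer C++ code would use
noexcept for the same purpose.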
lzmainfo now links against static liblzma. In contrast
to other command line tools in XZ Utils, linking lzmainfo
against static liblzma by default is dumb. This will be
fixed once I have fixed some related issues in configure.ac.
Attempts to compare two compressed files result in no output and
exit status 2.
Instead of going to standard output, ‘diff’ output is being
captured in the xz_status variable along with the exit status from
the decompression commands. Later, when this variable is examined
for nonzero status codes, numerals from dates in the ‘diff’ output
make it appear as though decompression failed.
So let the ‘diff’ output leak to standard output with another file
descriptor. (This trick is used in all similar contexts elsewhere
in xzdiff and in the analogous context in gzip’s zdiff script.)
It can be somewhat confusing that
less < some_file.txt
works fine, whereas
xzless < some_file.txt.xz
does not. Since version 429, ‘less’ allows a filter specified in
the LESSOPEN environment variable to preprocess its input even if
it comes from standard input, if $LESSOPEN begins with ‘|-’. So
set $LESSOPEN to take advantage of this feature.
Check less’s version at runtime so xzless can continue to work
with older versions.
This is a quick and slightly dirty fix to make the code
conform to the latest file format specification. Without
this patch, it's possible to make corrupt files by
specifying a start offset that is not a multiple of the
filter's alignment. A custom start offset is almost never
used, so this was only a minor bug.
The xz command line tool doesn't validate the start offset,
so one will get a somewhat unclear error message when trying
to use an invalid start offset.
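The rule itself is a simple divisibility test. The sketch below uses a
hypothetical function name and takes the alignment as a parameter
instead of looking it up for a particular filter.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: a start offset is acceptable only if it is
 * a multiple of the filter's alignment. An alignment of zero is
 * rejected here to avoid division by zero.
 */
static bool
example_start_offset_is_valid(uint32_t start_offset, uint32_t alignment)
{
    if (alignment == 0)
        return false;

    return start_offset % alignment == 0;
}
```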
like "un", "cat", and "lz" when determining if
xz is run as unxz, xzcat, lzma, unlzma, or lzcat.
This is to ensure that if xz is renamed (e.g. via
--program-transform-name), it doesn't so easily
work in wrong mode.
It was ignored for compatibility with xz, but now that
--decompress --stdout --force copies unrecognized files
as is to stdout, simply ignoring --force in xzdec would
be wrong. xzdec will not support copying unrecognized
data as is to stdout, so it cannot support --force.
use AC_PROG_SED. We don't do anything fancy with sed,
so this should work OK. libtool 2.2 sets SED but 1.5
doesn't, so $(SED) happened to work when using libtool 2.2.
the latest versions found from gzip CVS repository.
configure will try to find a POSIX shell to be used by
the scripts. This should ease portability on systems
which have pre-POSIX /bin/sh.
xzgrep and xzdiff support .xz, .lzma, .gz, and .bz2 files.
xzmore and xzless support only .xz and .lzma files.
The name of the xz executable used in these scripts is
now correct even if --program-transform-name has been used.
files as is to standard output.
This feature is needed to be more compatible with gzip's
behavior. This was more complicated to implement than it
sounds, because of the way liblzma can return errors with
files of only a few bytes in size. xz now has its own file
type detection code and no longer uses lzma_auto_decoder().
Don't use libtool convenience libraries to avoid recently
discovered long-standing subtle but somewhat severe bugs
in libtool (at least 1.5.22 and 2.2.6 are affected). It
was found when porting XZ Utils to Windows
<http://lists.gnu.org/archive/html/libtool/2009-06/msg00070.html>
but the problem is significant also e.g. on GNU/Linux.
Unless --disable-shared is passed to configure, static
library built from a set of convenience libraries will
contain PIC objects. That is, while libtool builds non-PIC
objects too, only PIC objects will be used from the
convenience libraries. On 32-bit x86 (tested on mobile XP2400+),
using PIC instead of non-PIC makes the decompressor 10 % slower
with the default CFLAGS.
So while xz was linked against static liblzma by default,
it got the slower PIC objects unless --disable-shared was
used. I tend to develop and benchmark with --disable-shared
due to faster build time, so I hadn't noticed the problem
in benchmarks earlier.
This commit also adds support for building Windows resources
into liblzma and executables.
--format=lzma. This means that xz emulating lzma
doesn't decompress .xz files, while before this
commit it did. The new way is slightly simpler in
code and especially in upcoming documentation.
compressing and decompressing. This should be OK now that
xz automatically scales down the compression settings if
they would exceed the memory usage limit (earlier, the limit
for compression was increased to 90 % because low limit broke
scripts that used "xz -9" on systems with low RAM).
Support specifying the memory usage limit as a percentage
of RAM (e.g. --memory=50%).
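Converting a percentage into a byte limit could be done as sketched
below. The function name, the accepted range of 1-100, and the return
value of 0 for invalid input are assumptions made for this example, not
taken from the xz source.

```c
#include <stdint.h>

/*
 * Hypothetical helper: turn a percentage of RAM into a limit in
 * bytes. Returns 0 if the percentage is outside 1-100 (an
 * assumption of this sketch).
 */
static uint64_t
example_memlimit_from_percent(uint64_t total_ram, uint32_t percent)
{
    if (percent < 1 || percent > 100)
        return 0;

    /* Divide first so that the multiplication cannot overflow. */
    return total_ram / 100 * percent;
}
```

Dividing before multiplying loses up to 99 bytes of precision, which
does not matter at the scale of a memory usage limit.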
Support --threads=0 to reset the thread limit to the default
value (number of available CPU cores). Use UINT32_MAX instead
of SIZE_MAX as the maximum in args.c. hardware.c was already
expecting uint32_t value.
Cleaned up the output of --help and --long-help.
Don't round the memory usage limit in xzdec --help to avoid
an integer overflow and to not give wrong impression that
the limit is high enough when it may not actually be.
This adds lzdiff, lzgrep, and lzmore to the list of symlinks to install.
It also installs symlinks for the manual pages and removes the new
symlinks on uninstall.
liblzma tries to avoid useless free()/malloc() pairs in
initialization when multiple files are handled using the
same lzma_stream. This didn't work with filter chains
due to comparison of wrong pointers in lzma_next_coder_init(),
making liblzma think that no memory reallocation is needed
even when it actually is.
An easy way to trigger this bug is to decompress two files with
a single xz command. The first file should have e.g. x86+LZMA2
as the filter chain, and the second file just LZMA2.
- Don't use Windows-specific code on Windows. The old code
required at least Windows 2000. Now it should work on
Windows 98 and later, and maybe on Windows 95 too.
- Use less precision when showing estimated remaining time.
- Fix some small design issues.
the number of CPU cores. Added support for using sysinfo()
on Linux systems whose libc lacks appropriate sysconf()
support (at least dietlibc). The Autoconf macros were
split into separate files, and CPU core count detection
was moved from hardware.c to cpucores.h. The core count
isn't used for anything real for now, so a problematic
part in process.c was commented out.
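On systems whose libc supports it, the sysconf() method mentioned above
can be sketched as follows. The function name and the fallback value
of 1 are assumptions of this example; the sysinfo() fallback for libcs
without sysconf() support is not shown.

```c
#include <stdint.h>
#include <unistd.h>

/*
 * Hypothetical helper: return the number of online CPU cores, or 1
 * if the count cannot be determined (an assumption of this sketch).
 */
static uint32_t
example_cpu_cores(void)
{
#ifdef _SC_NPROCESSORS_ONLN
    const long n = sysconf(_SC_NPROCESSORS_ONLN);
    if (n > 0)
        return (uint32_t)n;
#endif

    return 1;
}
```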
Now configure.ac will get the version number directly from
src/liblzma/api/lzma/version.h. The intent is to reduce the
number of places where the version number is duplicated. In
the future, support for displaying the Git commit ID may be added too.
linked statically or dynamically against liblzma. The
default is still to use static liblzma, but it can now
be changed by passing --enable-dynamic to configure.
Thanks to Mike Frysinger for the original patch.
Fixed a few minor bugs in configure.ac.
- Use call/ret pair to get instruction pointer for PIC.
- Use PIC only if PIC or __PIC__ is #defined.
- The code should work on MinGW and Darwin in addition
to GNU/Linux and Solaris.
lzma_memlimit_encoder and lzma_memlimit_decoder to
lzma_raw_encoder_memlimit and lzma_raw_decoder_memlimit. :-(
Now it is fixed. Hopefully it doesn't cause too much trouble
to those who already thought the API was stable.
Half of the developers were already forgetting to use these
functions, which could have caused total breakage in some future
liblzma version or even now if --enable-small was used. Now
liblzma uses pthread_once() to do the initializations unless
it has been built with --disable-threads, which makes these
initializations thread-unsafe.
When --enable-small isn't used, liblzma currently gets needlessly
linked against libpthread (on systems that have it). While it is
stupid for now, liblzma will need threads in future anyway, so
this stupidity will be temporary only.
When --enable-small is used, different CRC32 and CRC64 code is
now used than without --enable-small. This made the resulting
binary slightly smaller, but the main reason was to clean it up
and to handle the lack of lzma_init_check().
The pkg-config file lzma.pc was renamed to liblzma.pc. I'm not
sure if it works correctly and portably for static linking
(Libs.private includes -pthread or other operating system
specific flags). Hopefully someone complains if it is bad.
lzma_rc_prices[] is now included as a precomputed array even
with --enable-small. It's just 128 bytes now that it uses uint8_t
instead of uint32_t. The smaller array seemed to be at least as fast
as the more bloated uint32_t array on x86; hopefully it's not bad
on other architectures.
that was related to LZMA_MODE_FAST. The original code is slightly
faster although it compresses slightly worse. But since it is fast
mode, it is better to select the faster version.
- Updated to the latest, probably final file format version.
- Command line tool reworked to not use threads anymore.
Threading will probably go into liblzma anyway.
- Memory usage limit is now about 30 % for uncompression
and about 90 % for compression.
- Progress indicator with --verbose
- Simplified --help and full --long-help
- Upgraded to the last LGPLv2.1+ getopt_long from gnulib.
- Some bug fixes
Use LZMA_PROG_ERROR instead of LZMA_HEADER_ERROR if the Filter ID
is in the reserved range. This allows Block Header encoder to
detect unallowed Filter IDs, which is good for Stream encoder.
code from block_private.h to block_decoder.c. Now the Block
encoder doesn't need compressed_size and uncompressed_size
from lzma_block structure to be initialized.
LZMA_Alone files. Decoding of concatenated LZMA_Alone files is
intentionally not supported, so it is better to put this in
auto decoder than LZMA_Alone decoder.
broken. API has changed a lot and it will still change a
little more here and there. The command line tool doesn't
have all the required changes to reflect the API changes, so
it's easy to get "internal error" or trigger assertions.