1
0
mirror of https://git.tukaani.org/xz.git synced 2025-10-26 19:12:59 +00:00

Update the FAQ.

This commit is contained in:
Lasse Collin 2010-10-02 12:07:33 +03:00
parent 61ae593661
commit f9722dbeca

View File

@ -6,7 +6,7 @@ Q: What do the letters XZ mean?
A: Nothing. They are just two letters, which come from the file format
suffix .xz. The .xz suffix was selected, because it seemed to be
pretty much unused. It is no deeper meaning.
pretty much unused. It has no deeper meaning.
Q: What are LZMA and LZMA2?
@ -33,7 +33,18 @@ A: 7-Zip and LZMA SDK are the original projects. LZMA SDK is roughly
LZMA Utils.
There are several other projects using LZMA. Most are more or less
based on LZMA SDK.
based on LZMA SDK. See <http://7-zip.org/links.html>.
Q: Why is liblzma named liblzma if its primary file format is .xz?
Shouldn't it be e.g. libxz?
A: When the designing of the .xz format began, the idea was to replace
the .lzma format and use the same .lzma suffix. It would have been
quite OK to reuse the suffix when there were very few .lzma files
around. However, the old .lzma format become popular before the
new format was finished. The new format was renamed to .xz but the
name of liblzma wasn't changed.
Q: Do XZ Utils support the .7z format?
@ -96,7 +107,7 @@ A: The .xz format is documented in xz-file-format.txt. It is a container
Documenting LZMA and LZMA2 is planned, but for now, there is no other
documentation that the source code. Before you begin, you should know
the basics of LZ77 and range coding algorithms. LZMA is based on LZ77,
but LZMA is *a lot* more complex. Range coding is used to compress
but LZMA is a lot more complex. Range coding is used to compress
the final bitstream like Huffman coding is used in Deflate.
@ -104,6 +115,90 @@ Q: I cannot find BCJ and BCJ2 filters. Don't they exist in liblzma?
A: BCJ filter is called "x86" in liblzma. BCJ2 is not included,
because it requires using more than one encoded output stream.
A streamable version of BCJ2-style filtering is planned.
Q: I need to use a script that runs "xz -9". On a system with 256 MiB
of RAM, xz says that it cannot allocate memory. Can I make the
script work without modifying it?
A: Set a default memory usage limit for compression. You can do it e.g.
in a shell initialization script such as ~/.bashrc or /etc/profile:
XZ_DEFAULTS=--memlimit-compress=150MiB
export XZ_DEFAULTS
xz will then scale the compression settings down so that the given
memory usage limit is not reached. This way xz shouldn't run out
of memory.
Check also that memory-related resource limits are high enough.
On most systems, "ulimit -a" will show the current resource limits.
Q: How do I create files that can be decompressed with XZ Embedded?
A: See the documentation in XZ Embedded. In short, something like
this is a good start:
xz --check=crc32 --lzma2=preset=6e,dict=64KiB
Or if a BCJ filter is needed too, e.g. if compressing
a kernel image for PowerPC:
xz --check=crc32 --powerpc --lzma2=preset=6e,dict=64KiB
Adjust dictionary size to get a good compromise between
compression ratio and decompressor memory usage. Note that
in single-call decompression mode of XZ Embedded, a big
dictionary doesn't increase memory usage.
Q: Will xz support threaded compression?
A: It is planned and has been taken into account when designing
the .xz file format. Eventually there will probably be three types
of threading, each method having its own advantages and disadvantages.
The simplest method is splitting the uncompressed data into blocks
and compressing them in parallel independent from each other.
Since the blocks are compressed independently, they can also be
decompressed independently. Together with the index feature in .xz,
this allows using threads to create .xz files for random-access
reading. This also makes threaded decompression possible, although
it is not clear if threaded decompression will ever be implemented.
The independent blocks method has a couple of disadvantages too. It
will compress worse than a single-block method. Often the difference
is not too big (maybe 1-2 %) but sometimes it can be too big. Also,
the memory usage of the compressor increases linearly when adding
threads.
Match finder parallelization is another threading method. It has
been in 7-Zip for ages. It doesn't affect compression ratio or
memory usage significantly. Among the three threading methods, only
this is useful when compressing small files (files that are not
significantly bigger than the dictionary). Unfortunately this method
scales only to about two CPU cores.
The third method is pigz-style threading (I use that name, because
pigz <http://www.zlib.net/pigz/> uses that method). It doesn't
affect compression ratio significantly and scales to many cores.
The memory usage scales linearly when threads are added. It isn't
significant with pigz, because Deflate uses only 32 KiB dictionary,
but with LZMA2 the memory usage will increase dramatically just like
with the independent blocks method. There is also a constant
computational overhead, which may make pigz-method a bit dull on
dual-core compared to the parallel match finder method, but with more
cores the overhead is not a big deal anymore.
Combining the threading methods will be possible and also useful.
E.g. combining match finder parallelization with pigz-style threading
can cut the memory usage by 50 %.
It is possible that the single-threaded method will be modified to
create files indentical to the pigz-style method. We'll see once
pigz-style threading has been implemented in liblzma.
Q: How do I build a program that needs liblzmadec (lzmadec.h)?
@ -124,5 +219,6 @@ A: Give --enable-small to the configure script. Use also appropriate
If the result is still too big, take a look at XZ Embedded. It is
a separate project, which provides a limited but significantly
smaller XZ decoder implementation than XZ Utils.
smaller XZ decoder implementation than XZ Utils. You can find it
at <http://tukaani.org/xz/embedded.html>.