mirror of https://git.tukaani.org/xz.git
LZMA Utils history
------------------

Tukaani distribution

In 2005, a small group was working on the Tukaani distribution, which
was a Slackware fork. One of the project goals was to fit the distro on
a single 700 MiB ISO-9660 image. Using LZMA instead of gzip helped a
lot. Roughly speaking, one could fit data that took 1000 MiB in gzipped
form into 700 MiB with LZMA. Naturally, the compression ratio varied
across packages, but this was what we got on average.

Slackware packages have traditionally had .tgz as the filename suffix,
which is an abbreviation of .tar.gz. A logical name for LZMA-compressed
packages was .tlz, an abbreviation of .tar.lzma.

At the end of 2007, there is no longer a distribution under the
Tukaani project, but development of LZMA Utils continues. There are
still .tlz packages around, because at least Vector Linux (a
Slackware-based distribution) uses LZMA for its packages.

The first versions of the modified pkgtools used the LZMA_Alone tool
from Igor Pavlov's LZMA SDK as is. This was fine, because users didn't
need to interact with LZMA_Alone directly. But people soon wanted to
use LZMA for other files too, and the interface of LZMA_Alone wasn't
comfortable for those used to gzip and bzip2.


First steps of LZMA Utils

The first version of LZMA Utils (4.22.0) included a shell script called
lzmash. It was a wrapper with a gzip-like command line interface, and it
used the LZMA_Alone tool from the LZMA SDK to do all the real work.
zgrep, zdiff, and related scripts from gzip were adapted to work with
LZMA and were part of the first LZMA Utils release too.

LZMA Utils 4.22.0 also included lzmadec, a small (less than 10 KiB)
decoder-only command line tool. It was written on top of the
decoder-only C code found in the LZMA SDK. lzmadec was convenient in
situations where LZMA_Alone (a few hundred KiB) would be too big.

lzmash and lzmadec were written by Lasse Collin.


Second generation

The lzmash script was an ugly and not very secure hack. The last
version of LZMA Utils to use lzmash was 4.27.1.

LZMA Utils 4.32.0beta1 introduced a new lzma command line tool written
by Ville Koskinen. It was written in C++ and used the encoder and
decoder from the C++ LZMA SDK with only small modifications. This tool
replaced both the lzmash script and the LZMA_Alone command line tool in
LZMA Utils.

Introducing this new tool caused some temporary incompatibilities,
because the LZMA_Alone executable had simply been named lzma, like the
new command line tool, but the two had completely different command
line interfaces. The file format was still the same.

Lasse wrote liblzmadec, a small decoder-only library based on the C
code found in the LZMA SDK. liblzmadec had an API similar to zlib's,
although there were some significant differences, which made it
non-trivial to use in some applications designed for zlib and
libbzip2.

The lzmadec command line tool was converted to use liblzmadec.

Alexandre Sauvé helped convert the build system to use GNU Autotools.
This made it easier to test for certain less portable features needed
by the new command line tool.

Since the new command line tool never got completely finished (for
example, it didn't support the LZMA_OPT environment variable), the
intent was not to call 4.32.x stable. Similarly, liblzmadec wasn't
polished, but it appeared to work well enough, so some people started
using it too.

Because the development of the third generation of LZMA Utils was
delayed considerably (roughly two years), the 4.32.x branch had to be
maintained. It got some bug fixes now and then, and eventually it was
decided to call it stable, although most of the missing features were
never added.


File format problems

The file format used by LZMA_Alone was primitive. It was designed with
embedded systems in mind, and thus provided only a minimal set of
features. The two biggest problems for non-embedded use were the lack
of magic bytes and the lack of an integrity check.

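To make the "no magic bytes" point concrete, here is a minimal Python
sketch that writes a stream in the LZMA_Alone container and splits its
header into fields. It assumes Python's standard lzma module (a binding
to present-day liblzma, not the tools described in this history); the
helper name parse_alone_header and the filter parameters are
hypothetical choices made for the illustration. The header is 13 bytes:
one properties byte, a 32-bit dictionary size, and a 64-bit
uncompressed size. None of them is a fixed signature, and no checksum
follows the data.

```python
import lzma
import struct

# Filter parameters are arbitrary values chosen for this illustration.
FILTERS = [{"id": lzma.FILTER_LZMA1, "dict_size": 1 << 20,
            "lc": 3, "lp": 0, "pb": 2}]

PAYLOAD = b"hello, LZMA_Alone\n"

# FORMAT_ALONE selects the legacy LZMA_Alone container.
blob = lzma.compress(PAYLOAD, format=lzma.FORMAT_ALONE, filters=FILTERS)


def parse_alone_header(data):
    """Split the 13-byte LZMA_Alone header into its fields."""
    # "<BIQ": little-endian, 1-byte properties, 4-byte dictionary size,
    # 8-byte uncompressed size, with no padding between the fields.
    props, dict_size, uncompressed_size = struct.unpack("<BIQ", data[:13])
    # The properties byte packs three small integers:
    # props = (pb * 5 + lp) * 9 + lc
    return {
        "lc": props % 9,
        "lp": (props // 9) % 5,
        "pb": props // 45,
        "dict_size": dict_size,
        "uncompressed_size": uncompressed_size,
    }


header = parse_alone_header(blob)
print(header)
```

Because the first byte is only a packed set of encoder properties, a
tool cannot reliably tell an LZMA_Alone file from arbitrary binary
data, and a corrupted file has no stored checksum to fail against.
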
Igor and Lasse started developing a new file format with some help from
Ville Koskinen, Mark Adler and Mikko Pouru. Designing the new format
took quite a long time, mostly because Lasse was quite slow at getting
things done due to personal reasons.

Near the end of 2007, the new format was practically finished.
Compared to the LZMA_Alone format and the .gz format used by gzip, the
new .lzma format is quite complex as a whole. This means that tools
having *full* support for the new format would be larger and more
complex than tools supporting only the old LZMA_Alone format.

For situations where full support for the .lzma format isn't required
(embedded systems, operating system kernels), the new format has a
well-defined subset, which is easy to support with a small amount of
code. Such an implementation wouldn't be as small as one using the
LZMA_Alone format, but the difference shouldn't be significant.

The new .lzma format allows dividing the data into multiple
independent blocks, which can be compressed and uncompressed
independently. This makes multi-threading possible with algorithms that
aren't inherently parallel (such as LZMA). There's also a central index
of the sizes of the blocks, which makes it possible to do limited
random-access reading with the granularity of the block size.

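The idea of independent blocks plus a size index can be sketched in a
few lines of Python. This is not the actual container layout of the new
format: each block below is simply a separate stream produced by
Python's standard lzma module, and the names pack, read_at and
BLOCK_SIZE are hypothetical. The sketch only shows why independent
blocks allow concurrent compression and why an index of block sizes is
enough to read a byte range without decoding everything before it.

```python
import lzma
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 64 * 1024  # hypothetical block size for this illustration


def pack(data, block_size=BLOCK_SIZE):
    """Compress fixed-size blocks independently and build a size index."""
    chunks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    # The blocks share no state, so they can be handed to separate workers.
    with ThreadPoolExecutor() as pool:
        blocks = list(pool.map(lzma.compress, chunks))
    # One (compressed size, uncompressed size) pair per block.
    index = [(len(b), len(c)) for b, c in zip(blocks, chunks)]
    return b"".join(blocks), index


def read_at(container, index, offset, length):
    """Return data[offset:offset + length], decoding only covering blocks."""
    out = bytearray()
    comp_pos = 0   # where the current block starts in the container
    plain_pos = 0  # where the current block starts in the original data
    for comp_size, plain_size in index:
        if length <= 0:
            break
        if offset < plain_pos + plain_size:
            block = lzma.decompress(container[comp_pos:comp_pos + comp_size])
            start = offset - plain_pos
            piece = block[start:start + length]
            out += piece
            offset += len(piece)
            length -= len(piece)
        comp_pos += comp_size
        plain_pos += plain_size
    return bytes(out)


data = bytes(range(256)) * 1000
container, index = pack(data)
middle = read_at(container, index, 70000, 100000)
```

A read that falls inside one block decodes only that block, which is
what "granularity of the block size" means: smaller blocks give finer
random access but usually a worse compression ratio.
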
The new .lzma format uses the same filename suffix that was used for
LZMA_Alone files. The advantage is that users using the new tools won't
notice the change to the new format. The disadvantage is that the old
tools won't work with the new files.


Third generation

LZMA Utils 4.42.0alphas drop the rest of the C++ LZMA SDK. The LZMA and
other included filters (algorithm implementations) are still directly
based on LZMA SDK, but ported to C.

liblzma is now the core of LZMA Utils. It has a zlib-like API, which
doesn't suffer from the problems of the liblzmadec API. liblzma
supports not only LZMA, but several other filters, which together
can improve the compression ratio even further with certain file types.

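As a rough illustration of both points, the following Python sketch
feeds data to a compressor piece by piece, in the style of zlib, and
compares a plain LZMA chain with one that puts another filter in front
of it. It relies on assumptions: Python's standard lzma module binds
present-day liblzma rather than the 4.42 alphas, and the delta filter
and LZMA2 used here come from that later API, not from anything this
history describes. The helper name compress_stream is hypothetical.

```python
import lzma
import struct

# Sample data: 32-bit little-endian counters, which change only slightly
# from one 4-byte record to the next.
data = b"".join(struct.pack("<I", i) for i in range(50000))


def compress_stream(data, filters, piece=4096):
    """Feed the input piece by piece and collect the compressed output."""
    comp = lzma.LZMACompressor(format=lzma.FORMAT_XZ, filters=filters)
    out = [comp.compress(data[i:i + piece])
           for i in range(0, len(data), piece)]
    out.append(comp.flush())  # finish the stream, as with zlib
    return b"".join(out)


lzma_only = [{"id": lzma.FILTER_LZMA2, "preset": 6}]
# The delta filter subtracts the byte 4 positions earlier before LZMA
# sees the data, which turns the counters into a nearly constant pattern.
delta_then_lzma = [{"id": lzma.FILTER_DELTA, "dist": 4},
                   {"id": lzma.FILTER_LZMA2, "preset": 6}]

plain = compress_stream(data, lzma_only)
chained = compress_stream(data, delta_then_lzma)
print(len(data), len(plain), len(chained))
```

The filter chain is recorded in the compressed stream, so the decoder
needs no extra arguments: lzma.decompress(chained) returns the original
bytes. Whether an extra filter helps depends entirely on the data; on
ordinary text it would typically make no difference or a worse result.
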
The lzma and lzmadec command line tools have been rewritten. They use
liblzma to do the actual compressing or uncompressing.

The development of LZMA Utils 4.42.x is still in alpha stage. Several
features are still missing or don't fully work yet. Documentation is
also very minimal.