xz/doc/liblzma-intro.txt


Introduction to liblzma
-----------------------

Writing applications to work with liblzma

    The liblzma API is split into several subheaders to improve
    readability and maintainability. The subheaders must not be
    #included directly. lzma.h requires that certain integer types and
    macros are available when the header is #included. On systems whose
    inttypes.h conforms to C99, the following will work:

        #include <sys/types.h>
        #include <inttypes.h>
        #include <lzma.h>

    Those who have used zlib should find liblzma's API easy to use.
    To developers who haven't used zlib before, I recommend learning
    zlib first, because zlib has excellent documentation.

    While the API is similar to that of zlib, there are some major
    differences, which are summarized below.

    For basic stream encoding, zlib has three functions (deflateInit(),
    deflate(), and deflateEnd()). Similarly, there are three functions
    for stream decoding (inflateInit(), inflate(), and inflateEnd()).
    liblzma has only a single coding function and a single ending
    function. Thus, to encode one may use, for example,
    lzma_stream_encoder_single(), lzma_code(), and lzma_end(). Similarly,
    for decoding one may use lzma_auto_decoder(), lzma_code(), and
    lzma_end().

    zlib has deflateReset() and inflateReset() to reset the stream
    structure without reallocating all the memory. In liblzma, all
    coder initialization functions are like zlib's reset functions:
    the first-time initializations are done with the same functions
    as the reinitializations (resetting).

    To make all this work, liblzma needs to know when lzma_stream
    doesn't already point to an allocated and initialized coder.
    This is achieved by initializing the lzma_stream structure with
    LZMA_STREAM_INIT (static initialization) or LZMA_STREAM_INIT_VAR
    (for example, when a new lzma_stream has been allocated with
    malloc()). This initialization should be done exactly once per
    lzma_stream structure to avoid leaking memory. Calling lzma_end()
    leaves the lzma_stream in a state comparable to the one achieved
    with LZMA_STREAM_INIT or LZMA_STREAM_INIT_VAR.

    An example probably clarifies a lot. With zlib, compression goes
    roughly like this:

        z_stream strm;
        deflateInit(&strm, level);
        deflate(&strm, Z_NO_FLUSH);
        deflate(&strm, Z_NO_FLUSH);
        ...
        deflate(&strm, Z_FINISH);
        deflateEnd(&strm) or deflateReset(&strm)

    With liblzma, it's slightly different:

        lzma_stream strm = LZMA_STREAM_INIT;
        lzma_stream_encoder_single(&strm, &options);
        lzma_code(&strm, LZMA_RUN);
        lzma_code(&strm, LZMA_RUN);
        ...
        lzma_code(&strm, LZMA_FINISH);
        lzma_end(&strm) or reinitialize for new coding work

    Reinitialization in the last step can be done with any function that
    can initialize an lzma_stream; it doesn't need to be the same
    function that was used for the previous initialization. If it is the
    same function, liblzma will usually be able to reuse most of the
    existing memory allocations (depending on how much the initialization
    options change). If you reinitialize with a different function,
    liblzma will automatically free the memory of the previous coder.
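
    The lifecycle described above can be illustrated without liblzma
    at all. The following is a minimal, self-contained sketch of the
    pattern, not liblzma's code: toy_stream, TOY_STREAM_INIT,
    toy_coder_init(), and toy_end() are hypothetical names invented for
    this illustration. It assumes that the only state that matters is
    one heap allocation whose pointer is NULL until the first
    initialization.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for lzma_stream; not the real structure. */
typedef struct {
    void *internal;       /* NULL: no coder has been allocated yet */
    size_t internal_size; /* Size of the allocation behind `internal` */
} toy_stream;

/* Plays the role of LZMA_STREAM_INIT: marks the stream as "empty". */
#define TOY_STREAM_INIT { NULL, 0 }

/*
 * Initializes or reinitializes the stream for a coder that needs
 * state_size bytes of state. The same function serves both purposes:
 * an existing allocation is reused when it is big enough, otherwise
 * it is freed and replaced. Returns 0 on success, -1 if out of memory.
 */
static int
toy_coder_init(toy_stream *strm, size_t state_size)
{
    assert(strm != NULL && state_size > 0);

    if (strm->internal == NULL || strm->internal_size < state_size) {
        free(strm->internal); /* free(NULL) is a no-op */
        strm->internal = malloc(state_size);
        if (strm->internal == NULL) {
            strm->internal_size = 0;
            return -1;
        }
        strm->internal_size = state_size;
    }

    /* Reset the coder state for the new coding work. */
    memset(strm->internal, 0, state_size);
    return 0;
}

/* Frees the coder and returns the stream to the TOY_STREAM_INIT state. */
static void
toy_end(toy_stream *strm)
{
    assert(strm != NULL);
    free(strm->internal);
    strm->internal = NULL;
    strm->internal_size = 0;
}
```

    In this sketch, assigning TOY_STREAM_INIT to a stream that already
    holds a coder would overwrite the pointer and leak the allocation,
    which is why the initializer must be applied only once per
    structure. After toy_end() the stream can be passed to
    toy_coder_init() again without reapplying the initializer.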


File formats

    liblzma supports multiple container formats for the compressed data.
    Different initialization functions initialize the lzma_stream to
    process different container formats. See the public header files
    for details.

    The following functions are the most commonly used:

      - lzma_stream_encoder_single(): Encodes a Single-Block Stream; this
        is the recommended format for most purposes.

      - lzma_alone_encoder(): Useful if you need to encode into the
        legacy LZMA_Alone format.

      - lzma_auto_decoder(): Decoder that automatically detects the
        file format; recommended when you decode compressed files on
        disk, because this way compatibility with the legacy LZMA_Alone
        format is transparent.

      - lzma_stream_decoder(): Decoder for Single- and Multi-Block
        Streams; this is good if you want to accept only .lzma Streams.


Filters

    liblzma supports multiple filters (algorithm implementations). The new
    .lzma format supports filter chains of up to seven filters. In a
    filter chain, the output of one filter is the input of the next
    filter in the chain. The legacy LZMA_Alone format supports only one
    filter, and that must always be LZMA.

        General-purpose compression:

            LZMA        The main algorithm of liblzma (surprise!)

        Branch/Call/Jump filters for executables:

            x86         This filter is known as BCJ in 7-Zip
            IA64        IA-64 (Itanium)
            PowerPC     Big endian PowerPC
            ARM
            ARM-Thumb
            SPARC

        Other filters:

            Copy        Dummy filter that simply copies all the data
                        from input to output.

            Subblock    Multi-purpose filter that can
                          - embed End of Payload Marker if the previous
                            filter in the chain doesn't support it; and
                          - apply Subfilters, which filter only part
                            of the same compressed Block in the Stream.

    Branch/Call/Jump filters never change the size of the data. They
    should usually be used as a pre-filter for some compression filter
    like LZMA.
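
    To see why such a filter helps the compressor while keeping the
    size unchanged, consider this simplified sketch. It is not the
    filter implemented in liblzma; toy_bcj_x86() is a hypothetical name,
    and the sketch assumes that every 0xE8 byte starts an x86 CALL with
    a 32-bit little-endian relative operand (real BCJ filters are more
    selective about what they treat as a call). Relative operands are
    rewritten as absolute targets, so repeated calls to the same
    function become identical byte sequences that a compression filter
    can match.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Converts the operand of every assumed CALL (opcode 0xE8) in buf
 * between relative and absolute form, in place. The buffer is treated
 * as if it were loaded at address 0. Opcode bytes are never modified
 * and operands are skipped, so encoding and decoding visit exactly the
 * same positions and the transform is reversible.
 */
static void
toy_bcj_x86(uint8_t *buf, size_t size, int is_encoder)
{
    assert(buf != NULL || size == 0);

    if (size < 5)
        return;

    size_t i = 0;
    while (i <= size - 5) {
        if (buf[i] != 0xE8) {
            ++i;
            continue;
        }

        /* Read the 32-bit little-endian operand. */
        uint32_t v = (uint32_t)buf[i + 1]
                | ((uint32_t)buf[i + 2] << 8)
                | ((uint32_t)buf[i + 3] << 16)
                | ((uint32_t)buf[i + 4] << 24);

        /* A relative CALL is relative to the next instruction. */
        const uint32_t next = (uint32_t)(i + 5);
        v = is_encoder ? v + next : v - next;

        buf[i + 1] = (uint8_t)v;
        buf[i + 2] = (uint8_t)(v >> 8);
        buf[i + 3] = (uint8_t)(v >> 16);
        buf[i + 4] = (uint8_t)(v >> 24);

        i += 5; /* Skip the operand; it may itself contain 0xE8. */
    }
}
```

    For example, a call at offset 0 with operand 0xFB and a call at
    offset 8 with operand 0xF3 both target address 0x100. After
    encoding, both operands are the bytes 00 01 00 00, and the buffer
    is still the same length.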


Integrity checks

    The .lzma Stream format uses CRC32 as the integrity check for the
    various file format headers. It is possible to omit CRC32 from
    the Block Headers, but not from the Stream Header. This is why the
    CRC32 code cannot be disabled when building liblzma (in addition,
    the LZMA encoder uses CRC32 for hashing, which is another reason).

    The integrity check of the actual data is calculated from the
    uncompressed data. This check can be CRC32, CRC64, or SHA256.
    It can also be omitted completely, although that is usually not
    a good idea. There are free IDs left, so support for new check
    algorithms can be added later.
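
    As an illustration of a check calculated from the uncompressed
    data, here is a small bit-by-bit CRC32 sketch. It is not the
    implementation used in liblzma. It assumes the widely used CRC-32
    with the reflected polynomial 0xEDB88320 (the variant used by zlib
    and Ethernet), and toy_crc32() is a hypothetical name. The previous
    CRC is passed in so that data can be checked piece by piece as it
    flows through a stateful coder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Updates a CRC32 with size bytes from buf. Pass 0 as crc for the
 * first piece of data and the previous return value for later pieces.
 */
static uint32_t
toy_crc32(const uint8_t *buf, size_t size, uint32_t crc)
{
    assert(buf != NULL || size == 0);

    crc = ~crc;

    for (size_t i = 0; i < size; ++i) {
        crc ^= buf[i];

        /* Process one bit at a time, least significant bit first. */
        for (int bit = 0; bit < 8; ++bit)
            crc = (crc >> 1) ^ (0xEDB88320U & (0U - (crc & 1U)));
    }

    return ~crc;
}
```

    With this variant, the nine ASCII bytes "123456789" give 0xCBF43926,
    whether they are passed in one call or split across several calls.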


API and ABI stability

    The API and ABI of liblzma aren't stable yet, although no huge
    changes should happen. One potential place for change is the
    lzma_options_subblock structure.

    In the 4.42.0alpha phase, the shared library version number won't
    be updated even if ABI breaks. I don't want to track the ABI changes
    yet. Just rebuild everything when you upgrade liblzma until we get
    to the beta stage.


Size of the library

    While liblzma isn't huge, it is quite far from the smallest possible
    LZMA implementation: the full liblzma binary (with support for all
    filters and other features) is well over 100 KiB, but the plain raw
    LZMA decoder is only 5-10 KiB.

    To decrease the size of the library, you can omit parts of it by
    passing certain options to the `configure' script. Disabling
    everything but the decoders of the required filters will usually give
    you a small enough library, but if you need a decoder embedded in,
    for example, an operating system kernel, the code from liblzma
    probably isn't suitable as is.

    If you need a minimal implementation supporting .lzma Streams, you
    may need to do a partial rewrite. liblzma uses a stateful API like
    zlib's, which increases the size of the library. Using a callback
    API or an even simpler buffer-to-buffer API would allow a smaller
    implementation.

    LZMA SDK contains an LZMA decoder written in ANSI C that is smaller
    than the one in liblzma, so you may want to take a look at that code.
    However, it doesn't (at least not yet) support the new .lzma Stream
    format.


Documentation

    There is no documentation yet other than the public headers and
    this text. Real docs will be written some day, I hope.