Minor updates to the file format specification.

This commit is contained in:
Lasse Collin 2008-09-03 17:06:25 +03:00
parent 9c75b089b4
commit bea301c26d
1 changed files with 85 additions and 20 deletions

View File

@ -43,10 +43,11 @@ The .lzma File Format
5.1. Alignment 5.1. Alignment
5.2. Security 5.2. Security
5.3. Filters 5.3. Filters
5.3.1. LZMA2 5.3.1. LZMA
5.3.2. Branch/Call/Jump Filters for Executables 5.3.2. LZMA2
5.3.3. Delta 5.3.3. Branch/Call/Jump Filters for Executables
5.3.3.1. Format of the Encoded Output 5.3.4. Delta
5.3.4.1. Format of the Encoded Output
5.4. Custom Filter IDs 5.4. Custom Filter IDs
5.4.1. Reserved Custom Filter ID Ranges 5.4.1. Reserved Custom Filter ID Ranges
6. Cyclic Redundancy Checks 6. Cyclic Redundancy Checks
@ -85,7 +86,7 @@ The .lzma File Format
0.2. Changes 0.2. Changes
Last modified: 2008-06-17 14:10+0300 Last modified: 2008-09-03 14:10+0300
(A changelog will be kept once the first official version (A changelog will be kept once the first official version
is made.) is made.)
@ -530,6 +531,10 @@ The .lzma File Format
officially defined Filter IDs and the formats of their Filter officially defined Filter IDs and the formats of their Filter
Properties are described in Section 5.3. Properties are described in Section 5.3.
Filter IDs greater than or equal to 0x4000_0000_0000_0000
(2^62) are reserved for implementation-specific internal use.
These Filter IDs must never be used in List of Filter Flags.
3.1.6. Header Padding 3.1.6. Header Padding
@ -765,20 +770,15 @@ The .lzma File Format
5.3. Filters 5.3. Filters
5.3.1. LZMA2 5.3.1. LZMA
LZMA (Lempel-Ziv-Markov chain-Algorithm) is a general-purporse LZMA (Lempel-Ziv-Markov chain-Algorithm) is a general-purporse
compression algorithm with high compression ratio and fast compression algorithm with high compression ratio and fast
decompression. LZMA is based on LZ77 and range coding decompression. LZMA is based on LZ77 and range coding
algorithms. algorithms.
LZMA2 uses LZMA internally, but adds support for uncompressed Filter ID: 0x40
chunks, eases stateful decoder implementations, and improves Size of Filter Properties: 5 bytes
support for multithreading. Thus, the plain LZMA will not be
supported in this file format.
Filter ID: 0x21
Size of Filter Properties: 1 byte
Changes size of data: Yes Changes size of data: Yes
Allow as a non-last filter: No Allow as a non-last filter: No
Allow as the last filter: Yes Allow as the last filter: Yes
@ -793,6 +793,66 @@ The .lzma File Format
a separate document, because including the documentation here a separate document, because including the documentation here
would lengthen this document considerably. would lengthen this document considerably.
The format of the Filter Properties field is as follows:
+-----------------+----+----+----+----+
| LZMA Properties | Dictionary Size |
+-----------------+----+----+----+----+
The LZMA Properties field contains three properties. An
abbreviation is given in parentheses, followed by the value
range of the property. The field consists of
1) the number of literal context bits (lc, [0, 4]);
2) the number of literal position bits (lp, [0, 4]); and
3) the number of position bits (pb, [0, 4]).
In addition to above ranges, the sum of lc and lp must not
exceed four. Note that this limit didn't exist in the old
LZMA_Alone format, which allowed lc to be in the range [0, 8].
The properties are encoded using the following formula:
LZMA Properties = (pb * 5 + lp) * 9 + lc
The following C code illustrates a straightforward way to
decode the properties:
uint8_t lc, lp, pb;
uint8_t prop = get_lzma_properties();
if (prop > (4 * 5 + 4) * 9 + 8)
return LZMA_PROPERTIES_ERROR;
pb = prop / (9 * 5);
prop -= pb * 9 * 5;
lp = prop / 9;
lc = prop - lp * 9;
if (lc + lp > 4)
return LZMA_PROPERTIES_ERROR;
Dictionary Size is encoded as unsigned 32-bit little endian
integer.
5.3.2. LZMA2
LZMA2 is an extensions on top of the original LZMA. LZMA2 uses
LZMA internally, but adds support for flushing the encoder,
uncompressed chunks, eases stateful decoder implementations,
and improves support for multithreading. For most uses, it is
recommended to use LZMA2 instead of LZMA.
Filter ID: 0x21
Size of Filter Properties: 1 byte
Changes size of data: Yes
Allow as a non-last filter: No
Allow as the last filter: Yes
Preferred alignment:
Input data: Adjustable to 1/2/4/8/16 byte(s)
Output data: 1 byte
The format of the one-byte Filter Properties field is as The format of the one-byte Filter Properties field is as
follows: follows:
@ -818,7 +878,7 @@ The .lzma File Format
37 3 29 1536 MiB 37 3 29 1536 MiB
38 2 30 2048 MiB 38 2 30 2048 MiB
39 3 30 3072 MiB 39 3 30 3072 MiB
40 2 31 4096 MiB 40 2 31 4096 MiB - 1 B
Instead of having a table in the decoder, the dictionary size Instead of having a table in the decoder, the dictionary size
can be decoded using the following C code: can be decoded using the following C code:
@ -827,11 +887,16 @@ The .lzma File Format
if (bits > 40) if (bits > 40)
return DICTIONARY_TOO_BIG; // Bigger than 4 GiB return DICTIONARY_TOO_BIG; // Bigger than 4 GiB
uint32_t dictionary_size = 2 | (bits & 1); uint32_t dictionary_size;
dictionary_size <<= bits / 2 + 11; if (bits == 40) {
dictionary_size = UINT32_MAX;
} else {
dictionary_size = 2 | (bits & 1);
dictionary_size <<= bits / 2 + 11;
}
5.3.2. Branch/Call/Jump Filters for Executables 5.3.3. Branch/Call/Jump Filters for Executables
These filters convert relative branch, call, and jump These filters convert relative branch, call, and jump
instructions to their absolute counterparts in executable instructions to their absolute counterparts in executable
@ -871,7 +936,7 @@ The .lzma File Format
the Subblock filter. the Subblock filter.
5.3.3. Delta 5.3.4. Delta
The Delta filter may increase compression ratio when the value The Delta filter may increase compression ratio when the value
of the next byte correlates with the value of an earlier byte of the next byte correlates with the value of an earlier byte
@ -892,7 +957,7 @@ The .lzma File Format
distance of 1 byte and 0xFF distance of 256 bytes. distance of 1 byte and 0xFF distance of 256 bytes.
5.3.3.1. Format of the Encoded Output 5.3.4.1. Format of the Encoded Output
The code below illustrates both encoding and decoding with The code below illustrates both encoding and decoding with
the Delta filter. the Delta filter.
@ -944,7 +1009,7 @@ The .lzma File Format
Bits Mask Description Bits Mask Description
0-15 0x0000_0000_0000_FFFF Filter ID 0-15 0x0000_0000_0000_FFFF Filter ID
16-55 0x00FF_FFFF_FFFF_0000 Developer ID 16-55 0x00FF_FFFF_FFFF_0000 Developer ID
56-62 0x7F00_0000_0000_0000 Static prefix: 0x7F 56-62 0x3F00_0000_0000_0000 Static prefix: 0x3F
The resulting 63-bit integer will use 9 bytes of space when The resulting 63-bit integer will use 9 bytes of space when
stored using the encoding described in Section 1.2. To get stored using the encoding described in Section 1.2. To get