mirror of https://git.tukaani.org/xz.git
Minor updates to the file format specification.
parent
9c75b089b4
commit
bea301c26d
|
@ -43,10 +43,11 @@ The .lzma File Format
|
||||||
5.1. Alignment
|
5.1. Alignment
|
||||||
5.2. Security
|
5.2. Security
|
||||||
5.3. Filters
|
5.3. Filters
|
||||||
5.3.1. LZMA2
|
5.3.1. LZMA
|
||||||
5.3.2. Branch/Call/Jump Filters for Executables
|
5.3.2. LZMA2
|
||||||
5.3.3. Delta
|
5.3.3. Branch/Call/Jump Filters for Executables
|
||||||
5.3.3.1. Format of the Encoded Output
|
5.3.4. Delta
|
||||||
|
5.3.4.1. Format of the Encoded Output
|
||||||
5.4. Custom Filter IDs
|
5.4. Custom Filter IDs
|
||||||
5.4.1. Reserved Custom Filter ID Ranges
|
5.4.1. Reserved Custom Filter ID Ranges
|
||||||
6. Cyclic Redundancy Checks
|
6. Cyclic Redundancy Checks
|
||||||
|
@ -85,7 +86,7 @@ The .lzma File Format
|
||||||
|
|
||||||
0.2. Changes
|
0.2. Changes
|
||||||
|
|
||||||
Last modified: 2008-06-17 14:10+0300
|
Last modified: 2008-09-03 14:10+0300
|
||||||
|
|
||||||
(A changelog will be kept once the first official version
|
(A changelog will be kept once the first official version
|
||||||
is made.)
|
is made.)
|
||||||
|
@ -530,6 +531,10 @@ The .lzma File Format
|
||||||
officially defined Filter IDs and the formats of their Filter
|
officially defined Filter IDs and the formats of their Filter
|
||||||
Properties are described in Section 5.3.
|
Properties are described in Section 5.3.
|
||||||
|
|
||||||
|
Filter IDs greater than or equal to 0x4000_0000_0000_0000
|
||||||
|
(2^62) are reserved for implementation-specific internal use.
|
||||||
|
These Filter IDs must never be used in List of Filter Flags.
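
Illustration (not part of the commit): the reserved-range rule added above comes down to a single comparison against 2^62. The sketch below assumes a hypothetical helper name, filter_id_is_reserved, which does not appear in the specification.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not from the spec): returns true if the Filter ID
 * lies in the range reserved for implementation-specific internal use,
 * that is, if it is greater than or equal to 2^62. */
static bool filter_id_is_reserved(uint64_t filter_id)
{
    return filter_id >= UINT64_C(0x4000000000000000);
}
```

A decoder reading the List of Filter Flags could reject any ID for which this returns true; the 0x21 ID used by LZMA2 passes the check.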
|
||||||
|
|
||||||
|
|
||||||
3.1.6. Header Padding
|
3.1.6. Header Padding
|
||||||
|
|
||||||
|
@ -765,20 +770,15 @@ The .lzma File Format
|
||||||
|
|
||||||
5.3. Filters
|
5.3. Filters
|
||||||
|
|
||||||
5.3.1. LZMA2
|
5.3.1. LZMA
|
||||||
|
|
||||||
LZMA (Lempel-Ziv-Markov chain-Algorithm) is a general-purporse
|
LZMA (Lempel-Ziv-Markov chain-Algorithm) is a general-purporse
|
||||||
compression algorithm with high compression ratio and fast
|
compression algorithm with high compression ratio and fast
|
||||||
decompression. LZMA is based on LZ77 and range coding
|
decompression. LZMA is based on LZ77 and range coding
|
||||||
algorithms.
|
algorithms.
|
||||||
|
|
||||||
LZMA2 uses LZMA internally, but adds support for uncompressed
|
Filter ID: 0x40
|
||||||
chunks, eases stateful decoder implementations, and improves
|
Size of Filter Properties: 5 bytes
|
||||||
support for multithreading. Thus, the plain LZMA will not be
|
|
||||||
supported in this file format.
|
|
||||||
|
|
||||||
Filter ID: 0x21
|
|
||||||
Size of Filter Properties: 1 byte
|
|
||||||
Changes size of data: Yes
|
Changes size of data: Yes
|
||||||
Allow as a non-last filter: No
|
Allow as a non-last filter: No
|
||||||
Allow as the last filter: Yes
|
Allow as the last filter: Yes
|
||||||
|
@ -793,6 +793,66 @@ The .lzma File Format
|
||||||
a separate document, because including the documentation here
|
a separate document, because including the documentation here
|
||||||
would lengthen this document considerably.
|
would lengthen this document considerably.
|
||||||
|
|
||||||
|
The format of the Filter Properties field is as follows:
|
||||||
|
|
||||||
|
+-----------------+----+----+----+----+
|
||||||
|
| LZMA Properties | Dictionary Size |
|
||||||
|
+-----------------+----+----+----+----+
|
||||||
|
|
||||||
|
The LZMA Properties field contains three properties. An
|
||||||
|
abbreviation is given in parentheses, followed by the value
|
||||||
|
range of the property. The field consists of
|
||||||
|
|
||||||
|
1) the number of literal context bits (lc, [0, 4]);
|
||||||
|
2) the number of literal position bits (lp, [0, 4]); and
|
||||||
|
3) the number of position bits (pb, [0, 4]).
|
||||||
|
|
||||||
|
In addition to above ranges, the sum of lc and lp must not
|
||||||
|
exceed four. Note that this limit didn't exist in the old
|
||||||
|
LZMA_Alone format, which allowed lc to be in the range [0, 8].
|
||||||
|
|
||||||
|
The properties are encoded using the following formula:
|
||||||
|
|
||||||
|
LZMA Properties = (pb * 5 + lp) * 9 + lc
|
||||||
|
|
||||||
|
The following C code illustrates a straightforward way to
|
||||||
|
decode the properties:
|
||||||
|
|
||||||
|
uint8_t lc, lp, pb;
|
||||||
|
uint8_t prop = get_lzma_properties();
|
||||||
|
if (prop > (4 * 5 + 4) * 9 + 8)
|
||||||
|
return LZMA_PROPERTIES_ERROR;
|
||||||
|
|
||||||
|
pb = prop / (9 * 5);
|
||||||
|
prop -= pb * 9 * 5;
|
||||||
|
lp = prop / 9;
|
||||||
|
lc = prop - lp * 9;
|
||||||
|
|
||||||
|
if (lc + lp > 4)
|
||||||
|
return LZMA_PROPERTIES_ERROR;
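
Illustration (not part of the commit): the encoding formula and the decoding steps above can be paired as a round trip. This is a minimal sketch; the function names lzma_props_encode and lzma_props_decode and the bool return convention are assumptions made for the example, not names from the specification.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoder: packs lc, lp, and pb into the one-byte
 * LZMA Properties value using (pb * 5 + lp) * 9 + lc.
 * Returns false if a value is out of range or lc + lp exceeds 4. */
static bool lzma_props_encode(uint8_t lc, uint8_t lp, uint8_t pb,
                              uint8_t *prop)
{
    if (lc > 4 || lp > 4 || pb > 4 || lc + lp > 4)
        return false;

    *prop = (uint8_t)((pb * 5 + lp) * 9 + lc);
    return true;
}

/* Hypothetical decoder: follows the same steps as the snippet above,
 * wrapped in a function so that it can be called and tested. */
static bool lzma_props_decode(uint8_t prop, uint8_t *lc, uint8_t *lp,
                              uint8_t *pb)
{
    if (prop > (4 * 5 + 4) * 9 + 8)
        return false;

    *pb = (uint8_t)(prop / (9 * 5));
    prop = (uint8_t)(prop - *pb * 9 * 5);
    *lp = (uint8_t)(prop / 9);
    *lc = (uint8_t)(prop - *lp * 9);

    /* The sum of lc and lp must not exceed four. */
    return *lc + *lp <= 4;
}
```

For example, lc = 3, lp = 0, pb = 2 encodes to (2 * 5 + 0) * 9 + 3 = 93 (0x5D), and decoding 0x5D yields the same three values.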
|
||||||
|
|
||||||
|
Dictionary Size is encoded as unsigned 32-bit little endian
|
||||||
|
integer.
|
||||||
|
|
||||||
|
|
||||||
|
5.3.2. LZMA2
|
||||||
|
|
||||||
|
LZMA2 is an extensions on top of the original LZMA. LZMA2 uses
|
||||||
|
LZMA internally, but adds support for flushing the encoder,
|
||||||
|
uncompressed chunks, eases stateful decoder implementations,
|
||||||
|
and improves support for multithreading. For most uses, it is
|
||||||
|
recommended to use LZMA2 instead of LZMA.
|
||||||
|
|
||||||
|
Filter ID: 0x21
|
||||||
|
Size of Filter Properties: 1 byte
|
||||||
|
Changes size of data: Yes
|
||||||
|
Allow as a non-last filter: No
|
||||||
|
Allow as the last filter: Yes
|
||||||
|
|
||||||
|
Preferred alignment:
|
||||||
|
Input data: Adjustable to 1/2/4/8/16 byte(s)
|
||||||
|
Output data: 1 byte
|
||||||
|
|
||||||
The format of the one-byte Filter Properties field is as
|
The format of the one-byte Filter Properties field is as
|
||||||
follows:
|
follows:
|
||||||
|
|
||||||
|
@ -818,7 +878,7 @@ The .lzma File Format
|
||||||
37 3 29 1536 MiB
|
37 3 29 1536 MiB
|
||||||
38 2 30 2048 MiB
|
38 2 30 2048 MiB
|
||||||
39 3 30 3072 MiB
|
39 3 30 3072 MiB
|
||||||
40 2 31 4096 MiB
|
40 2 31 4096 MiB - 1 B
|
||||||
|
|
||||||
Instead of having a table in the decoder, the dictionary size
|
Instead of having a table in the decoder, the dictionary size
|
||||||
can be decoded using the following C code:
|
can be decoded using the following C code:
|
||||||
|
@ -827,11 +887,16 @@ The .lzma File Format
|
||||||
if (bits > 40)
|
if (bits > 40)
|
||||||
return DICTIONARY_TOO_BIG; // Bigger than 4 GiB
|
return DICTIONARY_TOO_BIG; // Bigger than 4 GiB
|
||||||
|
|
||||||
uint32_t dictionary_size = 2 | (bits & 1);
|
uint32_t dictionary_size;
|
||||||
|
if (bits == 40) {
|
||||||
|
dictionary_size = UINT32_MAX;
|
||||||
|
} else {
|
||||||
|
dictionary_size = 2 | (bits & 1);
|
||||||
dictionary_size <<= bits / 2 + 11;
|
dictionary_size <<= bits / 2 + 11;
|
||||||
|
}
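
Illustration (not part of the commit): the revised decoding logic can be checked against the table rows shown in the previous hunk. The sketch below wraps it in a hypothetical function, decode_dictionary_size, that takes the already-extracted bits value as a parameter; how bits is read from the Filter Properties byte lies outside this hunk and is not assumed here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical wrapper around the decoding logic in the new version.
 * Returns false if bits is greater than 40 (dictionary too big). */
static bool decode_dictionary_size(uint8_t bits, uint32_t *dictionary_size)
{
    if (bits > 40)
        return false;

    if (bits == 40) {
        /* 4096 MiB - 1 B: the largest value that fits in 32 bits. */
        *dictionary_size = UINT32_MAX;
    } else {
        /* Mantissa is 2 or 3; exponent is bits / 2 + 11. */
        *dictionary_size = 2 | (bits & 1);
        *dictionary_size <<= bits / 2 + 11;
    }

    return true;
}
```

With this logic, bits = 37 gives 3 << 29 (1536 MiB), bits = 38 gives 2 << 30 (2048 MiB), and bits = 39 gives 3 << 30 (3072 MiB), matching the table rows above. The special case for bits = 40 avoids shifting 2 left by 31, which would overflow a uint32_t.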
|
||||||
|
|
||||||
|
|
||||||
5.3.2. Branch/Call/Jump Filters for Executables
|
5.3.3. Branch/Call/Jump Filters for Executables
|
||||||
|
|
||||||
These filters convert relative branch, call, and jump
|
These filters convert relative branch, call, and jump
|
||||||
instructions to their absolute counterparts in executable
|
instructions to their absolute counterparts in executable
|
||||||
|
@ -871,7 +936,7 @@ The .lzma File Format
|
||||||
the Subblock filter.
|
the Subblock filter.
|
||||||
|
|
||||||
|
|
||||||
5.3.3. Delta
|
5.3.4. Delta
|
||||||
|
|
||||||
The Delta filter may increase compression ratio when the value
|
The Delta filter may increase compression ratio when the value
|
||||||
of the next byte correlates with the value of an earlier byte
|
of the next byte correlates with the value of an earlier byte
|
||||||
|
@ -892,7 +957,7 @@ The .lzma File Format
|
||||||
distance of 1 byte and 0xFF distance of 256 bytes.
|
distance of 1 byte and 0xFF distance of 256 bytes.
|
||||||
|
|
||||||
|
|
||||||
5.3.3.1. Format of the Encoded Output
|
5.3.4.1. Format of the Encoded Output
|
||||||
|
|
||||||
The code below illustrates both encoding and decoding with
|
The code below illustrates both encoding and decoding with
|
||||||
the Delta filter.
|
the Delta filter.
|
||||||
|
@ -944,7 +1009,7 @@ The .lzma File Format
|
||||||
Bits Mask Description
|
Bits Mask Description
|
||||||
0-15 0x0000_0000_0000_FFFF Filter ID
|
0-15 0x0000_0000_0000_FFFF Filter ID
|
||||||
16-55 0x00FF_FFFF_FFFF_0000 Developer ID
|
16-55 0x00FF_FFFF_FFFF_0000 Developer ID
|
||||||
56-62 0x7F00_0000_0000_0000 Static prefix: 0x7F
|
56-62 0x3F00_0000_0000_0000 Static prefix: 0x3F
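
Illustration (not part of the commit): a custom Filter ID following the new table can be assembled with shifts and masks. This is a sketch under assumptions; the function name make_custom_filter_id is hypothetical, and the masks are taken directly from the Mask column above.

```c
#include <stdint.h>

/* Hypothetical helper: combines a 40-bit Developer ID and a 16-bit
 * Filter ID with the static prefix 0x3F, using the masks from the
 * table in the new version. */
static uint64_t make_custom_filter_id(uint64_t developer_id,
                                      uint16_t filter_id)
{
    const uint64_t prefix = UINT64_C(0x3F) << 56;
    const uint64_t dev = (developer_id & UINT64_C(0xFFFFFFFFFF)) << 16;

    return prefix | dev | filter_id;
}
```

With the prefix changed from 0x7F to 0x3F, every ID built this way stays below 0x4000_0000_0000_0000 (2^62), so custom IDs do not collide with the range that this commit reserves for implementation-specific internal use.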
|
||||||
|
|
||||||
The resulting 63-bit integer will use 9 bytes of space when
|
The resulting 63-bit integer will use 9 bytes of space when
|
||||||
stored using the encoding described in Section 1.2. To get
|
stored using the encoding described in Section 1.2. To get
|
||||||
|
|