xz/tests/files/README

424 lines
16 KiB
Plaintext

.xz and .lzma Test Files
------------------------
0. Introduction
This directory contains bunch of files to test handling of .xz,
.lzma (LZMA_Alone), and .lz (lzip) files in decoder implementations.
Many of the files have been created by hand with a hex editor, thus
there is no better "source code" than the files themselves. All the
test files and this README may be distributed under the terms of
the BSD Zero Clause License (0BSD).
1. File Types
Good files (good-*) must decode successfully without requiring
a lot of CPU time or RAM.
Unsupported files (unsupported-*) are good files, but headers
indicate features not supported by the current file format
specification.
Bad files (bad-*) must cause the decoder to give an error. Like
with the good files, these files must not require a lot of CPU
time or RAM before they get detected to be broken.
2. Descriptions of Individual .xz Files
2.1. Good Files
good-0-empty.xz has one Stream with no Blocks.
good-0pad-empty.xz has one Stream with no Blocks followed by
four-byte Stream Padding.
good-0cat-empty.xz has two zero-Block Streams concatenated without
Stream Padding.
good-0catpad-empty.xz has two zero-Block Streams concatenated with
four-byte Stream Padding between the Streams.
good-1-check-none.xz has one Stream with one Block with two
uncompressed LZMA2 chunks and no integrity check.
good-1-check-crc32.xz has one Stream with one Block with two
uncompressed LZMA2 chunks and CRC32 check.
good-1-check-crc64.xz is like good-1-check-crc32.xz but with CRC64.
good-1-check-sha256.xz is like good-1-check-crc32.xz but with
SHA256.
good-2-lzma2.xz has one Stream with two Blocks with one uncompressed
LZMA2 chunk in each Block.
good-1-block_header-1.xz has both Compressed Size and Uncompressed
Size in the Block Header. This has also four extra bytes of Header
Padding.
good-1-block_header-2.xz has known Compressed Size.
good-1-block_header-3.xz has known Uncompressed Size.
good-1-delta-lzma2.tiff.xz is an image file that compresses
better with Delta+LZMA2 than with plain LZMA2.
good-1-x86-lzma2.xz uses the x86 filter (BCJ) and LZMA2. The
uncompressed file is compress_prepared_bcj_x86 found from the tests
directory.
good-1-sparc-lzma2.xz uses the SPARC filter and LZMA2. The
uncompressed file is compress_prepared_bcj_sparc found from the tests
directory.
good-1-arm64-lzma2-1.xz uses the ARM64 filter and LZMA2. The
uncompressed data is constructed so that it tests integer
wrap around and sign extension. To recreate the file, compress
using XZ Utils 5.4.x (newer may or may not work too):
./debug/testfilegen-arm64 \
| xz -T1 -Ccrc32 --arm64 \
--lzma2=dict=64KiB,lp=2,lc=2 \
> good-1-arm64-lzma2-1.xz
good-1-arm64-lzma2-2.xz is like good-1-arm64-lzma2-1.xz but with
non-zero start offset. XZ Embedded doesn't support this file.
To recreate the file, compress using XZ Utils 5.4.x (newer may or
may not work too):
./debug/testfilegen-arm64 \
| xz -T1 -Ccrc32 --arm64=start=4294963200 \
--lzma2=dict=64KiB,lp=2,lc=2 \
> good-1-arm64-lzma2-2.xz
good-1-lzma2-1.xz has two LZMA2 chunks, of which the second sets
new properties.
good-1-lzma2-2.xz has two LZMA2 chunks, of which the second resets
the state without specifying new properties.
good-1-lzma2-3.xz has two LZMA2 chunks, of which the first is
uncompressed and the second is LZMA. The first chunk resets dictionary
and the second sets new properties.
good-1-lzma2-4.xz has three LZMA2 chunks: First is LZMA, second is
uncompressed with dictionary reset, and third is LZMA with new
properties but without dictionary reset.
good-1-lzma2-5.xz has an empty LZMA2 stream with only the end of
payload marker. XZ Utils 5.0.1 and older incorrectly see this file
as corrupt.
good-1-3delta-lzma2.xz has three Delta filters and LZMA2.
good-1-empty-bcj-lzma2.xz has an empty Block that uses PowerPC BCJ
and LZMA2. liblzma from XZ Utils 5.0.1 and older may incorrectly
return LZMA_BUF_ERROR in some cases. See commit message
d8db706acb8316f9861abd432cfbe001dd6d0c5c for the details.
2.2. Unsupported Files
unsupported-check.xz uses Check ID 0x02 which isn't supported by
the current version of the file format. It is implementation-defined
how this file handled (it may reject it, or decode it possibly with
a warning).
unsupported-block_header.xz has a non-null byte in Header Padding,
which may indicate presence of a new unsupported field.
unsupported-filter_flags-1.xz has unsupported Filter ID 0x7F.
unsupported-filter_flags-2.xz specifies only Delta filter in the
List of Filter Flags, but Delta isn't allowed as the last filter in
the chain. It could be a little more correct to detect this file as
corrupt instead of unsupported, but saying it is unsupported is
simpler in case of liblzma.
unsupported-filter_flags-3.xz specifies two LZMA2 filters in the
List of Filter Flags. LZMA2 is allowed only as the last filter in the
chain. It could be a little more correct to detect this file as
corrupt instead of unsupported, but saying it is unsupported is
simpler in case of liblzma.
2.3. Bad Files
bad-0pad-empty.xz has one Stream with no Blocks followed by
five-byte Stream Padding. Stream Padding must be a multiple of four
bytes, thus this file is corrupt.
bad-0catpad-empty.xz has two zero-Block Streams concatenated with
five-byte Stream Padding between the Streams.
bad-0cat-alone.xz is good-0-empty.xz concatenated with an empty
LZMA_Alone file.
bad-0cat-header_magic.xz is good-0cat-empty.xz but with one byte
wrong in the Header Magic Bytes field of the second Stream. liblzma
gives LZMA_DATA_ERROR for this. (LZMA_FORMAT_ERROR is used only if
the first Stream of a file has invalid Header Magic Bytes.)
bad-0-header_magic.xz is good-0-empty.xz but with one byte wrong
in the Header Magic Bytes field. liblzma gives LZMA_FORMAT_ERROR for
this.
bad-0-footer_magic.xz is good-0-empty.xz but with one byte wrong
in the Footer Magic Bytes field. liblzma gives LZMA_DATA_ERROR for
this.
bad-0-empty-truncated.xz is good-0-empty.xz without the last byte
of the file.
bad-0-nonempty_index.xz has no Blocks but Index claims that there is
one Block.
bad-0-backward_size.xz has wrong Backward Size in Stream Footer.
bad-1-stream_flags-1.xz has different Stream Flags in Stream Header
and Stream Footer.
bad-1-stream_flags-2.xz has wrong CRC32 in Stream Header.
bad-1-stream_flags-3.xz has wrong CRC32 in Stream Footer.
bad-1-vli-1.xz has two-byte variable-length integer in the
Uncompressed Size field in Block Header while one-byte would be enough
for that value. It's important that the file gets rejected due to too
big integer encoding instead of due to Uncompressed Size not matching
the value stored in the Block Header. That is, the decoder must not
try to decode the Compressed Data field.
bad-1-vli-2.xz has ten-byte variable-length integer as Uncompressed
Size in Block Header. It's important that the file gets rejected due
to too big integer encoding instead of due to Uncompressed Size not
matching the value stored in the Block Header. That is, the decoder
must not try to decode the Compressed Data field.
bad-1-block_header-1.xz has Block Header that ends in the middle of
the Filter Flags field.
bad-1-block_header-2.xz has Block Header that has Compressed Size and
Uncompressed Size but no List of Filter Flags field.
bad-1-block_header-3.xz has wrong CRC32 in Block Header.
bad-1-block_header-4.xz has too big Compressed Size in Block Header
(2^63 - 1 bytes while maximum is a little less, because the whole
Block must stay smaller than 2^63). It's important that the file
gets rejected due to invalid Compressed Size value; the decoder
must not try decoding the Compressed Data field.
bad-1-block_header-5.xz has zero as Compressed Size in Block Header.
bad-1-block_header-6.xz has corrupt Block Header which may crash
xz -lvv in XZ Utils 5.0.3 and earlier. It was fixed in the commit
c0297445064951807803457dca1611b3c47e7f0f.
bad-2-index-1.xz has wrong Unpadded Sizes in Index.
bad-2-index-2.xz has wrong Uncompressed Sizes in Index.
bad-2-index-3.xz has non-null byte in Index Padding.
bad-2-index-4.xz wrong CRC32 in Index.
bad-2-index-5.xz has zero as Unpadded Size. It is important that the
file gets rejected specifically due to Unpadded Size having an invalid
value.
bad-3-index-uncomp-overflow.xz has Index whose Uncompressed Size
fields have huge values whose sum exceeds the maximum allowed size
of 2^63 - 1 bytes. In this file the sum is exactly 2^64.
lzma_index_append() in liblzma <= 5.2.6 lacks the integer overflow
check for the uncompressed size and thus doesn't catch the error
when decoding the Index field in this file. This makes "xz -l"
not detect the error and will display 0 as the uncompressed size.
Note that regular decompression isn't affected by this bug because
it uses lzma_index_hash_append() instead.
bad-2-compressed_data_padding.xz has non-null byte in the padding of
the Compressed Data field of the first Block.
bad-1-check-crc32.xz has wrong Check (CRC32).
bad-1-check-crc32-2.xz has Compressed Size and Uncompressed Size in
Block Header but wrong Check (CRC32) in the actual data. This file
differs by one byte from good-1-block_header-1.xz: the last byte of
the Check field is wrong. This file is useful for testing error
detection in the threaded decoder when a worker thread is configured
to pass input one byte at a time to the Block decoder.
bad-1-check-crc64.xz has wrong Check (CRC64).
bad-1-check-sha256.xz has wrong Check (SHA-256).
bad-1-lzma2-1.xz has LZMA2 stream whose first chunk (uncompressed)
doesn't reset the dictionary.
bad-1-lzma2-2.xz has two LZMA2 chunks, of which the second chunk
indicates dictionary reset, but the LZMA compressed data tries to
repeat data from the previous chunk.
bad-1-lzma2-3.xz sets new invalid properties (lc=8, lp=0, pb=0) in
the middle of Block.
bad-1-lzma2-4.xz has two LZMA2 chunks, of which the first is
uncompressed and the second is LZMA. The first chunk resets dictionary
as it should, but the second chunk tries to reset state without
specifying properties for LZMA.
bad-1-lzma2-5.xz is like bad-1-lzma2-4.xz but doesn't try to reset
anything in the header of the second chunk.
bad-1-lzma2-6.xz has reserved LZMA2 control byte value (0x03).
bad-1-lzma2-7.xz has EOPM at LZMA level.
bad-1-lzma2-8.xz is like good-1-lzma2-4.xz but doesn't set new
properties in the third LZMA2 chunk.
bad-1-lzma2-9.xz has LZMA2 stream that is truncated at the end of
a LZMA2 chunk (no end marker). The uncompressed size of the partial
LZMA2 stream exceeds the value stored in the Block Header.
bad-1-lzma2-10.xz has LZMA2 stream that, from point of view of a
LZMA2 decoder, extends past the end of Block (and even the end of
the file). Uncompressed Size in Block Header is bigger than the
invalid LZMA2 stream may produce (even if a decoder reads until
the end of the file). The Check type is None to nullify certain
simple size-based sanity checks in a Block decoder.
bad-1-lzma2-11.xz has LZMA2 stream that lacks the end of
payload marker. When Compressed Size bytes have been decoded,
Uncompressed Size bytes of output will have been produced but
the LZMA2 decoder doesn't indicate end of stream.
3. Descriptions of Individual .lzma Files
3.1. Good Files
good-unknown_size-with_eopm.lzma has unknown size in the header
and end of payload marker at the end.
good-known_size-without_eopm.lzma has a known size in the header
and no end of payload marker at the end.
good-known_size-with_eopm.lzma has a known size in the header
and end of payload marker at the end. XZ Utils 5.2.5 and older
will give an error at the end of the file after producing the
correct uncompressed output.
3.2. Bad Files
bad-unknown_size-without_eopm.lzma has unknown size in the header
but no end of payload marker at the end. This file might be seen
by a decoder as if it were truncated.
bad-too_big_size-with_eopm.lzma has too big uncompressed size in
the header and the end of payload marker will be detected before
the specified number of bytes have been decoded.
bad-too_small_size-without_eopm-1.lzma has too small uncompressed
size in the header. The decoder will look for end of payload marker
but instead find a literal that would produce more output.
bad-too_small_size-without_eopm-2.lzma is like -1 above but instead
of a literal the problem occurs with a short repeated match.
bad-too_small_size-without_eopm-3.lzma is like -1 above but instead
of a literal the problem occurs in the middle of a match.
4. Descriptions of Individual .lz (lzip) Files
4.1. Good Files
good-1-v0.lz contains a single version 0 member. lzip 1.17 and
*older* can decompress this; support for version 0 was removed
in lzip 1.18.
good-1-v0-trailing-1.lz is like good-1-v0.lz but contains
trailing data that the decompressor must ignore.
good-1-v1.lz contains a single version 1 member. lzip 1.3 and
newer can decompress this.
good-1-v1-trailing-1.lz is like good-1-v1.lz but contains
trailing data that the decompressor must ignore.
good-1-v1-trailing-2.lz is like good-1-v1.lz but contains
trailing data whose first three bytes match the .lz magic bytes.
With lzip >= 1.20 this file results in an error unless one uses
the command line option --loose-trailing. lzip 1.3 to 1.19 decode
this file successfully by default. XZ Utils uses the old behavior
because it allows lzma_code() to stop at the first byte of the
trailing data as long as the first byte isn't 0x4C (L in US-ASCII);
otherwise the first 1-3 bytes that equal to the magic bytes are
consumed and lost in lzma_code(), and this is visible in xz too:
$ ( xz -dc ; cat ) < good-1-v1-trailing-2.lz
Hello
World!
Trailing garbage
$ ( xz -dc --single-stream ; cat ) < good-1-v1-trailing-2.lz
Hello
World!
LZITrailing garbage
good-2-v0-v1.lz contains two members of which the first is
version 0 and the second version 1. lzip versions 1.3 to 1.17
(inclusive) can decompress this.
good-2-v1-v0.lz contains two members of which the first is
version 1 and the second version 0. lzip versions 1.3 to 1.17
(inclusive) can decompress this.
good-2-v1-v1.lz contains two version 1 members. lzip versions 1.3
and newer can decompress this.
4.2. Unsupported Files
unsupported-1-v234.lz is like good-1-v1.lz except the version
field has been set to 234 (0xEA) which, as of writing, isn't
defined or supported by any .lz implementation.
4.3. Bad Files
bad-1-v1-magic-1.lz is like good-1-v1.lz but the first magic byte
is wrong.
bad-1-v1-magic-2.lz is like good-1-v1.lz but the last (fourth)
magic byte is wrong.
bad-1-v1-dict-1.lz has too low value in the dictionary size field.
bad-1-v1-dict-2.lz has too high value in the dictionary size field.
bad-1-v1-crc32.lz has wrong CRC32 value.
bad-1-v0-uncomp-size.lz is version 0 format with incorrect value
in the uncompressed size field.
bad-1-v1-uncomp-size.lz is version 1 format with incorrect value
in the uncompressed size field.
bad-1-v1-member-size.lz has incorrect value in the member size
field.
bad-1-v1-trailing-magic.lz has the four .lz magic bytes as trailing
data. This should be detected as a truncated file and thus result
in an error. That is, the last four bytes of the file should not be
ignored as trailing garbage. lzip >= 1.18 matches this behavior
while older versions ignore the last four bytes and don't indicate
an error.