mirror of https://git.tukaani.org/xz.git
306 lines
12 KiB
Plaintext
306 lines
12 KiB
Plaintext
|
|
.xz and .lzma Test Files
|
|
------------------------
|
|
|
|
0. Introduction
|
|
|
|
This directory contains bunch of files to test handling of .xz
|
|
and .lzma files in decoder implementations. Many of the files have
|
|
been created by hand with a hex editor, thus there is no better
|
|
"source code" than the files themselves. All the test files and
|
|
this README have been put into the public domain.
|
|
|
|
|
|
1. File Types
|
|
|
|
Good files (good-*.xz, good-*.lzma) must decode successfully
|
|
without requiring a lot of CPU time or RAM.
|
|
|
|
Unsupported files (unsupported-*.xz) are good files, but headers
|
|
indicate features not supported by the current file format
|
|
specification.
|
|
|
|
Bad files (bad-*.xz, bad-*.lzma) must cause the decoder to give
|
|
an error. Like with the good files, these files must not require
|
|
a lot of CPU time or RAM before they get detected to be broken.
|
|
|
|
|
|
2. Descriptions of Individual .xz Files
|
|
|
|
2.1. Good Files
|
|
|
|
good-0-empty.xz has one Stream with no Blocks.
|
|
|
|
good-0pad-empty.xz has one Stream with no Blocks followed by
|
|
four-byte Stream Padding.
|
|
|
|
good-0cat-empty.xz has two zero-Block Streams concatenated without
|
|
Stream Padding.
|
|
|
|
good-0catpad-empty.xz has two zero-Block Streams concatenated with
|
|
four-byte Stream Padding between the Streams.
|
|
|
|
good-1-check-none.xz has one Stream with one Block with two
|
|
uncompressed LZMA2 chunks and no integrity check.
|
|
|
|
good-1-check-crc32.xz has one Stream with one Block with two
|
|
uncompressed LZMA2 chunks and CRC32 check.
|
|
|
|
good-1-check-crc64.xz is like good-1-check-crc32.xz but with CRC64.
|
|
|
|
good-1-check-sha256.xz is like good-1-check-crc32.xz but with
|
|
SHA256.
|
|
|
|
good-2-lzma2.xz has one Stream with two Blocks with one uncompressed
|
|
LZMA2 chunk in each Block.
|
|
|
|
good-1-block_header-1.xz has both Compressed Size and Uncompressed
|
|
Size in the Block Header. This has also four extra bytes of Header
|
|
Padding.
|
|
|
|
good-1-block_header-2.xz has known Compressed Size.
|
|
|
|
good-1-block_header-3.xz has known Uncompressed Size.
|
|
|
|
good-1-delta-lzma2.tiff.xz is an image file that compresses
|
|
better with Delta+LZMA2 than with plain LZMA2.
|
|
|
|
good-1-x86-lzma2.xz uses the x86 filter (BCJ) and LZMA2. The
|
|
uncompressed file is compress_prepared_bcj_x86 found from the tests
|
|
directory.
|
|
|
|
good-1-sparc-lzma2.xz uses the SPARC filter and LZMA. The
|
|
uncompressed file is compress_prepared_bcj_sparc found from the tests
|
|
directory.
|
|
|
|
good-1-lzma2-1.xz has two LZMA2 chunks, of which the second sets
|
|
new properties.
|
|
|
|
good-1-lzma2-2.xz has two LZMA2 chunks, of which the second resets
|
|
the state without specifying new properties.
|
|
|
|
good-1-lzma2-3.xz has two LZMA2 chunks, of which the first is
|
|
uncompressed and the second is LZMA. The first chunk resets dictionary
|
|
and the second sets new properties.
|
|
|
|
good-1-lzma2-4.xz has three LZMA2 chunks: First is LZMA, second is
|
|
uncompressed with dictionary reset, and third is LZMA with new
|
|
properties but without dictionary reset.
|
|
|
|
good-1-lzma2-5.xz has an empty LZMA2 stream with only the end of
|
|
payload marker. XZ Utils 5.0.1 and older incorrectly see this file
|
|
as corrupt.
|
|
|
|
good-1-3delta-lzma2.xz has three Delta filters and LZMA2.
|
|
|
|
good-1-empty-bcj-lzma2.xz has an empty Block that uses PowerPC BCJ
|
|
and LZMA2. liblzma from XZ Utils 5.0.1 and older may incorrectly
|
|
return LZMA_BUF_ERROR in some cases. See commit message
|
|
d8db706acb8316f9861abd432cfbe001dd6d0c5c for the details.
|
|
|
|
|
|
2.2. Unsupported Files
|
|
|
|
unsupported-check.xz uses Check ID 0x02 which isn't supported by
|
|
the current version of the file format. It is implementation-defined
|
|
how this file handled (it may reject it, or decode it possibly with
|
|
a warning).
|
|
|
|
unsupported-block_header.xz has a non-null byte in Header Padding,
|
|
which may indicate presence of a new unsupported field.
|
|
|
|
unsupported-filter_flags-1.xz has unsupported Filter ID 0x7F.
|
|
|
|
unsupported-filter_flags-2.xz specifies only Delta filter in the
|
|
List of Filter Flags, but Delta isn't allowed as the last filter in
|
|
the chain. It could be a little more correct to detect this file as
|
|
corrupt instead of unsupported, but saying it is unsupported is
|
|
simpler in case of liblzma.
|
|
|
|
unsupported-filter_flags-3.xz specifies two LZMA2 filters in the
|
|
List of Filter Flags. LZMA2 is allowed only as the last filter in the
|
|
chain. It could be a little more correct to detect this file as
|
|
corrupt instead of unsupported, but saying it is unsupported is
|
|
simpler in case of liblzma.
|
|
|
|
|
|
2.3. Bad Files
|
|
|
|
bad-0pad-empty.xz has one Stream with no Blocks followed by
|
|
five-byte Stream Padding. Stream Padding must be a multiple of four
|
|
bytes, thus this file is corrupt.
|
|
|
|
bad-0catpad-empty.xz has two zero-Block Streams concatenated with
|
|
five-byte Stream Padding between the Streams.
|
|
|
|
bad-0cat-alone.xz is good-0-empty.xz concatenated with an empty
|
|
LZMA_Alone file.
|
|
|
|
bad-0cat-header_magic.xz is good-0cat-empty.xz but with one byte
|
|
wrong in the Header Magic Bytes field of the second Stream. liblzma
|
|
gives LZMA_DATA_ERROR for this. (LZMA_FORMAT_ERROR is used only if
|
|
the first Stream of a file has invalid Header Magic Bytes.)
|
|
|
|
bad-0-header_magic.xz is good-0-empty.xz but with one byte wrong
|
|
in the Header Magic Bytes field. liblzma gives LZMA_FORMAT_ERROR for
|
|
this.
|
|
|
|
bad-0-footer_magic.xz is good-0-empty.xz but with one byte wrong
|
|
in the Footer Magic Bytes field. liblzma gives LZMA_DATA_ERROR for
|
|
this.
|
|
|
|
bad-0-empty-truncated.xz is good-0-empty.xz without the last byte
|
|
of the file.
|
|
|
|
bad-0-nonempty_index.xz has no Blocks but Index claims that there is
|
|
one Block.
|
|
|
|
bad-0-backward_size.xz has wrong Backward Size in Stream Footer.
|
|
|
|
bad-1-stream_flags-1.xz has different Stream Flags in Stream Header
|
|
and Stream Footer.
|
|
|
|
bad-1-stream_flags-2.xz has wrong CRC32 in Stream Header.
|
|
|
|
bad-1-stream_flags-3.xz has wrong CRC32 in Stream Footer.
|
|
|
|
bad-1-vli-1.xz has two-byte variable-length integer in the
|
|
Uncompressed Size field in Block Header while one-byte would be enough
|
|
for that value. It's important that the file gets rejected due to too
|
|
big integer encoding instead of due to Uncompressed Size not matching
|
|
the value stored in the Block Header. That is, the decoder must not
|
|
try to decode the Compressed Data field.
|
|
|
|
bad-1-vli-2.xz has ten-byte variable-length integer as Uncompressed
|
|
Size in Block Header. It's important that the file gets rejected due
|
|
to too big integer encoding instead of due to Uncompressed Size not
|
|
matching the value stored in the Block Header. That is, the decoder
|
|
must not try to decode the Compressed Data field.
|
|
|
|
bad-1-block_header-1.xz has Block Header that ends in the middle of
|
|
the Filter Flags field.
|
|
|
|
bad-1-block_header-2.xz has Block Header that has Compressed Size and
|
|
Uncompressed Size but no List of Filter Flags field.
|
|
|
|
bad-1-block_header-3.xz has wrong CRC32 in Block Header.
|
|
|
|
bad-1-block_header-4.xz has too big Compressed Size in Block Header
|
|
(2^63 - 1 bytes while maximum is a little less, because the whole
|
|
Block must stay smaller than 2^63). It's important that the file
|
|
gets rejected due to invalid Compressed Size value; the decoder
|
|
must not try decoding the Compressed Data field.
|
|
|
|
bad-1-block_header-5.xz has zero as Compressed Size in Block Header.
|
|
|
|
bad-1-block_header-6.xz has corrupt Block Header which may crash
|
|
xz -lvv in XZ Utils 5.0.3 and earlier. It was fixed in the commit
|
|
c0297445064951807803457dca1611b3c47e7f0f.
|
|
|
|
bad-2-index-1.xz has wrong Unpadded Sizes in Index.
|
|
|
|
bad-2-index-2.xz has wrong Uncompressed Sizes in Index.
|
|
|
|
bad-2-index-3.xz has non-null byte in Index Padding.
|
|
|
|
bad-2-index-4.xz wrong CRC32 in Index.
|
|
|
|
bad-2-index-5.xz has zero as Unpadded Size. It is important that the
|
|
file gets rejected specifically due to Unpadded Size having an invalid
|
|
value.
|
|
|
|
bad-2-compressed_data_padding.xz has non-null byte in the padding of
|
|
the Compressed Data field of the first Block.
|
|
|
|
bad-1-check-crc32.xz has wrong Check (CRC32).
|
|
|
|
bad-1-check-crc32-2.xz has Compressed Size and Uncompressed Size in
|
|
Block Header but wrong Check (CRC32) in the actual data. This file
|
|
differs by one byte from good-1-block_header-1.xz: the last byte of
|
|
the Check field is wrong. This file is useful for testing error
|
|
detection in the threaded decoder when a worker thread is configured
|
|
to pass input one byte at a time to the Block decoder.
|
|
|
|
bad-1-check-crc64.xz has wrong Check (CRC64).
|
|
|
|
bad-1-check-sha256.xz has wrong Check (SHA-256).
|
|
|
|
bad-1-lzma2-1.xz has LZMA2 stream whose first chunk (uncompressed)
|
|
doesn't reset the dictionary.
|
|
|
|
bad-1-lzma2-2.xz has two LZMA2 chunks, of which the second chunk
|
|
indicates dictionary reset, but the LZMA compressed data tries to
|
|
repeat data from the previous chunk.
|
|
|
|
bad-1-lzma2-3.xz sets new invalid properties (lc=8, lp=0, pb=0) in
|
|
the middle of Block.
|
|
|
|
bad-1-lzma2-4.xz has two LZMA2 chunks, of which the first is
|
|
uncompressed and the second is LZMA. The first chunk resets dictionary
|
|
as it should, but the second chunk tries to reset state without
|
|
specifying properties for LZMA.
|
|
|
|
bad-1-lzma2-5.xz is like bad-1-lzma2-4.xz but doesn't try to reset
|
|
anything in the header of the second chunk.
|
|
|
|
bad-1-lzma2-6.xz has reserved LZMA2 control byte value (0x03).
|
|
|
|
bad-1-lzma2-7.xz has EOPM at LZMA level.
|
|
|
|
bad-1-lzma2-8.xz is like good-1-lzma2-4.xz but doesn't set new
|
|
properties in the third LZMA2 chunk.
|
|
|
|
bad-1-lzma2-9.xz has LZMA2 stream that is truncated at the end of
|
|
a LZMA2 chunk (no end marker). The uncompressed size of the partial
|
|
LZMA2 stream exceeds the value stored in the Block Header.
|
|
|
|
bad-1-lzma2-10.xz has LZMA2 stream that, from point of view of a
|
|
LZMA2 decoder, extends past the end of Block (and even the end of
|
|
the file). Uncompressed Size in Block Header is bigger than the
|
|
invalid LZMA2 stream may produce (even if a decoder reads until
|
|
the end of the file). The Check type is None to nullify certain
|
|
simple size-based sanity checks in a Block decoder.
|
|
|
|
bad-1-lzma2-11.xz has LZMA2 stream that lacks the end of
|
|
payload marker. When Compressed Size bytes have been decoded,
|
|
Uncompressed Size bytes of output will have been produced but
|
|
the LZMA2 decoder doesn't indicate end of stream.
|
|
|
|
|
|
3. Descriptions of Individual .lzma Files
|
|
|
|
3.1. Good Files
|
|
|
|
good-unknown_size-with_eopm.lzma has unknown size in the header
|
|
and end of payload marker at the end.
|
|
|
|
good-known_size-without_eopm.lzma has a known size in the header
|
|
and no end of payload marker at the end.
|
|
|
|
good-known_size-with_eopm.lzma has a known size in the header
|
|
and end of payload marker at the end. XZ Utils 5.2.5 and older
|
|
will give an error at the end of the file after producing the
|
|
correct uncompressed output.
|
|
|
|
|
|
3.2. Bad Files
|
|
|
|
bad-unknown_size-without_eopm.lzma has unknown size in the header
|
|
but no end of payload marker at the end. This file might be seen
|
|
by a decoder as if it were truncated.
|
|
|
|
bad-too_big_size-with_eopm.lzma has too big uncompressed size in
|
|
the header and the end of payload marker will be detected before
|
|
the specified number of bytes have been decoded.
|
|
|
|
bad-too_small_size-without_eopm-1.lzma has too small uncompressed
|
|
size in the header. The decoder will look for end of payload marker
|
|
but instead find a literal that would produce more output.
|
|
|
|
bad-too_small_size-without_eopm-2.lzma is like -1 above but instead
|
|
of a literal the problem occurs with a short repeated match.
|
|
|
|
bad-too_small_size-without_eopm-3.lzma is like -1 above but instead
|
|
of a literal the problem occurs in the middle of a match.
|
|
|