Mirror of https://git.tukaani.org/xz.git (synced 2025-11-03 23:12:57 +00:00)

.xz and .lzma Test Files
------------------------

0. Introduction

    This directory contains a bunch of files to test handling of .xz,
    .lzma (LZMA_Alone), and .lz (lzip) files in decoder implementations.
    Many of the files have been created by hand with a hex editor, thus
    there is no better "source code" than the files themselves. All the
    test files and this README have been put into the public domain.


1. File Types

    Good files (good-*) must decode successfully without requiring
    a lot of CPU time or RAM.

    Unsupported files (unsupported-*) are good files, but their headers
    indicate features not supported by the current file format
    specification.

    Bad files (bad-*) must cause the decoder to give an error. As
    with the good files, these files must not require a lot of CPU
    time or RAM before they are detected as broken.


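    The naming convention above maps directly to an expected decoder
    result, which a test runner can derive from the file name alone.
    The following is a minimal sketch, not the test suite's own code;
    the function name and the outcome labels are made up for this
    illustration.

```python
from pathlib import PurePath

# Expected decoder outcome per file name prefix (labels are
# illustrative only, not names used by XZ Utils).
EXPECTED = {
    "good-": "decode",
    "unsupported-": "unsupported",
    "bad-": "error",
}


def expected_outcome(filename: str) -> str:
    """Return the outcome a decoder should produce for a test file."""
    name = PurePath(filename).name
    for prefix, outcome in EXPECTED.items():
        if name.startswith(prefix):
            return outcome
    raise ValueError(f"not a test file name: {name}")


if __name__ == "__main__":
    for name in ("good-0-empty.xz", "unsupported-check.xz",
                 "bad-1-vli-1.xz"):
        print(name, "->", expected_outcome(name))
```

    How an "unsupported" result is reported is up to the
    implementation; a decoder may treat it as a distinct error code or
    as a warning.
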
2. Descriptions of Individual .xz Files

2.1. Good Files

    good-0-empty.xz has one Stream with no Blocks.

    good-0pad-empty.xz has one Stream with no Blocks followed by
    four-byte Stream Padding.

    good-0cat-empty.xz has two zero-Block Streams concatenated without
    Stream Padding.

    good-0catpad-empty.xz has two zero-Block Streams concatenated with
    four-byte Stream Padding between the Streams.

    good-1-check-none.xz has one Stream with one Block with two
    uncompressed LZMA2 chunks and no integrity check.

    good-1-check-crc32.xz has one Stream with one Block with two
    uncompressed LZMA2 chunks and a CRC32 check.

    good-1-check-crc64.xz is like good-1-check-crc32.xz but with CRC64.

    good-1-check-sha256.xz is like good-1-check-crc32.xz but with
    SHA256.

    good-2-lzma2.xz has one Stream with two Blocks with one uncompressed
    LZMA2 chunk in each Block.

    good-1-block_header-1.xz has both Compressed Size and Uncompressed
    Size in the Block Header. This file also has four extra bytes of
    Header Padding.

    good-1-block_header-2.xz has known Compressed Size.

    good-1-block_header-3.xz has known Uncompressed Size.

    good-1-delta-lzma2.tiff.xz is an image file that compresses
    better with Delta+LZMA2 than with plain LZMA2.

    good-1-x86-lzma2.xz uses the x86 filter (BCJ) and LZMA2. The
    uncompressed file is compress_prepared_bcj_x86 found in the tests
    directory.

    good-1-sparc-lzma2.xz uses the SPARC filter and LZMA2. The
    uncompressed file is compress_prepared_bcj_sparc found in the tests
    directory.

    good-1-arm64-lzma2-1.xz uses the ARM64 filter and LZMA2. The
    uncompressed data is constructed so that it tests integer
    wraparound and sign extension.

    good-1-arm64-lzma2-2.xz is like good-1-arm64-lzma2-1.xz but with
    non-zero start offset. XZ Embedded doesn't support this file.

    good-1-lzma2-1.xz has two LZMA2 chunks, of which the second sets
    new properties.

    good-1-lzma2-2.xz has two LZMA2 chunks, of which the second resets
    the state without specifying new properties.

    good-1-lzma2-3.xz has two LZMA2 chunks, of which the first is
    uncompressed and the second is LZMA. The first chunk resets the
    dictionary and the second sets new properties.

    good-1-lzma2-4.xz has three LZMA2 chunks: First is LZMA, second is
    uncompressed with dictionary reset, and third is LZMA with new
    properties but without dictionary reset.

    good-1-lzma2-5.xz has an empty LZMA2 stream with only the end of
    payload marker. XZ Utils 5.0.1 and older incorrectly see this file
    as corrupt.

    good-1-3delta-lzma2.xz has three Delta filters and LZMA2.

    good-1-empty-bcj-lzma2.xz has an empty Block that uses PowerPC BCJ
    and LZMA2. liblzma from XZ Utils 5.0.1 and older may incorrectly
    return LZMA_BUF_ERROR in some cases. See commit message
    d8db706acb8316f9861abd432cfbe001dd6d0c5c for the details.


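    To show what "one Stream with no Blocks" looks like on the byte
    level, the sketch below assembles such a Stream by hand following
    the .xz file format specification (Stream Header, an Index with
    zero Records, Stream Footer) and decodes it with Python's lzma
    module. The helper names are invented for this example, and the
    Check ID is a parameter because this README doesn't say which
    Check type good-0-empty.xz uses.

```python
import lzma
import struct
import zlib

HEADER_MAGIC = b"\xfd7zXZ\x00"   # FD 37 7A 58 5A 00
FOOTER_MAGIC = b"YZ"


def crc32_le(data: bytes) -> bytes:
    """CRC32 of data as four little-endian bytes."""
    return struct.pack("<I", zlib.crc32(data))


def empty_stream(check_id: int = 0x00) -> bytes:
    """Build a Stream with no Blocks (check_id 0x00 means no Check)."""
    flags = bytes([0x00, check_id])
    header = HEADER_MAGIC + flags + crc32_le(flags)

    # Index Indicator (0x00), Number of Records (0 as a one-byte
    # integer), then Index Padding up to a multiple of four bytes.
    index_body = b"\x00\x00" + b"\x00\x00"
    index = index_body + crc32_le(index_body)

    # Backward Size is stored as (size of Index / 4) - 1.
    backward_size = struct.pack("<I", len(index) // 4 - 1)
    footer = (crc32_le(backward_size + flags) + backward_size
              + flags + FOOTER_MAGIC)
    return header + index + footer


if __name__ == "__main__":
    stream = empty_stream()
    print(len(stream), "bytes:", stream.hex())
    print("decoded:", lzma.decompress(stream))
    # Two Streams back to back, without Stream Padding.
    print("concatenated:", lzma.decompress(stream + stream))
```

    Python's lzma.decompress() is lenient about data that follows the
    first Stream, so it is not suitable for checking the Stream
    Padding rules exercised by the *pad* files.
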
2.2. Unsupported Files

    unsupported-check.xz uses Check ID 0x02 which isn't supported by
    the current version of the file format. It is implementation-defined
    how this file is handled (a decoder may reject it, or decode it,
    possibly with a warning).

    unsupported-block_header.xz has a non-null byte in Header Padding,
    which may indicate presence of a new unsupported field.

    unsupported-filter_flags-1.xz has unsupported Filter ID 0x7F.

    unsupported-filter_flags-2.xz specifies only Delta filter in the
    List of Filter Flags, but Delta isn't allowed as the last filter in
    the chain. It could be a little more correct to detect this file as
    corrupt instead of unsupported, but saying it is unsupported is
    simpler in case of liblzma.

    unsupported-filter_flags-3.xz specifies two LZMA2 filters in the
    List of Filter Flags. LZMA2 is allowed only as the last filter in the
    chain. It could be a little more correct to detect this file as
    corrupt instead of unsupported, but saying it is unsupported is
    simpler in case of liblzma.


2.3. Bad Files

    bad-0pad-empty.xz has one Stream with no Blocks followed by
    five-byte Stream Padding. Stream Padding must be a multiple of four
    bytes, thus this file is corrupt.

    bad-0catpad-empty.xz has two zero-Block Streams concatenated with
    five-byte Stream Padding between the Streams.

    bad-0cat-alone.xz is good-0-empty.xz concatenated with an empty
    LZMA_Alone file.

    bad-0cat-header_magic.xz is good-0cat-empty.xz but with one byte
    wrong in the Header Magic Bytes field of the second Stream. liblzma
    gives LZMA_DATA_ERROR for this. (LZMA_FORMAT_ERROR is used only if
    the first Stream of a file has invalid Header Magic Bytes.)

    bad-0-header_magic.xz is good-0-empty.xz but with one byte wrong
    in the Header Magic Bytes field. liblzma gives LZMA_FORMAT_ERROR for
    this.

    bad-0-footer_magic.xz is good-0-empty.xz but with one byte wrong
    in the Footer Magic Bytes field. liblzma gives LZMA_DATA_ERROR for
    this.

    bad-0-empty-truncated.xz is good-0-empty.xz without the last byte
    of the file.

    bad-0-nonempty_index.xz has no Blocks but Index claims that there is
    one Block.

    bad-0-backward_size.xz has wrong Backward Size in Stream Footer.

    bad-1-stream_flags-1.xz has different Stream Flags in Stream Header
    and Stream Footer.

    bad-1-stream_flags-2.xz has wrong CRC32 in Stream Header.

    bad-1-stream_flags-3.xz has wrong CRC32 in Stream Footer.

    bad-1-vli-1.xz has a two-byte variable-length integer in the
    Uncompressed Size field in Block Header while one byte would be
    enough for that value. It's important that the file gets rejected
    because the integer encoding is too long instead of because
    Uncompressed Size doesn't match the value stored in the Block
    Header. That is, the decoder must not try to decode the Compressed
    Data field.

    bad-1-vli-2.xz has a ten-byte variable-length integer as
    Uncompressed Size in Block Header. It's important that the file
    gets rejected because the integer encoding is too long instead of
    because Uncompressed Size doesn't match the value stored in the
    Block Header. That is, the decoder must not try to decode the
    Compressed Data field.

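    The two rejections expected for the bad-1-vli-* files can be
    sketched as follows. This is an illustrative decoder for the
    variable-length integer encoding of the .xz format (seven payload
    bits per byte, least significant group first, at most nine bytes),
    not liblzma's code; the function names are made up here.

```python
VLI_MAX = 2**63 - 1      # largest value the encoding may hold
VLI_BYTES_MAX = 9        # 9 * 7 bits = 63 bits


def encode_vli(value: int) -> bytes:
    """Encode value using the shortest possible representation."""
    if not 0 <= value <= VLI_MAX:
        raise ValueError("value out of range")
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)   # more bytes follow
        value >>= 7
    out.append(value)
    return bytes(out)


def decode_vli(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Return (value, position after the integer) or raise ValueError."""
    value = 0
    for i in range(VLI_BYTES_MAX):
        if pos + i >= len(buf):
            raise ValueError("truncated integer")
        byte = buf[pos + i]
        # A trailing 0x00 byte means that a shorter encoding exists.
        if i > 0 and byte == 0x00:
            raise ValueError("integer is not minimally encoded")
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            return value, pos + i + 1
    raise ValueError("integer is longer than nine bytes")


if __name__ == "__main__":
    print(encode_vli(300).hex())
    print(decode_vli(encode_vli(300)))
    for bad in (b"\x85\x00", b"\xff" * 9 + b"\x01"):
        try:
            decode_vli(bad)
        except ValueError as err:
            print(bad.hex(), "rejected:", err)
```

    Because both errors are raised while parsing the Block Header, a
    decoder structured this way never reaches the Compressed Data
    field for such input.
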
    bad-1-block_header-1.xz has Block Header that ends in the middle of
    the Filter Flags field.

    bad-1-block_header-2.xz has Block Header that has Compressed Size and
    Uncompressed Size but no List of Filter Flags field.

    bad-1-block_header-3.xz has wrong CRC32 in Block Header.

    bad-1-block_header-4.xz has too big Compressed Size in Block Header
    (2^63 - 1 bytes while maximum is a little less, because the whole
    Block must stay smaller than 2^63). It's important that the file
    gets rejected due to invalid Compressed Size value; the decoder
    must not try decoding the Compressed Data field.

    bad-1-block_header-5.xz has zero as Compressed Size in Block Header.

    bad-1-block_header-6.xz has corrupt Block Header which may crash
    xz -lvv in XZ Utils 5.0.3 and earlier. It was fixed in the commit
    c0297445064951807803457dca1611b3c47e7f0f.

    bad-2-index-1.xz has wrong Unpadded Sizes in Index.

    bad-2-index-2.xz has wrong Uncompressed Sizes in Index.

    bad-2-index-3.xz has non-null byte in Index Padding.

    bad-2-index-4.xz has wrong CRC32 in Index.

    bad-2-index-5.xz has zero as Unpadded Size. It is important that the
    file gets rejected specifically due to Unpadded Size having an invalid
    value.

    bad-3-index-uncomp-overflow.xz has Index whose Uncompressed Size
    fields have huge values whose sum exceeds the maximum allowed size
    of 2^63 - 1 bytes. In this file the sum is exactly 2^64.
    lzma_index_append() in liblzma <= 5.2.6 lacks the integer overflow
    check for the uncompressed size and thus doesn't catch the error
    when decoding the Index field in this file. As a result, "xz -l"
    doesn't detect the error and displays 0 as the uncompressed size.
    Note that regular decompression isn't affected by this bug because
    it uses lzma_index_hash_append() instead.

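    The effect described for bad-3-index-uncomp-overflow.xz can be
    reproduced with plain arithmetic. The sketch below compares a sum
    that wraps around at 64 bits with a sum that checks the limit
    before each addition. The three sizes are hypothetical values
    chosen so that their sum is 2^64; they are not read from the test
    file, and the function names are invented for this example.

```python
VLI_MAX = 2**63 - 1          # maximum allowed total size
UINT64_MASK = 2**64 - 1


def wrapping_total(sizes):
    """Sum sizes like unchecked uint64_t arithmetic would."""
    total = 0
    for size in sizes:
        total = (total + size) & UINT64_MASK
    return total


def checked_total(sizes):
    """Sum sizes, refusing to go past VLI_MAX."""
    total = 0
    for size in sizes:
        if size > VLI_MAX - total:
            raise OverflowError("Uncompressed Size sum exceeds 2^63 - 1")
        total += size
    return total


if __name__ == "__main__":
    # Hypothetical Record sizes; each one alone is a valid value.
    sizes = [2**63 - 1, 2**63 - 1, 2]
    print("wrapping sum:", wrapping_total(sizes))
    try:
        checked_total(sizes)
    except OverflowError as err:
        print("checked sum rejected:", err)
```

    The wrapping sum comes out as 0, which matches the uncompressed
    size that "xz -l" displays with the affected liblzma versions.
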
    bad-2-compressed_data_padding.xz has non-null byte in the padding of
    the Compressed Data field of the first Block.

    bad-1-check-crc32.xz has wrong Check (CRC32).

    bad-1-check-crc32-2.xz has Compressed Size and Uncompressed Size in
    Block Header but wrong Check (CRC32) in the actual data. This file
    differs by one byte from good-1-block_header-1.xz: the last byte of
    the Check field is wrong. This file is useful for testing error
    detection in the threaded decoder when a worker thread is configured
    to pass input one byte at a time to the Block decoder.

    bad-1-check-crc64.xz has wrong Check (CRC64).

    bad-1-check-sha256.xz has wrong Check (SHA-256).

    bad-1-lzma2-1.xz has LZMA2 stream whose first chunk (uncompressed)
    doesn't reset the dictionary.

    bad-1-lzma2-2.xz has two LZMA2 chunks, of which the second chunk
    indicates dictionary reset, but the LZMA compressed data tries to
    repeat data from the previous chunk.

    bad-1-lzma2-3.xz sets new invalid properties (lc=8, lp=0, pb=0) in
    the middle of Block.

    bad-1-lzma2-4.xz has two LZMA2 chunks, of which the first is
    uncompressed and the second is LZMA. The first chunk resets the
    dictionary as it should, but the second chunk tries to reset state
    without specifying properties for LZMA.

    bad-1-lzma2-5.xz is like bad-1-lzma2-4.xz but doesn't try to reset
    anything in the header of the second chunk.

    bad-1-lzma2-6.xz has reserved LZMA2 control byte value (0x03).

    bad-1-lzma2-7.xz has EOPM at LZMA level.

    bad-1-lzma2-8.xz is like good-1-lzma2-4.xz but doesn't set new
    properties in the third LZMA2 chunk.

    bad-1-lzma2-9.xz has LZMA2 stream that is truncated at the end of
    an LZMA2 chunk (no end marker). The uncompressed size of the partial
    LZMA2 stream exceeds the value stored in the Block Header.

    bad-1-lzma2-10.xz has LZMA2 stream that, from the point of view of
    an LZMA2 decoder, extends past the end of Block (and even the end of
    the file). Uncompressed Size in Block Header is bigger than what
    the invalid LZMA2 stream can produce (even if a decoder reads until
    the end of the file). The Check type is None to nullify certain
    simple size-based sanity checks in a Block decoder.

    bad-1-lzma2-11.xz has LZMA2 stream that lacks the end of
    payload marker. When Compressed Size bytes have been decoded,
    Uncompressed Size bytes of output will have been produced but
    the LZMA2 decoder doesn't indicate end of stream.


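    Several of the LZMA2 files above depend on how the chunk control
    byte and the LZMA properties byte are interpreted. The sketch
    below is based on how LZMA2 is implemented in liblzma; this README
    doesn't spell out the layout, so treat the bit assignments and the
    limits (lc + lp <= 4, pb <= 4) as assumptions of the example. The
    function names are invented here.

```python
def describe_control(control: int) -> str:
    """Classify an LZMA2 chunk control byte or raise ValueError."""
    if not 0 <= control <= 0xFF:
        raise ValueError("not a byte")
    if control == 0x00:
        return "end of LZMA2 stream"
    if control == 0x01:
        return "uncompressed chunk, dictionary reset"
    if control == 0x02:
        return "uncompressed chunk, no reset"
    if control < 0x80:
        raise ValueError("reserved control byte")
    # Bits 5-6 of an LZMA chunk select what gets reset.
    modes = (
        "no reset",
        "state reset",
        "state reset, new properties",
        "state reset, new properties, dictionary reset",
    )
    return "LZMA chunk, " + modes[(control >> 5) & 0x03]


def decode_props(props: int) -> tuple[int, int, int]:
    """Split a properties byte into (lc, lp, pb)."""
    if not 0 <= props < 9 * 5 * 5:
        raise ValueError("invalid properties byte")
    return props % 9, (props // 9) % 5, props // 45


def lzma2_props_valid(lc: int, lp: int, pb: int) -> bool:
    """Limits assumed for LZMA2: lc + lp <= 4 and pb <= 4."""
    return lc >= 0 and lp >= 0 and lc + lp <= 4 and 0 <= pb <= 4


if __name__ == "__main__":
    print(describe_control(0x01))
    print(describe_control(0xE0))
    try:
        describe_control(0x03)            # as in bad-1-lzma2-6.xz
    except ValueError as err:
        print("0x03 rejected:", err)
    lc, lp, pb = decode_props(8)          # lc=8, lp=0, pb=0
    print((lc, lp, pb), lzma2_props_valid(lc, lp, pb))
```

    A decoder also has to track state across chunks, for example that
    the first chunk resets the dictionary and that the first LZMA
    chunk sets properties; that bookkeeping is left out here.
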
3. Descriptions of Individual .lzma Files

3.1. Good Files

    good-unknown_size-with_eopm.lzma has unknown size in the header
    and end of payload marker at the end.

    good-known_size-without_eopm.lzma has a known size in the header
    and no end of payload marker at the end.

    good-known_size-with_eopm.lzma has a known size in the header
    and end of payload marker at the end. XZ Utils 5.2.5 and older
    will give an error at the end of the file after producing the
    correct uncompressed output.


3.2. Bad Files

    bad-unknown_size-without_eopm.lzma has unknown size in the header
    but no end of payload marker at the end. This file might be seen
    by a decoder as if it were truncated.

    bad-too_big_size-with_eopm.lzma has too big uncompressed size in
    the header and the end of payload marker will be detected before
    the specified number of bytes have been decoded.

    bad-too_small_size-without_eopm-1.lzma has too small uncompressed
    size in the header. The decoder will look for end of payload marker
    but instead find a literal that would produce more output.

    bad-too_small_size-without_eopm-2.lzma is like -1 above but instead
    of a literal the problem occurs with a short repeated match.

    bad-too_small_size-without_eopm-3.lzma is like -1 above but instead
    of a literal the problem occurs in the middle of a match.


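    The "known size" and "unknown size" wording above refers to the
    13-byte LZMA_Alone header: a properties byte, a 32-bit dictionary
    size, and a 64-bit uncompressed size, where all bits set means
    that the size is unknown. The following sketch parses that header;
    the function name and the returned dictionary keys are made up for
    the example. It uses Python's lzma module only to produce a sample
    file.

```python
import lzma
import struct

UNKNOWN_SIZE = 0xFFFFFFFFFFFFFFFF


def parse_alone_header(data: bytes) -> dict:
    """Parse the 13-byte LZMA_Alone header at the start of data."""
    if len(data) < 13:
        raise ValueError("truncated header")
    # Little endian: 1-byte properties, 4-byte dictionary size,
    # 8-byte uncompressed size.
    props, dict_size, size = struct.unpack("<BIQ", data[:13])
    if props >= 9 * 5 * 5:
        raise ValueError("invalid properties byte")
    return {
        "lc": props % 9,
        "lp": (props // 9) % 5,
        "pb": props // 45,
        "dict_size": dict_size,
        # None stands for "unknown size"; such a file must end with
        # an end of payload marker.
        "uncompressed_size": None if size == UNKNOWN_SIZE else size,
    }


if __name__ == "__main__":
    sample = lzma.compress(b"Hello\nWorld!\n", format=lzma.FORMAT_ALONE)
    print(parse_alone_header(sample))
```

    Files written by liblzma's LZMA_Alone encoder, which Python's
    lzma module uses, store an unknown size and rely on the end of
    payload marker, so they correspond to the
    good-unknown_size-with_eopm.lzma case.
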
4. Descriptions of Individual .lz (lzip) Files

4.1. Good Files

    good-1-v0.lz contains a single version 0 member. lzip 1.17 and
    *older* can decompress this; support for version 0 was removed
    in lzip 1.18.

    good-1-v0-trailing-1.lz is like good-1-v0.lz but contains
    trailing data that the decompressor must ignore.

    good-1-v1.lz contains a single version 1 member. lzip 1.3 and
    newer can decompress this.

    good-1-v1-trailing-1.lz is like good-1-v1.lz but contains
    trailing data that the decompressor must ignore.

    good-1-v1-trailing-2.lz is like good-1-v1.lz but contains
    trailing data whose first three bytes match the .lz magic bytes.
    With lzip >= 1.20 this file results in an error unless one uses
    the command line option --loose-trailing. lzip 1.3 to 1.19 decode
    this file successfully by default. XZ Utils uses the old behavior
    because it allows lzma_code() to stop at the first byte of the
    trailing data as long as the first byte isn't 0x4C (L in US-ASCII);
    otherwise the first 1-3 bytes that equal the magic bytes are
    consumed and lost in lzma_code(), and this is visible in xz too:

        $ ( xz -dc ; cat ) < good-1-v1-trailing-2.lz
        Hello
        World!
        Trailing garbage

        $ ( xz -dc --single-stream ; cat ) < good-1-v1-trailing-2.lz
        Hello
        World!
        LZITrailing garbage

    good-2-v0-v1.lz contains two members of which the first is
    version 0 and the second version 1. lzip versions 1.3 to 1.17
    (inclusive) can decompress this.

    good-2-v1-v0.lz contains two members of which the first is
    version 1 and the second version 0. lzip versions 1.3 to 1.17
    (inclusive) can decompress this.

    good-2-v1-v1.lz contains two version 1 members. lzip versions 1.3
    and newer can decompress this.


4.2. Unsupported Files

    unsupported-1-v234.lz is like good-1-v1.lz except the version
    field has been set to 234 (0xEA) which, as of writing, isn't
    defined or supported by any .lz implementation.


4.3. Bad Files

    bad-1-v1-magic-1.lz is like good-1-v1.lz but the first magic byte
    is wrong.

    bad-1-v1-magic-2.lz is like good-1-v1.lz but the last (fourth)
    magic byte is wrong.

    bad-1-v1-dict-1.lz has too low value in the dictionary size field.

    bad-1-v1-dict-2.lz has too high value in the dictionary size field.

    bad-1-v1-crc32.lz has wrong CRC32 value.

    bad-1-v0-uncomp-size.lz is version 0 format with incorrect value
    in the uncompressed size field.

    bad-1-v1-uncomp-size.lz is version 1 format with incorrect value
    in the uncompressed size field.

    bad-1-v1-member-size.lz has incorrect value in the member size
    field.

    bad-1-v1-trailing-magic.lz has the four .lz magic bytes as trailing
    data. This should be detected as a truncated file and thus result
    in an error. That is, the last four bytes of the file should not be
    ignored as trailing garbage. lzip >= 1.18 matches this behavior
    while older versions ignore the last four bytes and don't indicate
    an error.

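    The header fields that the .lz files above manipulate (magic
    bytes, version, dictionary size) occupy the first six bytes of a
    member. The sketch below validates them as described in the lzip
    file format documentation: the dictionary size byte holds the
    base-2 logarithm of a base size in bits 0-4 and, in bits 5-7, the
    number of sixteenths of the base size to subtract, with the result
    limited to 4 KiB - 512 MiB. The function name is invented, and
    accepting both version 0 and version 1 is an assumption that
    mirrors what this README expects from XZ Utils.

```python
LZIP_MAGIC = b"LZIP"
DICT_MIN = 4 << 10       # 4 KiB
DICT_MAX = 512 << 20     # 512 MiB


def parse_lzip_header(data: bytes) -> tuple[int, int]:
    """Return (version, dictionary size) of a .lz member header."""
    if len(data) < 6:
        raise ValueError("truncated header")
    if data[:4] != LZIP_MAGIC:
        raise ValueError("wrong magic bytes")
    version = data[4]
    if version not in (0, 1):
        # For example 234 in unsupported-1-v234.lz.
        raise NotImplementedError(f"unsupported version {version}")
    log2_base = data[5] & 0x1F
    sixteenths = data[5] >> 5
    if not 12 <= log2_base <= 29:
        raise ValueError("dictionary size out of range")
    base = 1 << log2_base
    dict_size = base - (base // 16) * sixteenths
    if not DICT_MIN <= dict_size <= DICT_MAX:
        raise ValueError("dictionary size out of range")
    return version, dict_size


if __name__ == "__main__":
    # 0xD3: base 2^19 minus 6/16 of the base = 320 KiB.
    print(parse_lzip_header(b"LZIP\x01\xd3"))
    for bad in (b"LZIQ\x01\x0c", b"LZIP\x01\x0b", b"LZIP\x01\x1e"):
        try:
            parse_lzip_header(bad)
        except ValueError as err:
            print(bad, "rejected:", err)
```

    The CRC32, uncompressed size, and member size fields are stored in
    the member trailer and can only be verified after the LZMA data
    has been decoded, so they are outside the scope of this sketch.
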