.lzma Test Files
----------------
0. Introduction
This directory contains a bunch of files to test handling of .lzma files
in .lzma decoder implementations. Many of the files have been created
by hand with a hex editor, thus there is no better "source code" than
the files themselves. All the test files (*.lzma) and this README have
been put into the public domain.
1. File Types
Good files (good-*.lzma) must decode successfully without requiring
a lot of CPU time or RAM. If the decoder supports only Single-Block
Streams, then good-multi-*.lzma won't decode, of course.
Bad files (bad-*.lzma) must cause the decoder to give an error. Like
with the good files, these files must not require a lot of CPU time
or RAM before they get detected to be broken.
Malicious files (malicious-*.lzma) are good in terms of the file format
specification, but try to trigger excessive CPU, RAM or disk usage in
the decoder. To prevent malicious files from putting the decoder into
an infinite loop (*) or eating all available RAM or disk space,
decoders should have internal limiters that catch these situations.
(*) Strictly speaking not infinite, but if decoding of a small file
would take a few weeks or even years, it's an infinite loop in
practice.
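As an illustration of the naming rules above, a test harness could derive
the expected result from the file name alone. The following Python sketch
is not part of the test suite: the names Expected and expected_outcome are
hypothetical, and the multi_block_supported switch is only an assumption
about how a harness might model decoders limited to Single-Block Streams.

```python
from enum import Enum
from pathlib import PurePath


class Expected(Enum):
    DECODE_OK = "decode successfully"
    ERROR = "report an error"
    LIMIT = "stop at an internal resource limit"


def expected_outcome(filename: str, multi_block_supported: bool = True) -> Expected:
    """Map a test file name to the result a decoder should produce."""
    name = PurePath(filename).name
    if not name.endswith(".lzma"):
        raise ValueError(f"not a test file: {name}")
    if name.startswith("good-"):
        # A Single-Block-only decoder cannot decode good-multi-*.lzma.
        if name.startswith("good-multi-") and not multi_block_supported:
            return Expected.ERROR
        return Expected.DECODE_OK
    if name.startswith("bad-"):
        return Expected.ERROR
    if name.startswith("malicious-"):
        # Valid per the format, so only a limiter can stop the decoder.
        return Expected.LIMIT
    raise ValueError(f"unknown file type prefix: {name}")
```

In every case the result should be reached quickly; the sketch does not
measure CPU time or RAM, which a real harness would also need to bound.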
2. Descriptions of Individual Files
2.1. Good Files
good-single-none.lzma uses implicit Copy filter with known Uncompressed
Size.
good-single-none-pad.lzma is good-single-none.lzma with Footer Padding.
good-cat-single-none-pad.lzma is two good-single-none-pad.lzma files
concatenated as is. Fully decoding this file requires that the decoder
supports decoding concatenated files.
good-single-subblock_implicit.lzma uses implicit Subblock filter.
good-single-lzma.lzma is an LZMA-compressed file with an End of Payload
Marker (EOPM).
good-single-subblock-lzma.lzma has a basic combination of Subblock and
LZMA filters.
good-single-none-empty_1.lzma is an empty file with implicit Copy
filter and no integrity Check.
good-single-none-empty_2.lzma is an empty file with implicit Copy
filter and CRC32 as Check.
good-single-none-empty_3.lzma is an empty file with implicit Copy
filter, known Compressed Size, and no integrity Check.
good-single-lzma-empty.lzma is an empty file with LZMA filter and no
integrity Check.
good-single-subblock_rle.lzma takes advantage of Subblock filter's
run-length encoding.
good-single-delta-lzma.tiff.lzma is an image file that compresses
better with Delta+LZMA than with plain LZMA.
good-single-x86-lzma.lzma uses the x86 filter (BCJ) and LZMA. The
uncompressed file is compress_prepared_bcj_x86, found in the tests
directory.
good-single-sparc-lzma.lzma uses the SPARC filter and LZMA. The
uncompressed file is compress_prepared_bcj_sparc, found in the tests
directory.
good-single-lzma-flush_1.lzma has a flush marker in the middle of
the file, and no EOPM.
good-single-lzma-flush_2.lzma has a flush marker in the middle of
the file and just before EOPM.
good-multi-none-1.lzma is a basic Multi-Block Stream with two Data
Blocks and Footer Metadata Block.
good-multi-none-2.lzma is good-multi-none-1.lzma with Total Size and
Uncompressed Size added to the Footer Metadata Block.
good-multi-none-extra_1.lzma has the `Extra is present' flag set but
no actual Extra Records.
good-multi-none-extra_2.lzma has two non-empty Extra Records.
good-multi-none-extra_3.lzma has an Extra Record that has empty Data.
good-multi-none-header_1.lzma has a very minimal Header Metadata Block
with only the Metadata Flags field.
good-multi-none-header_2.lzma has all information in both Header and
Footer Metadata Blocks. The Size of Header Metadata Block field has a
wrong value in the Header Metadata Block, but the decoder must ignore
this value when it appears in the Header Metadata Block.
good-multi-none-header_3.lzma has Index only in the Header Metadata
Block. Footer Metadata Block contains only Size of Header Metadata
Block and Total Size.
good-multi-none-block_1.lzma has Index in Header Metadata Block. The
Compressed Size and Uncompressed Size fields are present in the Data
Blocks. There is some Footer Padding between the Blocks.
good-multi-none-block_2.lzma has Index in Header Metadata Block. The
Uncompressed Size field is present in Data Blocks and no EOPM is used.
2.2. Bad Files
bad-single-none-truncated.lzma is good-single-none.lzma without the
last byte of the file.
bad-cat-single-none-pad_garbage_1.lzma is good-cat-single-none-pad.lzma
with 0xFE appended to the end of the file. 0xFE doesn't begin .lzma
or LZMA_Alone format file.
bad-cat-single-none-pad_garbage_2.lzma is good-cat-single-none-pad.lzma
with 0xFF appended to the end of the file. 0xFF begins .lzma format
file, thus the decoder has to detect that the file is incomplete.
bad-cat-single-none-pad_garbage_3.lzma is good-cat-single-none-pad.lzma
with 0x5D appended to the end of the file. 0x5D is the most common
first byte of LZMA_Alone format file.
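The three garbage bytes above exercise different paths when a decoder
checks whether another concatenated file follows. A minimal Python
sketch of that dispatch, using only the byte values named in this README;
the function name and the returned strings are hypothetical, and bytes
not mentioned here are deliberately left unclassified.

```python
def classify_trailing_byte(byte: int) -> str:
    """Classify the first byte found after a complete .lzma file."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("not a byte value")
    if byte == 0xFF:
        # Begins a .lzma file, so the decoder must go on and notice
        # that the new file is incomplete.
        return "possible .lzma file"
    if byte == 0x5D:
        # Most common first byte of an LZMA_Alone file.
        return "possible LZMA_Alone file"
    if byte == 0xFE:
        # Begins neither format: plain garbage.
        return "garbage"
    return "not covered by this README"
```

All three bad-cat-single-none-pad_garbage_*.lzma files must still end in
an error; the classification only tells which code path reports it.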
bad-single-none-footer_filter_flags.lzma has different Stream Flags
in Stream Footer than in Stream Header.
bad-single-none-too_long_vli.lzma has a 10-byte variable-length integer.
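To show why a length limit is needed when parsing such integers, here is
a Python sketch of a bounded decoder. This README does not define the
encoding, so the layout is an ASSUMPTION borrowed from the later .xz
format: seven payload bits per byte, least significant group first, the
high bit set when more bytes follow, and at most nine bytes. The names
decode_vli and VLI_MAX_BYTES are hypothetical.

```python
from typing import Tuple

# Assumption: nine bytes is the longest valid encoding.
VLI_MAX_BYTES = 9


def decode_vli(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Return (value, position after the integer) or raise ValueError."""
    value = 0
    for i in range(VLI_MAX_BYTES):
        if pos + i >= len(data):
            raise ValueError("truncated variable-length integer")
        byte = data[pos + i]
        value |= (byte & 0x7F) << (7 * i)
        if byte & 0x80 == 0:
            # High bit clear: this was the last byte.
            return value, pos + i + 1
    # Every allowed byte had the continuation bit set.
    raise ValueError("variable-length integer is too long")
```

Because the loop is bounded, a 10-byte integer is rejected after reading
nine bytes instead of being followed for as long as the input lasts.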
bad-single-none-empty.lzma is like good-single-none-empty_3.lzma but
with non-zero value in the Compressed Size field.
bad-single-data_after_eopm_1.lzma has LZMA+Subblock, where the Subblock
filter gives one byte of data to LZMA after LZMA has detected EOPM.
bad-single-data_after_eopm_2.lzma is like
bad-single-data_after_eopm_1.lzma but Subblock gives 256 MiB of data
to LZMA after LZMA has detected EOPM.
bad-single-subblock_subblock.lzma has Subblock+Subblock, where the
Subblock decoder is given End of Input in the middle of a Subblock.
bad-single-subblock-padding_loop.lzma contains a huge amount of
consecutive Padding bytes, which isn't allowed by the Subblock filter
format. If it were allowed, this file would hang the decoder for a very
long time (weeks to years).
bad-single-subblock1023-slow.lzma is similar to
malicious-single-subblock31-slow.lzma except that this uses 1023 bytes
of Padding in every place instead of 31 bytes. The Subblock filter
format specification allows only 31-byte Paddings, thus this file must
get detected as bad without producing any output. Allowing larger
Padding than 31 bytes was considered (so this test file was created),
but it seemed to be a bad idea since it would increase worst-case CPU
usage.
bad-single-lzma-flush_beginning.lzma has a flush marker at the beginning
of the LZMA data.
bad-single-lzma-flush_twice.lzma has two flush markers with no data
between them.
bad-multi-none-1.lzma has data after the last field in the Metadata
Block and the `Extra is present' flag is not set.
bad-multi-none-2.lzma has wrong Total Size in Footer Metadata Block.
bad-multi-none-3.lzma has wrong Uncompressed Size in Footer Metadata
Block.
bad-multi-none-index_1.lzma has wrong value in the Number of Data
Blocks field.
bad-multi-none-index_2.lzma has Metadata that is too short to contain
all the Index Records.
bad-multi-none-index_3.lzma has wrong value in Total Size field in
the Index.
bad-multi-none-index_4.lzma has wrong value in Uncompressed Size field
in the Index.
bad-multi-none-extra_1.lzma has incomplete Extra Record at the end of
the Metadata Block.
bad-multi-none-extra_2.lzma has incomplete variable-length integer as
Extra Record ID.
bad-multi-none-extra_3.lzma has incomplete Extra Record at the end of
the Metadata Block.
bad-multi-none-header_1.lzma has empty Header Metadata Block (even
the Metadata Flags field is not present).
bad-multi-none-header_2.lzma has Index in the Header Metadata Block,
which describes only one Data Block, while the Stream actually has
two Data Blocks. A sophisticated decoder should give an error when
it detects the second Data Block; all Multi-Block decoders must
detect the file as corrupt at some point.
bad-multi-none-header_3.lzma contains too small Total Size in Header
Metadata Block. A sophisticated decoder should abort decoding before
the second Data Block, preferably before the first Data Block has
been finished; all Multi-Block decoders must detect the file as
corrupt at some point.
bad-multi-none-header_4.lzma is like bad-multi-none-header_3.lzma but
with too small Uncompressed Size.
bad-multi-none-header_5.lzma has Index in the Header Metadata Block,
but the Total Size field is missing from the Footer Metadata Block.
bad-multi-none-header_6.lzma has both Index and Total Size in Header
Metadata Block, but Total Size doesn't match the Index. A sophisticated
decoder should abort before decoding any Data Blocks; all Multi-Block
decoders must detect the file as corrupt at some point.
bad-multi-none-header_7.lzma has zero as the Size of Header Metadata
Block in the Header Metadata Block.
bad-multi-none-block_1.lzma has wrong Uncompressed Size in the first
Data Block. A sophisticated decoder should detect this error before
producing any output, because it can see that the Uncompressed Size
doesn't match with the Index in Header Metadata Block; all Multi-Block
decoders must detect the file as corrupt at some point.
bad-multi-none-block_2.lzma has too big Compressed Size in the first
Data Block. A sophisticated decoder may be able to detect the file as
corrupt before producing any output, because Compressed Size + size
of Block Header exceeds the Total Size stored in the Index in the Header
Metadata Block. A sophisticated decoder should be able to detect the
error before the end of the first Data Block; all Multi-Block decoders
must detect the file as corrupt at some point.
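The early check described for bad-multi-none-block_2.lzma is simple
arithmetic. A Python sketch follows; the function and parameter names
are hypothetical. It tests only the necessary condition stated above.
Which fields Total Size covers exactly is defined by the format
specification, not by this sketch.

```python
def block_fits_total_size(block_header_size: int,
                          compressed_size: int,
                          total_size: int) -> bool:
    """Return False if the Block cannot fit in the Total Size from the Index."""
    if min(block_header_size, compressed_size, total_size) < 0:
        raise ValueError("sizes must be non-negative")
    # Block Header plus compressed data must not exceed the Total Size.
    return block_header_size + compressed_size <= total_size
```

A decoder that knows the Index from the Header Metadata Block can run
this test as soon as it has parsed the Block Header, before producing
any output.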
bad-multi-none-block_3.lzma has only the Compressed Size field in the
Block Header of the second Data Block and EOPM isn't used.
2.3. Malicious Files
malicious-single-subblock31-slow.lzma requires quite a bit of CPU time
per decoded byte. It contains LZMA compressed Subblock filter data that
has as much Padding as the specification allows. LZMA is also used as
a Subfilter to further slow down the decoder. Every Subfilter instance
produces only one byte of output. If you can create a file that wastes
notably more CPU cycles than this file, please contact Lasse Collin.
malicious-single-subblock-256MiB.lzma is a tiny file that produces
256 MiB of output. It uses Subblock filter's run-length encoding
to achieve this.
malicious-single-subblock-64PiB.lzma is a tiny file that produces
64 PiB of output (if you have the patience to wait). This is done by
chaining two Subblock filters and using their run-length encoders.
malicious-multi-metadata-64PiB.lzma is like
malicious-single-subblock-64PiB.lzma but the huge amount of output
is in a Metadata Block. Trying to decode this file may take years
unless the decoder catches that the Metadata has unreasonable size.
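The output-size side of such a limiter can be sketched in a few lines of
Python. Nothing here is taken from an actual decoder: rle_chunks is a
hypothetical stand-in for a run-length expander like the one the
Subblock filter provides, and limited_chunks with its max_output
parameter is one assumed way to cap the amount of data produced.

```python
from typing import Iterable, Iterator


class OutputLimitExceeded(Exception):
    """Raised when a stream produces more output than allowed."""


def rle_chunks(byte_value: int, repeat_count: int,
               chunk_size: int = 65536) -> Iterator[bytes]:
    """Lazily expand one repeated byte into chunks of output."""
    block = bytes([byte_value]) * chunk_size
    remaining = repeat_count
    while remaining > 0:
        n = min(remaining, chunk_size)
        yield block[:n]
        remaining -= n


def limited_chunks(chunks: Iterable[bytes], max_output: int) -> Iterator[bytes]:
    """Pass chunks through until the total would exceed max_output."""
    produced = 0
    for chunk in chunks:
        produced += len(chunk)
        if produced > max_output:
            raise OutputLimitExceeded(f"more than {max_output} bytes of output")
        yield chunk
```

Because the expansion is lazy, a 64 PiB run wrapped in a 1 MiB limit is
rejected after a handful of chunks rather than after years. CPU-bound
cases such as malicious-single-subblock31-slow.lzma need a separate
limit on work done, which this sketch does not cover.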