mirror of https://git.tukaani.org/xz.git
.lzma Test Files
----------------

0. Introduction

    This directory contains a bunch of files to test handling of .lzma
    files in .lzma decoder implementations. Many of the files have been
    created by hand with a hex editor, thus there is no better "source
    code" than the files themselves. All the test files (*.lzma) and
    this README have been put into the public domain.


1. File Types

    Good files (good-*.lzma) must decode successfully without requiring
    a lot of CPU time or RAM. If the decoder supports only Single-Block
    Streams, then good-multi-*.lzma won't decode, of course.

    Bad files (bad-*.lzma) must cause the decoder to give an error. Like
    with the good files, these files must not require a lot of CPU time
    or RAM before they get detected to be broken.

    Malicious files (malicious-*.lzma) are good in terms of the file
    format specification, but try to trigger excessive CPU, RAM or disk
    usage in the decoder. To prevent malicious files from putting the
    decoder into an infinite loop (*) or eating all available RAM or
    disk space, decoders should have internal limiters that catch these
    situations.

    (*) Strictly speaking not infinite, but if decoding of a small file
        would take a few weeks or even years, it's an infinite loop in
        practice.


2. Descriptions of Individual Files

2.1. Good Files

    good-single-none.lzma uses implicit Copy filter with known
    Uncompressed Size.

    good-single-none-pad.lzma is good-single-none.lzma with Footer
    Padding.

    good-cat-single-none-pad.lzma is two good-single-none-pad.lzma
    files concatenated as is. Fully decoding this file requires that
    the decoder supports decoding concatenated files.

    good-single-subblock_implicit.lzma uses implicit Subblock filter.

    good-single-lzma.lzma is an LZMA compressed file with EOPM.

    good-single-subblock-lzma.lzma has a basic combination of Subblock
    and LZMA filters.

    good-single-none-empty_1.lzma is an empty file with implicit Copy
    filter and no integrity Check.

    good-single-none-empty_2.lzma is an empty file with implicit Copy
    filter and CRC32 as Check.

    good-single-none-empty_3.lzma is an empty file with implicit Copy
    filter, known Compressed Size, and no integrity Check.

    good-single-lzma-empty.lzma is an empty file with LZMA filter and
    no integrity Check.

    good-single-subblock_rle.lzma takes advantage of Subblock filter's
    run-length encoding.

    good-single-delta-lzma.tiff.lzma is an image file that compresses
    better with Delta+LZMA than with plain LZMA.

    good-single-lzma-flush_1.lzma has a flush marker in the middle of
    the file, and no EOPM.

    good-single-lzma-flush_2.lzma has a flush marker in the middle of
    the file and just before EOPM.


2.2. Bad Files

    bad-single-none-truncated.lzma is good-single-none.lzma without the
    last byte of the file.

    bad-cat-single-none-pad_garbage_1.lzma is
    good-cat-single-none-pad.lzma with 0xFE appended to the end of the
    file. 0xFE doesn't begin a .lzma or LZMA_Alone format file.

    bad-cat-single-none-pad_garbage_2.lzma is
    good-cat-single-none-pad.lzma with 0xFF appended to the end of the
    file. 0xFF begins a .lzma format file, thus the decoder has to
    detect that the file is incomplete.

    bad-cat-single-none-pad_garbage_3.lzma is
    good-cat-single-none-pad.lzma with 0x5D appended to the end of the
    file. 0x5D is the most common first byte of an LZMA_Alone format
    file.

    bad-single-none-footer_filter_flags.lzma has different Stream Flags
    in Stream Footer than in Stream Header.

    bad-single-none-too_long_vli.lzma has a 10-byte variable-length
    integer.

    bad-single-none-empty.lzma is like good-single-none-empty_3.lzma
    but with a non-zero value in the Compressed Size field.

    bad-single-data_after_eopm_1.lzma has LZMA+Subblock, where the
    Subblock filter gives one byte of data to LZMA after LZMA has
    detected EOPM.

    bad-single-data_after_eopm_2.lzma is like
    bad-single-data_after_eopm_1.lzma but Subblock gives 256 MiB of
    data to LZMA after LZMA has detected EOPM.

    bad-single-subblock_subblock.lzma has Subblock+Subblock, where the
    Subblock decoder is given End of Input in the middle of a Subblock.

    bad-single-subblock-padding_loop.lzma contains a huge amount of
    consecutive Padding bytes, which isn't allowed by the Subblock
    filter format. If it were allowed, this file would hang the decoder
    for a very long time (weeks to years).

    bad-single-subblock1023-slow.lzma is similar to
    malicious-single-subblock31-slow.lzma except that this uses 1023
    bytes of Padding in every place instead of 31 bytes. The Subblock
    filter format specification allows Padding of at most 31 bytes,
    thus this file must get detected as bad without producing any
    output. Allowing larger Padding than 31 bytes was considered (so
    this test file was created), but it seemed to be a bad idea since
    it would increase worst-case CPU usage.

    bad-single-lzma-flush_beginning.lzma has a flush marker at the
    beginning of the LZMA data.

    bad-single-lzma-flush_twice.lzma has two flush markers with no data
    between them.


2.3. Malicious Files

    malicious-single-subblock31-slow.lzma requires quite a bit of CPU
    time per decoded byte. It contains LZMA compressed Subblock filter
    data that has as much Padding as the specification allows. LZMA is
    also used as a Subfilter, to further slow down the decoder. Every
    Subfilter instance produces only one byte of output. If you can
    create a file that wastes notably more CPU cycles than this file,
    please contact Lasse Collin.

    malicious-single-subblock-256MiB.lzma is a tiny file that produces
    256 MiB of output. It uses Subblock filter's run-length encoding to
    achieve this.

    malicious-single-subblock-64PiB.lzma is a tiny file that produces
    64 PiB of output (if you have the patience to wait). This is done
    by chaining two Subblock filters and using their run-length
    encoders.

    malicious-multi-metadata-64PiB.lzma is like
    malicious-single-subblock-64PiB.lzma but the huge amount of output
    is in a Metadata Block.
    Trying to decode this file may take years unless the decoder
    catches that the Metadata has an unreasonable size.
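

    The file-type rules in section 1 can be turned into a small test
    harness. The Python sketch below is only an illustration, not part
    of any decoder: decode_chunks is a hypothetical callable that
    yields decoded output chunks and raises an exception on a broken
    file, and the names ACCEPTABLE, LimitExceeded, limited,
    observed_outcome and check_file, as well as the 1 MiB default
    limit, are assumptions made for this example. It also assumes that
    a decoder rejecting a malicious file through its own internal
    limiter counts as passing.

```python
from typing import Callable, Iterable, Iterator

# Acceptable outcomes per file-name prefix (section 1 of this README).
# For malicious files, either this harness's limiter or the decoder's
# own limiter (reported as an error) is accepted; that is an assumption.
ACCEPTABLE = {
    "good-": {"decode"},
    "bad-": {"error"},
    "malicious-": {"limit", "error"},
}


class LimitExceeded(Exception):
    """Raised by this sketch's limiter; not part of any real decoder."""


def limited(chunks: Iterable[bytes], max_output: int) -> Iterator[bytes]:
    """Pass chunks through until more than max_output bytes were seen."""
    total = 0
    for chunk in chunks:
        total += len(chunk)
        if total > max_output:
            raise LimitExceeded(f"output exceeds {max_output} bytes")
        yield chunk


def observed_outcome(
    filename: str,
    decode_chunks: Callable[[str], Iterable[bytes]],
    max_output: int,
) -> str:
    """Run the hypothetical decoder and classify what happened."""
    try:
        for _ in limited(decode_chunks(filename), max_output):
            pass  # the output itself is discarded in this sketch
    except LimitExceeded:
        return "limit"
    except Exception:
        return "error"
    return "decode"


def check_file(
    filename: str,
    decode_chunks: Callable[[str], Iterable[bytes]],
    max_output: int = 1 << 20,  # 1 MiB, an arbitrary example limit
) -> bool:
    """Return True if the decoder behaved as the file's prefix requires."""
    for prefix, outcomes in ACCEPTABLE.items():
        if filename.startswith(prefix) and filename.endswith(".lzma"):
            return observed_outcome(filename, decode_chunks, max_output) in outcomes
    raise ValueError(f"not a recognized test file name: {filename!r}")
```

    This sketch caps only the amount of output. A file such as
    malicious-single-subblock31-slow.lzma, which wastes CPU time
    rather than producing a lot of output, would need a separate
    time limit that is not shown here.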