mirror of https://git.tukaani.org/xz.git
Hacking liblzma
---------------

0. Preface

This document gives an overview of the internals of liblzma, which
should make it easier to start reading and modifying the code.


1. Programming language

liblzma is written in C99. If you use GCC, this means that you need
at least GCC 3.x.x. GCC 2 isn't supported and won't be.

Some GCC-specific extensions are used *conditionally*. They aren't
required to build a full-featured library. Don't make the code rely
on any non-standard compiler extensions, or even on C99 features that
aren't portable between almost-C99-compatible compilers (for example,
non-static inline functions).

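As a rough illustration of what "conditionally" can mean, the sketch
below wraps one GCC builtin in a macro that falls back to plain C99 on
other compilers. The macro and function names are made up for this
example and are not taken from the liblzma sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical macro name. With GCC 3 or later the branch hint is used;
 * with any other compiler the macro expands to the plain expression,
 * so the code builds and behaves the same either way.
 */
#if defined(__GNUC__) && __GNUC__ >= 3
#   define example_likely(expr) __builtin_expect(!!(expr), 1)
#else
#   define example_likely(expr) (expr)
#endif

/* Static inline is the portable form; non-static inline is avoided. */
static inline size_t
example_count_nonzero(const uint8_t *buf, size_t size)
{
    size_t count = 0;
    size_t i;

    assert(buf != NULL || size == 0);

    for (i = 0; i < size; ++i)
        if (example_likely(buf[i] != 0))
            ++count;

    return count;
}
```

The extension only affects optimization hints here, so removing it
never changes what the library can do.
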
The public API headers are in C89. This is to avoid frustrating those
who maintain programs that are strictly C89 or C++.

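To show what staying within C89 looks like in practice, here is a
small sketch of a header-style declaration followed by a matching
definition. The function is hypothetical and not part of the liblzma
API; the point is only the style: an extern "C" guard for C++ users,
no // comments, no inline, and declarations at the start of a block.

```c
#include <assert.h>
#include <stddef.h>

#ifdef __cplusplus
extern "C" {
#endif

/* Hypothetical declaration, written the way a C89 header would. */
extern size_t example_count_zero_bytes(
        const unsigned char *buf, size_t size);

#ifdef __cplusplus
}
#endif

/* Definition kept in the C89 subset as well, for illustration. */
size_t
example_count_zero_bytes(const unsigned char *buf, size_t size)
{
    size_t i;
    size_t count = 0;

    assert(buf != NULL || size == 0);

    for (i = 0; i < size; ++i)
        if (buf[i] == 0)
            ++count;

    return count;
}
```
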
An assumption about sizeof(size_t) is made. If this assumption is
wrong, some porting is probably needed:

    sizeof(uint32_t) <= sizeof(size_t) <= sizeof(uint64_t)


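One way to make the assumption fail loudly at build time is a
C99-compatible compile-time assertion. The sketch below is only an
illustration: the macro and helper names are invented here, and the
helper merely shows one consequence of the upper bound (a 64-bit size
may not fit into size_t on a 32-bit system).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical C99 compile-time assertion: the array size becomes
 * negative, which is a compile error, when the condition is false.
 */
#define EXAMPLE_STATIC_ASSERT(name, cond) \
    typedef char example_static_assert_##name[(cond) ? 1 : -1]

EXAMPLE_STATIC_ASSERT(size_t_min, sizeof(uint32_t) <= sizeof(size_t));
EXAMPLE_STATIC_ASSERT(size_t_max, sizeof(size_t) <= sizeof(uint64_t));

/*
 * Store a 64-bit size into a size_t if it fits. Returns false when
 * the value is too large for this platform's size_t.
 */
static bool
example_u64_to_size(uint64_t value, size_t *result)
{
    assert(result != NULL);

    if (value > SIZE_MAX)
        return false;

    *result = (size_t)value;
    return true;
}
```
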
2. Internal vs. external API

    Input                              Output
      v            Application           ^
      |        liblzma public API        |
      |           Stream coder           |
      |            Block coder           |
      |           Filter coder           |
      |                ...               |
      v           Filter coder           ^


    Application
      `-- liblzma public API
            `-- Stream coder
                  |-- Stream info handler
                  |-- Stream Header coder
                  |-- Block Header coder
                  |     `-- Filter Flags coder
                  |-- Metadata coder
                  |     `-- Block coder
                  |           `-- Filter 0
                  |                 `-- Filter 1
                  |                       ...
                  |-- Data Block coder
                  |     `-- Filter 0
                  |           `-- Filter 1
                  |                 ...
                  `-- Stream tail coder


x. Designing new filters

All filters must be designed so that the decoder cannot consume an
arbitrary amount of input without producing any decoded output.
Failing to follow this rule makes liblzma vulnerable to DoS attacks
when untrusted files are decoded (and usually they are untrusted).

An example should clarify the reason behind this requirement: There
are two filters in the chain. The decoder of the first filter produces
a huge amount of output (many gigabytes or more) from a few bytes of
input, and this output gets passed to the decoder of the second
filter. If the data passed to the second filter is interpreted as
something that produces no output (e.g. padding), the filter chain as
a whole produces no output and consumes no input for a long period of
time.

The above problem was present in the first versions of the Subblock
filter. A tiny .lzma file could have taken several years to decode
without producing any output at all. The problem was fixed by limiting
the number of consecutive Padding bytes, and by requiring that some
decoded output is produced between Set Subfilter and Unset Subfilter.


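The idea of limiting consecutive Padding bytes can be sketched with a
toy decoder. Everything below is an assumption made for illustration:
the byte format (0x00 is Padding, anything else is copied to the
output), the limit value, and all names. It is not the Subblock
filter's actual format or code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limit; not a value taken from liblzma. */
#define EXAMPLE_PADDING_MAX 4

typedef enum { EXAMPLE_OK, EXAMPLE_DATA_ERROR } example_ret;

typedef struct {
    /* Number of Padding bytes seen since the last output byte. */
    size_t padding_run;
} example_decoder;

/*
 * Toy format: 0x00 is a Padding byte that produces no output; any
 * other byte is copied to the output. Input that would let the
 * decoder consume more than EXAMPLE_PADDING_MAX bytes in a row
 * without producing output is rejected.
 */
static example_ret
example_decode(example_decoder *d,
        const uint8_t *in, size_t *in_pos, size_t in_size,
        uint8_t *out, size_t *out_pos, size_t out_size)
{
    assert(*in_pos <= in_size && *out_pos <= out_size);

    while (*in_pos < in_size && *out_pos < out_size) {
        const uint8_t b = in[*in_pos];

        if (b == 0x00) {
            if (++d->padding_run > EXAMPLE_PADDING_MAX)
                return EXAMPLE_DATA_ERROR;
        } else {
            d->padding_run = 0;
            out[(*out_pos)++] = b;
        }

        ++*in_pos;
    }

    return EXAMPLE_OK;
}
```

Because the counter lives in the decoder state, the limit also holds
when the Padding bytes arrive split across several calls.
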
x. Implementing new filters

If the filter supports embedding an End of Payload Marker, make sure
that when your filter detects the End of Payload Marker,
  - the use of End of Payload Marker is actually allowed (i.e. End
    of Input isn't used); and
  - there is no more input coming from the next filter in the chain.

The second requirement is slightly tricky. It's possible that the next
filter hasn't returned LZMA_STREAM_END yet. It may even need a few
bytes more input before it will do so. You need to give it as much
input as it needs, and verify that it doesn't produce any output.

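A minimal sketch of that check is shown below. The types, return
codes and function names are hypothetical stand-ins rather than
liblzma's internal interfaces, and the toy next filter exists only so
that the example is complete. The helper feeds the next filter
whatever input is available, treats any output as an error, and
remembers when the next filter has finished so that it is not called
again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for LZMA_OK, LZMA_STREAM_END and LZMA_DATA_ERROR. */
typedef enum { EX_OK, EX_STREAM_END, EX_DATA_ERROR } ex_ret;

/* Hypothetical handle to the next filter in the chain. */
typedef struct {
    void *coder;
    ex_ret (*code)(void *coder,
            const uint8_t *in, size_t *in_pos, size_t in_size,
            uint8_t *out, size_t *out_pos, size_t out_size);
    bool finished;  /* Set once code() has returned EX_STREAM_END. */
} ex_next;

/*
 * Call after the current filter has seen its End of Payload Marker.
 * Returns EX_OK if the next filter still needs more input (call again
 * with more), EX_STREAM_END once it has finished cleanly, and
 * EX_DATA_ERROR if it produced any output.
 */
static ex_ret
ex_finish_next(ex_next *next,
        const uint8_t *in, size_t *in_pos, size_t in_size)
{
    uint8_t scratch[16];
    size_t scratch_pos = 0;
    ex_ret ret;

    assert(*in_pos <= in_size);

    /* Never call the next filter again once it has finished. */
    if (next->finished)
        return EX_STREAM_END;

    ret = next->code(next->coder, in, in_pos, in_size,
            scratch, &scratch_pos, sizeof(scratch));

    /* Any output after the marker means the data is invalid. */
    if (scratch_pos != 0)
        return EX_DATA_ERROR;

    if (ret == EX_STREAM_END)
        next->finished = true;

    return ret;
}

/*
 * Toy next filter: needs `trailer` more input bytes before it
 * finishes, and emits one byte of output at the end if `emit` is set.
 */
typedef struct {
    size_t trailer;
    bool emit;
} ex_toy;

static ex_ret
ex_toy_code(void *coder,
        const uint8_t *in, size_t *in_pos, size_t in_size,
        uint8_t *out, size_t *out_pos, size_t out_size)
{
    ex_toy *toy = coder;
    (void)in;

    while (toy->trailer > 0 && *in_pos < in_size) {
        ++*in_pos;
        --toy->trailer;
    }

    if (toy->trailer > 0)
        return EX_OK;

    if (toy->emit && *out_pos < out_size)
        out[(*out_pos)++] = 0xFF;

    return EX_STREAM_END;
}
```
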
Don't call the next filter in the chain after it has returned
LZMA_STREAM_END (except in the encoder if action == LZMA_SYNC_FLUSH).
Doing so results in undefined behavior.

Be pedantic. If the input data isn't exactly valid, reject it.

At the moment, liblzma isn't modular. You will need to edit several
files in src/liblzma/common to add support for a new filter. Use grep
to search for LZMA_FILTER_LZMA to locate the files that need changes.