exepack

David Fifield <david@bamsoftware.com>

Last updated:

Source code tarball
exepack-0.6.0.tar.gz (sig)
Precompiled Windows executable
exepack.exe (sig)
Git repo
git clone https://www.bamsoftware.com/git/exepack.git

exepack is a program to compress and decompress 16-bit DOS executables with EXEPACK, a format for self-extracting executables.

Compression:
exepack unpacked.exe packed.exe
Decompression:
exepack -d packed.exe unpacked.exe
Use in a pipeline:
unzip -p comic.zip comic.exe | exepack -d /dev/stdin unpacked.exe

I wanted to reverse engineer some old DOS games like Mega Man and Captain Comic. These games' executables are packed with EXEPACK; you need to unpack them in order for the disassembly to make sense. I decided to write my own unpacker after encountering a file in a different EXEPACK format that other tools couldn't handle at the time.

Goals:

exepack is written in Rust. You need rustc and cargo to compile it.

Versions of the EXEPACK format

If you have a DOS EXE file that contains the string "Packed file is corrupt", it is most likely packed with EXEPACK.
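This test is a plain substring search. The following C sketch scans a buffer for the English message. The function name looks_like_exepack is hypothetical and not part of exepack. It is only a heuristic: it will miss stubs with a localized message, and an unrelated program could contain the same text.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the English EXEPACK error message
   appears anywhere in buf, else 0. Heuristic only; localized stubs use
   a different message. */
int looks_like_exepack(const unsigned char *buf, size_t len)
{
	static const char msg[] = "Packed file is corrupt";
	size_t n = sizeof msg - 1;  /* message length without the NUL */
	for (size_t i = 0; i + n <= len; i++)
		if (memcmp(buf + i, msg, n) == 0)
			return 1;
	return 0;
}
```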

The most prominent documentation of EXEPACK online is at the DOS Game Modding Wiki: http://www.shikadi.net/moddingwiki/Microsoft_EXEPACK#File_Format. As of this writing, the format described there is just one of several slightly incompatible EXEPACK formats. The different versions differ in the size of the EXEPACK metadata header, whether they support an optional padding block, the size of their executable decompression stub, the localization of an error message, and the presence of certain bugs.

The general structure of an EXEPACK-packed file is:

┌──────────────────────────────────────┐   runtime address
│ EXE header                           │
├──────────────────────────────────────┤ ← ds:0100 = es:0100
│ compressed data                      │
├──────────────────────────────────────┤ 
│ optional skip_len padding            │
│ (only if EXEPACK header is 18 bytes) │
├──────────────────────────────────────┤ ← cs:0000
│ EXEPACK header (16 or 18 bytes)      │
├──────────────────────────────────────┤ ← cs:ip
│ EXEPACK decompression stub           │
├──────────────────────────────────────┤
│ packed relocation table              │
├──────────────────────────────────────┤ ← cs:exepack_size
┆ possible trailing garbage            ┆

Pointers use segment:offset notation: cs:ip means, as a linear address, 16×cs+ip.
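As a minimal illustration of the notation, the linear address is just a shift and an add. The function names below are hypothetical. The second function also models the 8086's 20-bit address bus, on which any address at or above 1 MiB wraps around to the bottom of memory.

```c
#include <stdint.h>

/* Linear address of seg:off, i.e. 16*seg + off. Can exceed 1 MiB. */
uint32_t linear(uint16_t seg, uint16_t off)
{
	return ((uint32_t)seg << 4) + off;
}

/* The same address as an 8086 sees it: only 20 address lines, so the
   result is taken modulo 2^20. */
uint32_t linear_8086(uint16_t seg, uint16_t off)
{
	return linear(seg, off) & 0xfffff;
}
```

For example, fff0:0123 is linear address 0x100023, but an 8086 sees it as 0x23.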

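The cs and ip fields in the EXE header tell us where to find the EXEPACK header and how big it is: the header begins at cs:0000, and because the decompression stub starts immediately after it, at cs:ip, the header is ip bytes long. There are two possible EXEPACK headers, a 16-byte one and an 18-byte one. They differ in the presence of a skip_len field.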

16-byte header             18-byte header
uint16_t real_ip           uint16_t real_ip
uint16_t real_cs           uint16_t real_cs
uint16_t mem_start         uint16_t mem_start
uint16_t exepack_size      uint16_t exepack_size
uint16_t real_sp           uint16_t real_sp
uint16_t real_ss           uint16_t real_ss
uint16_t dest_len          uint16_t dest_len
                           uint16_t skip_len
uint16_t signature "RB"    uint16_t signature "RB"

The header field names are from ModdingWiki. mem_start is not an actual meaningful header field; it is just temporary storage used by the decompression stub. exepack_size is the size of the entire EXEPACK block: header, stub, and packed relocation table. dest_len should perhaps instead be called uncompressed_len: it's the size (in 16-byte paragraphs) of the uncompressed data. Similarly, cs could also be called compressed_len, because the compressed data ends just before the EXEPACK header. The only exception is when skip_len is present; in that case, uncompressed_len and compressed_len both get reduced by 16×(skip_len − 1). With the 16-byte header, it is as if skip_len always has the value 1; i.e., no skip_len padding.
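Here is a sketch of reading either header variant in C. It assumes the bytes at cs:0000 are already in memory and that the header length is the EXE header's ip value. The struct and function names are hypothetical. All fields are little-endian, and the 16-byte variant gets skip_len = 1, which is equivalent to having no padding.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory form of either EXEPACK header variant. */
struct exepack_header {
	uint16_t real_ip, real_cs, mem_start, exepack_size;
	uint16_t real_sp, real_ss, dest_len, skip_len;
};

static uint16_t get_u16le(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/* Parse a header of hdr_len bytes (16 or 18). Returns 0 on success, or -1
   for an unknown header size or a missing "RB" signature. */
int parse_exepack_header(const uint8_t *p, size_t hdr_len,
                         struct exepack_header *h)
{
	if (hdr_len != 16 && hdr_len != 18)
		return -1;
	/* The signature occupies the last two bytes: 'R' then 'B'. */
	if (p[hdr_len - 2] != 'R' || p[hdr_len - 1] != 'B')
		return -1;
	h->real_ip      = get_u16le(p + 0);
	h->real_cs      = get_u16le(p + 2);
	h->mem_start    = get_u16le(p + 4);
	h->exepack_size = get_u16le(p + 6);
	h->real_sp      = get_u16le(p + 8);
	h->real_ss      = get_u16le(p + 10);
	h->dest_len     = get_u16le(p + 12);
	/* Only the 18-byte header has skip_len; otherwise act as if it is 1. */
	h->skip_len     = hdr_len == 18 ? get_u16le(p + 14) : 1;
	return 0;
}
```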

(Aside: apart from complicating the unpacking algorithm, skip_len doesn't seem to serve any purpose. w4kfu found many executables with skip_len > 1, but they would work just as well with skip_len = 1.)

The decompression stub immediately follows the EXEPACK header. As it is located at cs:ip, it is the code that DOS will jump to as soon as the compressed executable is loaded. The stub is responsible for copying itself out of the way, decompressing the compressed data, and jumping to the entry point of the original uncompressed program. There have been several different decompression stubs over the years. The following table shows the characteristics of the ones that are known to me. See doc/README.stubs and doc/*.asm in the exepack source code for commented disassembly. The one with size 283 is the format documented at ModdingWiki. This program uses its own custom stub, designed to fix the problems of the other stubs while keeping a size of 283 for compatibility with other external unpackers.

size  skip_len?  restores ax?  A20 bug?  relocation 0xffff bug?  allows expansion?  error string              producer
258   no         no            yes       yes                     no                 "Packed file is corrupt"  EXEPACK 4.00
258   no         no            yes       yes                     no                 "Fichero corrompido    "  EXEPACK 4.00
279   no         no            yes       no                      no                 "Packed file is corrupt"  EXEPACK 4.03
277   no         no            yes       no                      no                 "Packed file is corrupt"  LINK /EXEPACK 3.60, 3.64, 3.65, 5.01.21
283   yes        no            yes       no                      no                 "Packed file is corrupt"  EXEPACK 4.05 or 4.06
290   no         yes           no        no                      no                 "Packed file is corrupt"  LINK /EXEPACK 3.69
283   yes        yes           no        no                      yes                "Packed file is corrupt"  exepack (this program)
size
Size of the decompression stub code, not counting the EXEPACK header or packed relocation table.
skip_len?
"no" means a 16-byte EXEPACK header without skip_len; "yes" means an 18-byte EXEPACK header with skip_len.
restores ax?
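The state of most CPU registers is unspecified at startup, but ax has a meaning: DOS uses al and ah to report whether the drive letters in the first and second command-line arguments are valid. The decompression stub should restore the original value of ax before jumping to the decompressed code, but most versions do not.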
A20 bug?
"yes" means the stub relies on 8086-style 20-bit address wraparound; i.e., it requires the address fff0:0123 to map to the linear address 0x23, not 0x100023. Stubs with this bug may falsely error out with "Packed file is corrupt" when run at a low address in memory.
relocation 0xffff bug?
The decompression stub has to apply relocations, by adding the program's starting segment to various 16-bit values in the program text. "yes" means the stub has a bug when patching a pointer at X:ffff: it will patch the bytes X:ffff and X:0000, instead of X:ffff and the next linear byte, (X+0x1000):0000.
allows expansion?
The standard stubs can't cope when the compressed program would be bigger than the original uncompressed program. The custom stub in exepack handles this case, which means you can, for example, recursively compress an executable 10 times and it will still run correctly.
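To make the relocation 0xffff bug concrete, here is a sketch in C (hypothetical names) of what a correct relocation patch does. It treats the program image as flat memory and updates the two bytes at consecutive linear addresses, so a word at X:ffff continues into the next linear byte rather than wrapping back to X:0000. The segment is assumed to be relative to the start of the image. An unpacker that rebuilds an EXE relocation table doesn't need to do this itself; it is the stub's job at run time.

```c
#include <stddef.h>
#include <stdint.h>

/* Add load_seg to the little-endian 16-bit word at seg:off, where seg is
   relative to the start of image. The two bytes are at linear addresses
   16*seg+off and 16*seg+off+1. Returns -1 if the word lies outside the
   image, else 0. */
int apply_relocation(uint8_t *image, size_t image_len,
                     uint16_t seg, uint16_t off, uint16_t load_seg)
{
	size_t addr = ((size_t)seg << 4) + off;
	if (addr + 1 >= image_len)
		return -1;
	uint16_t word = (uint16_t)(image[addr] | (image[addr + 1] << 8));
	word = (uint16_t)(word + load_seg);  /* 16-bit add, wraps mod 2^16 */
	image[addr] = (uint8_t)(word & 0xff);
	image[addr + 1] = (uint8_t)(word >> 8);  /* next linear byte, not X:0000 */
	return 0;
}
```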

An external unpacker like this one doesn't care about the contents of the decompression stub, but it has to know its length in order to locate the packed relocation table. There is no field that indicates where the stub ends and the relocations begin; it's implicit in the offsets encoded into the instructions of the stub. The error string "Packed file is corrupt" is a fairly reliable indicator: it always appears right at the end of the stub. However, the message may be localized ("Fichero corrompido    "), so it's not completely reliable. I initially tried having a table of known stubs, but later I changed it to search for the byte pattern that precedes the error message, cd 21 b8 ff 4c cd 21 (the instructions int 0x21; mov ax, 0x4cff; int 0x21), and then skip 22 bytes past the end of the match. Since the error message seems to always be 22 bytes, this works. You can always check your guess after reading the packed relocation table; it should end exepack_size bytes after the beginning of the EXEPACK header.
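The pattern search can be sketched in C as follows. The name find_stub_end and its return convention are hypothetical. block is assumed to start at cs:0000, the beginning of the EXEPACK header, so the returned offset is directly comparable with exepack_size once the relocation table has been read.

```c
#include <stddef.h>
#include <string.h>

/* int 0x21; mov ax, 0x4cff; int 0x21 */
static const unsigned char STUB_TAIL[] = {0xcd, 0x21, 0xb8, 0xff, 0x4c, 0xcd, 0x21};
#define ERROR_MSG_LEN 22  /* length of the error message after the pattern */

/* Returns the offset just past the error message, where the packed
   relocation table should begin, or -1 if the pattern is not found or
   the message would run past the end of the buffer. */
long find_stub_end(const unsigned char *block, size_t len)
{
	size_t pat = sizeof STUB_TAIL;
	for (size_t i = 0; i + pat <= len; i++) {
		if (memcmp(block + i, STUB_TAIL, pat) == 0) {
			size_t end = i + pat + ERROR_MSG_LEN;
			return end <= len ? (long)end : -1;
		}
	}
	return -1;
}
```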

After the stub comes the packed relocation table. Notionally, the relocation table is an array of segment:offset pointers. EXEPACK compresses the array by normalizing all the pointers to have a segment that is a multiple of 0x1000, and then storing 16 separate arrays containing offsets only. The first uint16_t is the number of offsets in the array for segment 0000, followed by that many uint16_ts for the offsets themselves; then a uint16_t for the number of offsets for segment 1000, followed by that many offsets; and so on up to segment f000.
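A C sketch of expanding the packed table back into segment:offset pairs follows. struct reloc and unpack_relocs are hypothetical names. The function returns the number of bytes consumed, which is the quantity to compare against exepack_size when checking the guessed end of the stub.

```c
#include <stddef.h>
#include <stdint.h>

struct reloc { uint16_t seg, off; };

/* Expand a packed relocation table: 16 groups for segments 0000, 1000,
   ..., f000, each a little-endian uint16 count followed by that many
   uint16 offsets. Stores the entries in out (room for max_out) and the
   entry count in *n_out. Returns the number of bytes consumed, or -1 if
   the table is truncated or out is too small. */
long unpack_relocs(const uint8_t *p, size_t len, struct reloc *out,
                   size_t max_out, size_t *n_out)
{
	size_t pos = 0, n = 0;
	for (unsigned group = 0; group < 16; group++) {
		if (pos + 2 > len)
			return -1;
		unsigned count = p[pos] | (p[pos + 1] << 8);
		pos += 2;
		for (unsigned i = 0; i < count; i++) {
			if (pos + 2 > len || n >= max_out)
				return -1;
			out[n].seg = (uint16_t)(group * 0x1000);
			out[n].off = (uint16_t)(p[pos] | (p[pos + 1] << 8));
			pos += 2;
			n++;
		}
	}
	*n_out = n;
	return (long)pos;
}
```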

I took the EXEPACK version numbers from the source code of UNP. The Detect-It-Easy software has signatures for versions of EXEPACK: EXEPACK.2.sg, WordPerfect EXEPack.2.sg. RGB Classic Games marks some versions as "2nd generation", but I don't know what their criteria for that are. If you have other samples of EXEPACK, please send them to me.

Unpacking algorithm

Taking the above observations into consideration, here is a rough algorithm for EXEPACK unpacking that is compatible with known formats. An implementation would have to deal with several possible error conditions, for example if skip_len > dest_len.
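One small step of such an implementation, sketched in C, converts cs, dest_len, and skip_len into byte lengths using the 16×(skip_len − 1) reduction described earlier, and rejects the skip_len > dest_len case. The names are hypothetical. Also rejecting skip_len > cs and skip_len = 0 is this sketch's own assumption, not something established above.

```c
#include <stdint.h>

/* Hypothetical container for the derived byte lengths. */
struct exepack_lengths { uint32_t compressed_len, uncompressed_len; };

/* cs is the EXE header's initial cs; dest_len and skip_len come from the
   EXEPACK header (skip_len = 1 for the 16-byte header). Returns 0 on
   success or -1 for inconsistent fields. */
int compute_lengths(uint16_t cs, uint16_t dest_len, uint16_t skip_len,
                    struct exepack_lengths *out)
{
	/* skip_len > dest_len is the error case named above; rejecting
	   skip_len == 0 and skip_len > cs is an assumption of this sketch. */
	if (skip_len == 0 || skip_len > dest_len || skip_len > cs)
		return -1;
	uint32_t pad = 16u * (uint32_t)(skip_len - 1);  /* padding, in bytes */
	out->compressed_len = 16u * (uint32_t)cs - pad;
	out->uncompressed_len = 16u * (uint32_t)dest_len - pad;
	return 0;
}
```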

Decompression algorithm

The decompression algorithm is just as described at ModdingWiki. It runs backwards, and decompresses the buffer into itself. Here it is, with no error or bounds checking.

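#include <stddef.h>
#include <stdint.h>

// The dst and src indices initially point one byte past the end
// of their respective regions. Returns 0 on success, or -1 on an
// unknown command ("Packed file is corrupt"). No bounds checking.
int decompress(uint8_t *buf, size_t dst, size_t src)
{
	// Skip any 0xff padding bytes at the end of the compressed data.
	while (buf[src-1] == 0xff)
		src--;
	uint8_t command;
	do {
		command = buf[--src];
		// 16-bit little-endian length; the high byte comes first
		// because we are reading backwards.
		size_t length = buf[--src];
		length = (length<<8) + buf[--src];
		switch (command & 0xfe) {
		case 0xb0: { // fill: repeat one byte length times
			uint8_t fill = buf[--src];
			for (size_t i = 0; i < length; i++)
				buf[--dst] = fill;
			break;
		}
		case 0xb2: // copy: move length literal bytes
			for (size_t i = 0; i < length; i++)
				buf[--dst] = buf[--src];
			break;
		default:
			return -1; // Packed file is corrupt
		}
	} while ((command & 0x01) != 0x01); // low bit marks the last command
	return 0;
}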
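For intuition about the format from the other direction, here is a hypothetical forward encoder in C whose output the decompression routine above would decode. Each command is laid out as payload, then length (low byte, high byte), then the command byte. Only the lowest-addressed command, which the decompressor handles last, has its low bit set. This is a naive run-length scheme (fill for runs of 3 or more equal bytes, copy for everything else), not exepack's actual compressor. Its output can be larger than its input, a case the standard stubs can't handle.

```c
#include <stddef.h>
#include <stdint.h>

/* Write a command's length (low byte, high byte) and command byte. */
static size_t put_trailer(uint8_t *out, size_t n, size_t length, uint8_t command)
{
	out[n++] = (uint8_t)(length & 0xff);
	out[n++] = (uint8_t)((length >> 8) & 0xff);
	out[n++] = command;
	return n;
}

/* Hypothetical encoder for len > 0 input bytes. out needs room for at least
   2*len + 3*(len/0xffff) + 18 bytes. Returns the compressed length, padded
   with 0xff to a multiple of 16 (the decompressor skips those bytes). */
size_t exepack_encode(const uint8_t *in, size_t len, uint8_t *out)
{
	size_t n = 0, i = 0;
	uint8_t final = 0x01;  /* set only on the first command written */
	while (i < len) {
		size_t run = 1;
		while (i + run < len && run < 0xffff && in[i + run] == in[i])
			run++;
		if (run >= 3) {
			out[n++] = in[i];  /* fill byte */
			n = put_trailer(out, n, run, (uint8_t)(0xb0 | final));
			i += run;
		} else {
			/* Literal span: stop before the next run of 3+ equal bytes. */
			size_t count = 0;
			while (i < len && count < 0xffff &&
			       !(i + 2 < len && in[i] == in[i + 1] && in[i] == in[i + 2])) {
				out[n++] = in[i++];
				count++;
			}
			n = put_trailer(out, n, count, (uint8_t)(0xb2 | final));
		}
		final = 0x00;
	}
	while (n % 16 != 0)
		out[n++] = 0xff;
	return n;
}
```

For example, "AAAAB" encodes as 41 04 00 b1 42 01 00 b2 followed by eight 0xff padding bytes: a final fill of four 'A's, then a copy of one 'B'.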

Bugs

Send bug reports to david@bamsoftware.com. This is also where you should send files that seem to be EXEPACK-compressed but which this program cannot handle.

Open questions

The following are not bugs exactly, but rather questions that came up during development that I had to decide one way or another. I'm not sure that what I chose is the best way. If you have an opinion or insight, let me know.

Other EXEPACK decompressors

unEXEPACK

Single source file, written in C. Supports multiple formats. This is the alternative I recommend if exepack doesn't suit your needs.

UNEXEPACK

Single source file, written in C. Only supports one EXEPACK format. Has a bug when processing certain packed relocation tables.

UNP a.k.a. unp411

DOS-based unpacker for a ton of self-extracting executable formats. Written in assembly language. No longer maintained. The way it works is cute: it recognizes the input format, sets some breakpoints, runs the executable's own unpacking code, and copies the result out of memory. As a consequence, it really only works inside a real DOS environment. (And isn't safe to run on untrusted files.) I got the version numbers for different decompression stubs from UNP's labeled signatures in the source file exe/eexpk.asm.

Thanks