Reverse engineering the extraction of Game Archives using QuickBMS/Python

  • Localization

Today I'm going to discuss how we can reverse engineer the extraction of game archives. Sit back, because this is where it starts to get interesting...

 

+==== TUTORIAL SECTION ====+

But how do those files store game assets like 3D models, textures, sounds, videos and so on?
Well, the answer is simple: they usually bundle them. The assets are packed close together, either compressed or (rarely) even encrypted.
To understand this, let's first quickly go over the basics of how a computer stores any file at all.

DATA TYPES

These are the most common data types:

  • Byte/Character = 1 Byte, so 8 Bits
  • Word/Short = 2 Bytes, so 16 Bits
  • Dword/Int = 4 Bytes, so 32 Bits
  • ULONG32/Long = 4 Bytes, so 32 Bits
  • ULONG64/Long Long = 8 Bytes, so 64 Bits
  • Float = 4 Bytes, so 32 Bits
  • Double = 8 Bytes, so 64 Bits
  • String = A sequence of 1 Byte Characters terminated with null ("00")
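
To see these sizes in practice, here is a minimal Python sketch using the standard struct module. The format codes, the helper name read_cstring and the sample bytes are my own illustration, not taken from any game archive.

```python
import struct

# Standard sizes with an explicit byte order ("<" = little-endian, no padding).
SIZES = {
    "byte": struct.calcsize("<B"),
    "short": struct.calcsize("<H"),
    "int": struct.calcsize("<I"),
    "int64": struct.calcsize("<Q"),
    "float": struct.calcsize("<f"),
    "double": struct.calcsize("<d"),
}


def read_cstring(data: bytes, pos: int = 0) -> tuple[str, int]:
    """Read a null-terminated string; return it and the position after the 00."""
    end = data.index(b"\x00", pos)
    return data[pos:end].decode("ascii"), end + 1


# Made-up sample: the string "Table", a 00 terminator, then a 32-bit integer.
sample = b"Table\x00\x2a\x00\x00\x00"
name, pos = read_cstring(sample)
(value,) = struct.unpack_from("<I", sample, pos)
print(SIZES)
print(name, value)
```

The explicit "<" prefix matters: without it, struct uses the native sizes and alignment of the machine, which may differ from what is stored in the file.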

A bit is the smallest piece of data we can represent: it's either 0 or 1. Combining 8 bits (example: 0 1 1 1 0 0 1 1) gives us one whole byte.
So, all files literally look like this:

Address:    HEX:                                             ASCII:
            00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
0x00000040  2a 2a 20 2a 2f 0a 09 54 61 62 6c 65 20 77 69 74  ** */..Table wit
0x00000050  68 20 54 41 42 73 20 28 30 39 29 0a 09 31 09 09  h TABs (09)..1..
0x00000060  32 09 09 33 0a 09 33 2e 31 34 09 36 2e 32 38 09  2..3..3.14.6.28.
0x00000070  39 2e 34 32 0a                                   9.42.

This is called a hex dump. It's essentially a more human-readable view of a binary file: alongside the actual binary data in hex, it shows the address of each 16-byte line (0x..0 to 0x..F) and its ASCII representation.
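
If you don't have a hex editor at hand, a dump in this layout can be produced with a few lines of Python. This is only a sketch; the function name hexdump and its parameters are my own, and the sample bytes are simply the text from the dump above.

```python
def hexdump(data: bytes, base: int = 0, width: int = 16) -> str:
    """Format bytes as address / hex / ASCII lines, `width` bytes per line."""
    lines = []
    for off in range(0, len(data), width):
        chunk = data[off:off + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Printable ASCII is 0x20..0x7e; everything else is shown as a dot.
        ascii_part = "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in chunk)
        lines.append(f"0x{base + off:08x}  {hex_part:<{width * 3 - 1}}  {ascii_part}")
    return "\n".join(lines)


data = b"** */\n\tTable with TABs (09)\n\t1\t\t2\t\t3\n\t3.14\t6.28\t9.42\n"
print(hexdump(data, base=0x40))
```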

The packed file usually contains compressed data with small separators/padding between the blocks. However, it doesn't tell us the name and path of the file we want to decompress, which is a problem.
Heck, we don't even know which compression method was used, which "flavour/version" of it, or what the decompressed file should look like... That's where QuickBMS comes to help.

QuickBMS
QuickBMS has one very specific feature I want to talk about: "comtype unzip_dynamic". The comtype command selects the decompression algorithm, and QuickBMS supports a very large number of methods and their "flavours/versions"; unzip_dynamic is the zlib-style one used in this tutorial.
It is also very fast and is good at extracting multiple files from a package at once.
There are also already lots of QuickBMS scripts out there for extracting specific archives, but I'll talk about that later.

Compression types
As said previously, the block separators/markers are very useful to identify, but it turns out most compression methods have their own headers and magic numbers. Here are a few of them:

Magic numbers:
ZLIB:
78 01 (NoComp)
78 5E (Fastest)
78 9C (Default)
78 DA (Maximum)
LZ4:
[No Magic Numbers]
LZ4 Frame:
04 22 4D 18 (Default)
LZW:
[No Magic Numbers]
LZO:
[No Magic Numbers]
BZIP/BZIP2:
42 5A 68
GZIP:
1F 8B 08
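
The list above can be turned into a quick check in Python. This is only a rough sketch: the MAGIC table and the identify function are names I made up, and a two-byte match like 78 9C can easily occur by chance in random data, so treat a hit as a hint and not as proof.

```python
import zlib

# Signatures from the list above; LZ4 blocks, LZW and LZO have none.
MAGIC = {
    b"\x78\x01": "zlib (NoComp)",
    b"\x78\x5e": "zlib (Fastest)",
    b"\x78\x9c": "zlib (Default)",
    b"\x78\xda": "zlib (Maximum)",
    b"\x04\x22\x4d\x18": "LZ4 Frame",
    b"\x42\x5a\x68": "BZIP2",
    b"\x1f\x8b\x08": "GZIP",
}


def identify(block: bytes) -> str:
    """Guess the compression of a block from its first bytes."""
    for magic, name in MAGIC.items():
        if block.startswith(magic):
            return name
    return "unknown"


packed = zlib.compress(b"hello archive" * 10)  # default compression level
print(packed[:2].hex(), identify(packed))
```

In a real archive you would run this on the bytes found at each file's offset, once the index has told you where the blocks start.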

Practical steps

Below is an example of what a typical QuickBMS archive extractor looks like. It's also one of my favourites and the first one for me. It's designed to extract assets from Wolfenstein: The New Order & Wolfenstein: The Old Blood:

wolfenstein.bms:

# Open the .index (metadata) and .resources (data) files
open FDDE index 0
open FDDE resources 1
comtype unzip_dynamic

# Header: the file count sits at 0x24, big-endian
endian big
goto 0x24
get files long
get unk long
math TMP = files
math TMP - 1
for i = 0 < files
    # Three length-prefixed strings, little-endian
    endian little
    get FNsize1 long
    getdstring FN1 FNsize1
    get FNsize2 long
    getdstring FN2 FNsize2
    get namesize long
    getdstring name namesize
    endian big
    get offset long
    get size long
    get zsize long
    # Skip unknown metadata: count * 0x18 + 5 bytes
    get unksize long
    math unksize * 0x18
    math unksize + 5
    getdstring unkdata unksize

    # Raw copy if the sizes match, otherwise decompress
    if size = zsize
        log name offset size 1
    else
        clog name offset zsize size 1
    endif
    if i != TMP
        get filenumber long
    endif
next i

Script summary:

  1. Open index + data files
  2. Read file count
  3. For each file:
       • Read filename
       • Read offset and sizes
       • Skip metadata
       • Extract raw or compressed data

Compression logic:

Condition        Action
size == zsize    Raw copy
size != zsize    Zlib decompress

In id Tech 5 games, all assets are packed inside the .resources files, and all of the metadata (name, path, extension, compressed size, decompressed size) is in the .index files. That means we open both files in this way:
1. Setup

open FDDE index 0
open FDDE resources 1


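open = tells QuickBMS to open a file.
FDDE = opens a file in the same folder and with the same base name as the input file, but with the extension that follows.
index -> the .index file is opened as file 0
resources -> the .resources file is opened as file 1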

comtype unzip_dynamic

This sets the compression type to unzip_dynamic, the zlib-style decompression that clog uses later.

2. Loop

endian big

From here on, multi-byte values are read as big-endian.

goto 0x24

Jumps to offset 0x24 to skip the archive header.

get files long
get unk long

files -> total number of files in the archive
unk -> unknown value (probably versioning or flags)

math TMP = files
math TMP - 1

TMP = holds files - 1
(Used later to avoid reading an extra value after the last entry).

for i = 0 < files

The script now iterates once per file entry.

endian little

The filename block is little-endian, even though the rest of the archive is big-endian.
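
Here is a small Python illustration of why the endian switch matters. The four sample bytes are made up; the point is that the same bytes give very different numbers depending on the byte order. I use unsigned 32-bit reads here as an assumption, since sizes and offsets are never negative.

```python
import struct

raw = bytes([0x00, 0x00, 0x01, 0x2C])

(big,) = struct.unpack(">I", raw)     # like "endian big" in QuickBMS
(little,) = struct.unpack("<I", raw)  # like "endian little"

print(big)
print(little)
```

If a size or offset you read looks absurdly large, trying the other byte order is usually the first thing to check.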

get FNsize1 long
getdstring FN1 FNsize1

Reads length of string
Reads string data

get FNsize2 long
getdstring FN2 FNsize2

Another string component

get namesize long
getdstring name namesize

This is the real filename
name is used later by log / clog
This determines the extracted file name on disk
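
The "get size, then getdstring" pattern is a length-prefixed string. Below is a minimal Python sketch of the same idea. The helper names read_dstring and pack_str and the three example strings are invented for the demo; I don't claim they match what a real .index file contains.

```python
import io
import struct


def read_dstring(f) -> str:
    """Read a little-endian 32-bit length, then that many bytes of text."""
    (length,) = struct.unpack("<I", f.read(4))
    return f.read(length).decode("ascii", errors="replace")


def pack_str(s: str) -> bytes:
    """Build a length-prefixed string for the demo entry below."""
    return struct.pack("<I", len(s)) + s.encode("ascii")


# Hypothetical entry: three length-prefixed strings, the third is the filename.
entry = pack_str("image") + pack_str("textures/wall.tga") + pack_str("textures/wall.bimage")
f = io.BytesIO(entry)
fn1 = read_dstring(f)
fn2 = read_dstring(f)
name = read_dstring(f)
print(fn1, fn2, name)
```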

endian big

Switch back to big-endian.

get offset long
get size long
get zsize long

offset = Where the file data is located in resources

size = Uncompressed size

zsize = Compressed size

get unksize long
math unksize * 0x18
math unksize + 5
getdstring unkdata unksize

Reads a count (unksize)
Multiplies it by 0x18 (24 bytes per entry)
Adds 5 extra bytes
Skips a block of unknown metadata
This is not used for extraction, only skipped to reach the next file entry.
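
The skip arithmetic can be checked with a short Python sketch. The function name skip_unknown_block and the synthetic test data are my own; only the formula (count * 0x18 + 5) comes from the script.

```python
import io
import struct


def skip_unknown_block(f) -> int:
    """Read a big-endian count and skip count * 0x18 + 5 bytes of metadata."""
    (count,) = struct.unpack(">I", f.read(4))
    size = count * 0x18 + 5
    f.seek(size, io.SEEK_CUR)
    return size


# Synthetic data: a count of 2, the block to skip, then a marker we expect to land on.
data = struct.pack(">I", 2) + bytes(2 * 0x18 + 5) + b"NEXT"
f = io.BytesIO(data)
skipped = skip_unknown_block(f)
print(skipped, f.read(4))
```

Seeking past the block instead of reading it into a variable has the same effect as the getdstring trick in the script: the read position ends up at the next field.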

3. Extract the files

if size = zsize
log name offset size 1

If compressed size equals uncompressed size:

File is stored raw
log copies data directly from file 1 (resources)

else 
clog name offset zsize size 1
endif

clog means compressed log
Reads zsize bytes
Decompresses using unzip_dynamic
Writes size bytes to disk
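
The same raw-or-compressed decision looks roughly like this in Python. This is a sketch under assumptions: the extract function is my own name, it works on bytes in memory instead of writing to disk, and it uses plain zlib.decompress, which expects a zlib header (QuickBMS's unzip_dynamic may be more tolerant than that).

```python
import zlib


def extract(resources: bytes, offset: int, size: int, zsize: int) -> bytes:
    """Return one file's data: raw copy if the sizes match, else zlib-inflate."""
    if size == zsize:
        # Like "log": copy the bytes as they are.
        return resources[offset:offset + size]
    # Like "clog": read zsize bytes and decompress them.
    data = zlib.decompress(resources[offset:offset + zsize])
    if len(data) != size:
        raise ValueError(f"expected {size} bytes, got {len(data)}")
    return data


# Synthetic "resources" blob: one raw file followed by one compressed file.
raw = b"RAWDATA!"
original = b"A" * 100
packed = zlib.compress(original)
blob = raw + packed

print(extract(blob, 0, len(raw), len(raw)))
print(len(extract(blob, len(raw), len(original), len(packed))))
```

Checking the decompressed length against size is a cheap way to notice that an offset or size field was read with the wrong endianness.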

if i != TMP
get filenumber long
endif

Between entries, there is an extra long
Probably an ID or index value
Not present after the last entry
This prevents reading past the table

next i

Repeats until all files are extracted

Edited by user3678
