
The Amazing Spider-Man 1 and 2


Go to solution Solved by Hazza12555,


  • Members
Posted

I want to extract assets from The Amazing Spider-Man 1 and 2. All the data files are in .pkz format (which I'm unsure how to extract), and I'm assuming the models are too.

If anyone knows how to extract these, this would be great!

I've uploaded some sample pkz files from the game as reference for anyone who needs.

https://gofile.io/d/o11oKY

 

  • Engineers
Posted

Here's decompression code. It creates a single file. Not sure where the valid offsets are...

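###################################
# Decompresses every chunk of a .pkz and appends the output to one "<basename>.dec" file.
get BaseFileName basename
# zlib_noerror: zlib that tolerates stream errors instead of aborting
comtype zlib_noerror

get FileMagic uint32
get DataBaseOffset uint32
get ChunkSize uint32
get Unknown_0 uint32
get Files uint32
get TotalComSize uint32
get TotalDecSize uint32

goto DataBaseOffset

for i = 0 < Files
	# remember where this chunk starts, then read past it
	savepos ChunkOffset
	getdstring ChunkData ChunkSize
	string FileName p= "%s.dec" BaseFileName
	append 0
	# compressed and decompressed size are both passed as ChunkSize (real sizes unknown here)
	clog FileName ChunkOffset ChunkSize ChunkSize
next i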

 

  • Members
Posted
On 3/14/2025 at 12:54 PM, shak-otay said:

If it's for PC try this.

(A decompressor for pkz, no extraction of single files, so dunno what that means exactly.)

Tried both of the BMS scripts that were listed in here, both gave me a decomp file. I'm unsure of what to do next.

  • Members
Posted

edit:

I used the decompressor above, which gave me .decomp files with .header files. I then put these into Ravioli Game Tools, which gave me multiple .wem files and .dds texture files.

I converted the .wem files to .wav and they seem to be sfx/efforts. As for the .dds texture files, some seem to be okay, others seem to be corrupted or something like that (I'm new to model/texture extraction).

Still no model files found though, if anyone could help this would be great!

I linked a zip with the wav files and dds texture files for anyone wanting to look, as well with the .decomp file that i extracted them from.

https://gofile.io/d/8jKyD9

  • Engineers
Posted (edited)

For the corrupted dds files - you'll need to find out the source of the problem: is it the bms script or is it Ravioli Game Tools?

Here's an example of two 32 kB dds files, one ok, the other with defect (maybe wrong decompression size?).

 

dds_comp.png

Edited by shak-otay
  • Members
Posted
On 3/17/2025 at 10:36 PM, shak-otay said:

For the corrupted dds files - you'll need to find out the source of the problem: is it the bms script or is it Ravioli Game Tools?

Here's an example of two 32 kB dds files, one ok, the other with defect (maybe wrong decompression size?).

 

dds_comp.png

I have no idea why the .dds texture files are partly corrupted. It could be the decompressor, but I'm unsure. To rule out Ravioli Game Tools, I also used Dragon Unpacker.

Ravioli Game Tools found 175 .dds texture files and Dragon Unpacker found 122. (I tried this with both decompressors in here and the results were the same.)

I believe the game was made by Beenox using the Goliath Engine, so this could give pointers on how to extract it.

Opening the .decomp in HxD reveals many strings that I assume are file names, such as OscorpGasMask_S[Hi]. [Hi] presumably means a HiRez texture, since the string HiRezWindowsEFIGSR also appears in the file.

(I have attached the .decomp and header file along with the dds textures from both Ravioli Game Tools and Dragon Unpacker)

 

As of now i'm unaware of any tool being able to scrape models or extract models directly, so any help on this project is appreciated!

decomp basement pkz.zip

  • Engineers
Posted

I compared the single decompressed file against the individually decompressed files and they are the same size, so I believe the decompression is fine.

Maybe the chunks are stacked in some order. But that's just a guess...

I also noticed after decompression that some files have pointers to data which doesn't actually exist. See below.

[Attached images: img1.jpg, img0.jpg]

  • Members
Posted
11 hours ago, h3x3r said:

I compared the single decompressed file against the individually decompressed files and they are the same size, so I believe the decompression is fine.

Maybe the chunks are stacked in some order. But that's just a guess...

I also noticed after decompression that some files have pointers to data which doesn't actually exist. See below.

[Attached images: img1.jpg, img0.jpg]

damn i hope we can crack this

  • Engineers
Posted (edited)

Found out something. The last chunk is not decompressed and I don't know why...

Here is the struct of the compressed pkz. There is also one table, but I'm not sure what it is for.

uint32 Sign;
uint32 DataBaseOffset;
uint32 ChunkSize;
uint32 Unknown;
uint32 Files;
uint32 TotalComSize;
uint32 TotalDecSize;

struct
{
    uint32 TotalDecompressedSize;
}Table[Files]<optimize=false>;

FSeek(DataBaseOffset);
struct
{
    struct
    {
        byte ChunkData[ChunkSize];
    }Chunk[Files]<optimize=false>;
}Chunks;
Edited by h3x3r
  • Members
Posted
2 hours ago, h3x3r said:

Found out something. The last chunk is not decompressed and I don't know why...

Here is the struct of the compressed pkz. There is also one table, but I'm not sure what it is for.

uint32 Sign;
uint32 DataBaseOffset;
uint32 ChunkSize;
uint32 Unknown;
uint32 Files;
uint32 TotalComSize;
uint32 TotalDecSize;

struct
{
    uint32 TotalDecompressedSize;
}Table[Files]<optimize=false>;

FSeek(DataBaseOffset);
struct
{
    struct
    {
        byte ChunkData[ChunkSize];
    }Chunk[Files]<optimize=false>;
}Chunks;

Hmm, this is strange. I knew the Beenox engine was meant to be tricky, and the only way to get models right now is through Ninja Ripper. I really hope someone smarter than I am can look into this, because I'm unsure where to start.

  • 1 month later...
  • Members
Posted

Been digging around this topic and found that chrrox's QuickBMS script for Spider-Man: Edge of Time ( https://web.archive.org/web/20220310025845/https://forum.xentax.com/viewtopic.php?f=10&t=7828& ) also works for the .pkz files in TASM 1.

It outputs a decompressed .ext file.

I then found that id-daemon has an exe that works with chrrox's script to extract the contents from the Edge of Time .ext files https://web.archive.org/web/20230429100932fw_/https://zenhax.com/viewtopic.php?t=4379

However with TASM 1, the exe does nothing.

If anyone could figure out how to extract the contents of the new .ext files using id-daemon's .ext extractor then this topic can finally be closed!

I'll link a zip file containing a .ext example and the id-daemon exe

https://gofile.io/d/NBDC54

  • 1 year later...
  • Members
Posted

The model has roughly 6,000 to 7,000 floating-point vertices. It doesn't seem difficult to process. The data starts right after the sequence of eight AA bytes.

[Attached image: px22.png]

  • 3 weeks later...
  • Members
  • Solution
Posted (edited)

TASM-TOOLS V1

 

The data examined is the **PS3 build** (tag *"HiRez PS3 EU"*, RSX GS version 6.66.5109,
built May 2012). Everything is **big-endian**. The same structures appear in *Edge of
Time*, but TASM uses a newer geometry encoding that older rippers (e.g. id-daemon's
`SpidermanEoT.exe`) mis-parse.

## 1. Container: `.pkz` → `.ext`

A `.pkz` is a zlib-chunked blob with a 28-byte big-endian header:

| Offset | Type | Field |
|-------:|------|-------|
| 0  | u32 | magic `BABEB1B0` |
| 4  | u32 | chunk size (always `0x8000`) |
| 8  | u32 | base offset of first chunk (`0x8000`) |
| 12 | u32 | unknown / flags |
| 16 | u32 | chunk count |
| 20 | u32 | total compressed size |
| 24 | u32 | total decompressed size |

After the header comes a table of `chunkCount` u32s, each holding the **cumulative** decompressed size
at the end of that chunk. The payload is a sequence of independent zlib streams, each
occupying a `0x8000`-byte slot in the file. Decompress each one and **truncate its
output to its table-derived size**, i.e. `table[i] − table[i−1]` (`table[0]` for the first chunk).
The last block of a stream is padded, so the cumulative table is the source of truth.
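
As a minimal sketch of the steps above (not the actual `unpkz.py`), the following assumes the field order from the header table in this post; the earlier template in this thread reads offsets 4 and 8 the other way round, which makes no difference when both are `0x8000`. The name `unpack_pkz` is hypothetical, and a last chunk that is stored differently (as reported earlier in the thread) is not handled.

```python
import struct
import zlib

PKZ_MAGIC = 0xBABEB1B0

def unpack_pkz(data: bytes) -> bytes:
    """Decompress a .pkz blob laid out as in the header table above."""
    magic, chunk_size, base, _flags, count, _comp, total_dec = struct.unpack_from(">7I", data, 0)
    if magic != PKZ_MAGIC:
        raise ValueError(f"bad magic {magic:08X}")
    # Cumulative decompressed size at the end of each chunk.
    table = struct.unpack_from(f">{count}I", data, 28)
    out = bytearray()
    prev = 0
    for i, cum in enumerate(table):
        slot = data[base + i * chunk_size : base + (i + 1) * chunk_size]
        # decompressobj stops at the end of the zlib stream, so slot padding is ignored.
        chunk = zlib.decompressobj().decompress(slot)
        out += chunk[: cum - prev]          # truncate to the table-derived size
        prev = cum
    if len(out) != total_dec:
        raise ValueError(f"size mismatch: got {len(out)}, header says {total_dec}")
    return bytes(out)
```

Typical use would be `open("BossRhino.ext", "wb").write(unpack_pkz(open("BossRhino.pkz", "rb").read()))`.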

## 2. Chunk tree (the `.ext`)

The `.ext` is a tree of typed chunks. Every chunk has a **12-byte big-endian header**:

```
u32 type   u16 a   u16 b   u32 payloadSize
```

followed by `payloadSize` bytes of payload, which may itself contain child chunks. Parse
recursively by walking `pos += 12 + payloadSize`. The root chunk's `payloadSize` bounds
the whole file.
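
The walk can be sketched roughly like this. Which chunk types actually hold child chunks is not spelled out here, so the caller passes that in as `containers`; both function names are hypothetical.

```python
import struct
from typing import Iterator, Tuple

def iter_chunks(buf: bytes, start: int, end: int) -> Iterator[Tuple[int, int, int, int, int]]:
    """Yield (type, a, b, payload_offset, payload_size) for sibling chunks in buf[start:end]."""
    pos = start
    while pos + 12 <= end:
        ctype, a, b, size = struct.unpack_from(">IHHI", buf, pos)
        if pos + 12 + size > end:
            break  # not a valid chunk boundary; stop rather than read garbage
        yield ctype, a, b, pos + 12, size
        pos += 12 + size

def find_chunks(buf: bytes, start: int, end: int, wanted: int, containers: set) -> list:
    """Recursively collect (payload_offset, payload_size) of chunks of type `wanted`.

    `containers` is the caller's set of types assumed to hold child chunks.
    """
    found = []
    for ctype, _a, _b, off, size in iter_chunks(buf, start, end):
        if ctype == wanted:
            found.append((off, size))
        if ctype in containers:
            found += find_chunks(buf, off, off + size, wanted, containers)
    return found
```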

Chunk types referenced in this guide:

| Type | Meaning |
|------|---------|
| `0x0005` | Skeleton container (one `0x138D` wrapper per skinned model) |
| `0x0026` | Named asset blob (model, texture, …). First child `0x138E` carries the name |
| `0x0195` | GCM texture (register dump + pixels), child of a `0x0026` texture asset |
| `0x0322` | Material-group → submesh map |
| `0x0324` | Material block (contains a `0x0331` material + `0x0338` render-state) |
| `0x0325` | LOD block — either a **descriptor** block (`0x0327`s) or a **geometry** block (`0x1388`/`0x1389`) |
| `0x0326` | LOD header: `[u32 numDescriptors][…][u32 lodIndex]…` |
| `0x0327` | Submesh descriptor (vertex format, packet records, bone-slot palette) |
| `0x032E` | RSX GPU command/push buffer (not needed for extraction) |
| `0x0331` | Material (lists its textures' full source paths) |
| `0x0338` | Per-submesh RSX render-state (GCM push-buffer fragment) |
| **`0x033A`** | **Skinning palette: bone → matrix-slot inverse table** (the key to rigging) |
| `0x0384` | Skeleton data (names, local TRS, bind matrices) |
| `0x0385` | Bone name-hash table + bind matrices |
| `0x0389` | Per-bone local bind TRS |
| `0x038B` | Bone name string table |
| `0x1388` | Vertex buffer |
| `0x1389` | Index buffer |
| `0x138E` | Name header (asset name at +24/+36) |

A model is a `0x0026` asset whose children include `0x0325` blocks. Within a model the
`0x0325` blocks pair up: **descriptor** blocks (holding `0x0327` submesh descriptors) and
**geometry** blocks (holding one `0x1388` vertex buffer + one `0x1389` index buffer).
Index 0 of each list is **LOD0** (the highest detail; only LOD0 is exported).

## 3. Geometry (models)

### 3.1 Submesh descriptor — `0x0327`


`header[n]` below is the n-th u32 of the descriptor.

* `header[2]` = **palette length** (`P`). The palette is the **last `P` u32s** of the chunk.
* `header[6]` = **vertex-format flags** (see §3.2).
* Each **packet record** is `[vtxOff, nVerts, vtxBytes, idxOff, idxBytes<<16 | idxCount, baseVtx, 0, 0]`.
  A submesh is split into packets because the PS3 **SPU skinning** path processes a bounded
  batch of vertices (and a bounded bone palette) at a time.
* The **palette** maps a packet's *local* bone indices to **skinning slots**, *not* bone ids (see §5).
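
A small sketch of reading these fields, assuming `header[n]` counts big-endian u32s from the start of the chunk payload (the text does not say whether the 12-byte chunk header is included). Where the packet records sit inside the descriptor is not described, so `decode_packet` only splits a record that has already been located. Both names are hypothetical.

```python
import struct

def descriptor_palette(payload: bytes) -> list:
    """Return the tail palette (last P u32s) of a 0x0327 payload."""
    n = len(payload) // 4
    if n < 3:
        return []
    words = struct.unpack_from(f">{n}I", payload, 0)
    p = words[2]                      # header[2] = palette length P
    if p == 0 or p > n:
        return []
    return list(words[n - p:])

def decode_packet(rec: tuple) -> dict:
    """Split one 8-word packet record into named fields."""
    vtx_off, n_verts, vtx_bytes, idx_off, packed, base_vtx, _zero0, _zero1 = rec
    return {
        "vtx_off": vtx_off,
        "n_verts": n_verts,
        "vtx_bytes": vtx_bytes,
        "idx_off": idx_off,
        "idx_bytes": packed >> 16,     # high half of the packed word
        "idx_count": packed & 0xFFFF,  # low half of the packed word
        "base_vtx": base_vtx,
    }
```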

 

### 3.2 Vertex-format flags (`header[6]`)

| Bit | Meaning |
|----:|---------|
| `0x0004` | has tangent stream + a padded u8 stream |
| `0x0010` | has vertex colour |
| `0x1000` | **skinned** (has bone indices + weights) |
| bits 5–6 | UV-set count |
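
For illustration, the flag word could be split like this. It assumes "bits 5–6" are zero-based bit numbers (masks `0x20` and `0x40`) and returns that field raw, since it isn't stated whether it counts all UV sets or only the extra ones. `decode_vertex_flags` is a hypothetical name.

```python
def decode_vertex_flags(flags: int) -> dict:
    """Split header[6] of a 0x0327 descriptor into its documented bits."""
    return {
        "tangent": bool(flags & 0x0004),
        "colour": bool(flags & 0x0010),
        "skinned": bool(flags & 0x1000),
        "uv_field": (flags >> 5) & 0x3,   # raw 2-bit value of bits 5-6
    }
```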

 

### 3.3 Vertex buffer layout — **Structure-of-Arrays in quads of 4**

 

Within a packet the streams are **not** interleaved per vertex. They are stored **stream-by-stream**, and each stream is itself
tiled in **quads of 4 vertices** (the SPU processes 4 lanes at a time). Stream order:

```
position (3×f32)             ─ nVerts×12 bytes
skin (4×u8 idx + 4×u8 wgt)   ─ nVerts×8   (only if skinned)
normal (packed u32)          ─ nVerts×4
tangent (packed u32)         ─ nVerts×4   (only if flag 0x4) + pad16 of a u8 stream
colour (4×u8)                ─ nVerts×4   (only if flag 0x10)
uv0 (2×s16)                  ─ nVerts×4
uv1 (2×s16)                  ─ nVerts×4   (per extra UV set)
```

Within the **skin** stream, the quad tiling means the 32 bytes for 4 verts (8 bytes each) are laid out
**influence-major**: `idx0[v0..v3], idx1[v0..v3], idx2[v0..v3], idx3[v0..v3]`, then the same
for weights. So for vertex `v` (0–3) of a quad, the four bone influences are bytes
`idx[v], idx[4+v], idx[8+v], idx[12+v]` of that quad's 16-byte index block (weights
likewise). UVs are **s16 / 1024**.
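
A sketch of de-tiling one skin quad, assuming each quad is a 16-byte index block immediately followed by a 16-byte weight block (32 bytes per 4 vertices). `untile_skin_quad` is a hypothetical name.

```python
def untile_skin_quad(block: bytes) -> list:
    """Split one 32-byte skin quad into 4 per-vertex (indices, weights) pairs."""
    if len(block) != 32:
        raise ValueError("a skin quad is 16 index bytes + 16 weight bytes")
    idx, wgt = block[:16], block[16:]
    verts = []
    for v in range(4):
        # Influence-major: influence k of vertex v sits at byte 4*k + v.
        indices = [idx[4 * k + v] for k in range(4)]
        weights = [wgt[4 * k + v] for k in range(4)]
        verts.append((indices, weights))
    return verts
```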

 

### 3.4 UVs

These textures use a **top-left (V-down) origin**, the same as glTF — so UVs are used
**as-is** for glTF. (OBJ is bottom-left, so the OBJ exporter flips `v → 1-v`; copying that
flip into the glTF path "splatters" the atlas — an eye UV island lands on a hand, etc.)
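
To make the two conventions concrete, here is a small sketch that decodes one big-endian `s16 / 1024` UV pair and applies the flip only for OBJ. It ignores the quad tiling of the UV stream, and both function names are hypothetical.

```python
import struct

def decode_uv(raw: bytes, offset: int = 0) -> tuple:
    """Decode one big-endian (s16, s16) UV pair; usable as-is for glTF."""
    u, v = struct.unpack_from(">2h", raw, offset)
    return u / 1024.0, v / 1024.0

def uv_for_obj(uv: tuple) -> tuple:
    """OBJ uses a bottom-left origin, so only the OBJ path flips V."""
    u, v = uv
    return u, 1.0 - v
```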

 

### 3.5 Normals

I couldn't decode the packed u32 vertex normals (the 4×s8, 10_10_10_2 and b8 interpretations all
correlate ≤ 0.53 with the geometry). Instead I computed **area-weighted smooth normals from
geometry**, welded across coincident positions (`np.unique(round(P,4))`) so shading stays
smooth across UV seams. Winding is inconsistent across submeshes, so materials are written
**`doubleSided: true`** rather than trying to fix face orientation; this also removes the
"see-through / inside-out" look.

(this is the part that needs improving, so if anybody is good with UVs feel free!)
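
The fallback normals can be sketched without numpy as follows. This is an illustration of area-weighted, position-welded normals under the same 4-decimal rounding, not the code the tool uses; `smooth_normals` is a hypothetical name.

```python
import math
from collections import defaultdict

def smooth_normals(positions, triangles, decimals=4):
    """Area-weighted vertex normals, welded across coincident positions."""
    def key(p):
        return tuple(round(c, decimals) for c in p)

    acc = defaultdict(lambda: [0.0, 0.0, 0.0])
    for a, b, c in triangles:
        pa, pb, pc = positions[a], positions[b], positions[c]
        e1 = [pb[i] - pa[i] for i in range(3)]
        e2 = [pc[i] - pa[i] for i in range(3)]
        # The cross product's length is twice the face area, so summing raw
        # cross products weights each face by its area.
        n = (e1[1] * e2[2] - e1[2] * e2[1],
             e1[2] * e2[0] - e1[0] * e2[2],
             e1[0] * e2[1] - e1[1] * e2[0])
        for p in (pa, pb, pc):
            s = acc[key(p)]
            for i in range(3):
                s[i] += n[i]

    normals = []
    for p in positions:
        s = acc.get(key(p), [0.0, 0.0, 0.0])
        length = math.sqrt(s[0] ** 2 + s[1] ** 2 + s[2] ** 2)
        # Unreferenced or degenerate vertices get an arbitrary +Z normal.
        normals.append(tuple(c / length for c in s) if length > 0 else (0.0, 0.0, 1.0))
    return normals
```

Because faces with opposite winding contribute opposite vectors, this only gives sensible results within a consistently wound submesh.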
 

## 4. Skeleton — `0x0005` → `0x138D` → `0x0384`

One `0x138D` wrapper per skinned model; its `0x138E` name equals the model name. Inside
the `0x0384`:

| Child | Contents |
|-------|----------|
| `0x038B` | Bone **name** string table (NUL-separated; entry 0 = skeleton root, e.g. `Char_Rhino`) |
| `0x0389` | Per-bone **local bind TRS**: 10×f32 = translate[3], quat[4] (xyzw), scale[3] |
| `0x0385` | `u32[2] = (boneCount N, 4)`, then an `N`-entry name-hash table, then in the **tail** `N` records of **148 bytes** each |
| `0x033A` | (model-level, see §5) skinning-slot palette |

Each **148-byte** `0x0385` record (37 f32) = `[ local bind mat4 | inverse-bind mat4 | 5 f32 ]`.
The record base is `payloadEnd − N×148`. Matrices are **row-major / row-vector**
(`worldᵢ = localᵢ · worldₚₐᵣₑₙₜ`).

`0x038B`, `0x0389` and `0x0385` are all in the **same bone order** (verified by matching
local translations). The `0x0385` name-hash table is `N` sorted `(nameHash, boneIdx)`
pairs for runtime name lookup — it is **not** a skinning order.
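
Reading the tail records might look like this. It is a sketch that follows the layout described above (bone count in the first u32, `N` records of 37 big-endian f32 at the end of the payload); `read_bone_records` is a hypothetical name and the trailing 5 floats are ignored.

```python
import struct

RECORD_SIZE = 148  # 37 big-endian f32 per bone

def read_bone_records(payload: bytes) -> list:
    """Return [(local, inv_bind)] as 4x4 row lists from a 0x0385 payload."""
    n = struct.unpack_from(">I", payload, 0)[0]   # first u32 = boneCount N
    base = len(payload) - n * RECORD_SIZE         # records sit at the tail
    if base < 8:
        raise ValueError("payload too small for the declared bone count")
    bones = []
    for i in range(n):
        f = struct.unpack_from(">37f", payload, base + i * RECORD_SIZE)
        local = [list(f[r * 4 : r * 4 + 4]) for r in range(4)]
        inv_bind = [list(f[16 + r * 4 : 16 + r * 4 + 4]) for r in range(4)]
        bones.append((local, inv_bind))
    return bones
```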

 

### 4.1 Recovering the parent hierarchy

 

The file doesn't store parent indices explicitly — they are recovered exactly from the
matrices:

```
worldᵢ        = inverse(invBindᵢ)
parentWorldᵢ  = inverse(localᵢ) · worldᵢ        (row-vector convention)
parent(i)     = the j whose worldⱼ == parentWorldᵢ   (entry 0 is the root)
```

Verified bind-pose round-trip `world · invBind == I` to ~1e-6 for every skeleton
(Rhino 98, Lizard 107, RobotMaker 88, Iguana/Vermin 112, Spider-Man 100, …).
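
The recovery above can be sketched in plain Python as below. The helper names and the `1e-4` match tolerance are assumptions for illustration; if two bones share the same world matrix, the first match wins.

```python
def mat_mul(a, b):
    """4x4 matrix product a · b (row lists)."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)] for r in range(4)]

def mat_inv(m):
    """Gauss-Jordan inverse of a 4x4 matrix with partial pivoting."""
    n = 4
    aug = [[float(x) for x in m[r]] + [1.0 if r == c else 0.0 for c in range(n)]
           for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        d = aug[col][col]
        aug[col] = [x / d for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def close(a, b, tol=1e-4):
    return all(abs(a[r][c] - b[r][c]) <= tol for r in range(4) for c in range(4))

def recover_parents(locals_, inv_binds, tol=1e-4):
    """Return a parent index per bone (-1 for the root, entry 0)."""
    worlds = [mat_inv(ib) for ib in inv_binds]
    parents = [-1] * len(worlds)
    for i in range(1, len(worlds)):
        # Row-vector convention: world_i = local_i · world_parent.
        parent_world = mat_mul(mat_inv(locals_[i]), worlds[i])
        for j, w in enumerate(worlds):
            if j != i and close(w, parent_world, tol):
                parents[i] = j
                break
    return parents
```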

 


## 5. Rigging — the two-hop skin resolve (`0x033A`)

 

**This was the hard part.** A per-vertex skin index is **not** a bone id, and it is **not**
a direct index into the descriptor's palette. It resolves in **two hops**:

```
per-vertex skin index
      │  (0x0327 descriptor palette — per submesh)
      ▼
   skinning SLOT
      │  (0x033A model palette)
      ▼
   skeleton BONE
```

* **Hop 1** — the descriptor's tail palette turns a *packet-local* index into a
  **slot** in the model's GPU skinning-matrix palette (the order in which bone matrices
  are uploaded for a draw). A packet only references a small window of slots, which is why
  the index is local and small.
* **Hop 2** — the **`0x033A`** chunk (model-level, sibling of `0x0324`/`0x0325`) maps slots
  to skeleton bones. It is stored as an **inverse table**:

  ```
  u8 boneCount          (== skeleton N)
  u8 slotCount          (== number of skinned bones)
  boneCount bytes:  inv[bone] = slot         (0xFF = bone is not skinned)
  0xAA padding
  ```

  Invert it to get `{slot → bone}`. The `boneCount`/`slotCount` header can be checked
  against the skeleton (`boneCount` must equal N), so the alignment is unambiguous.
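
A sketch of the two hops, assuming the `0x033A` payload starts directly with the two count bytes and that the caller has already picked out the palette entries that apply to the packet (`packet_palette`). Both function names are hypothetical.

```python
def slot_to_bone(payload: bytes) -> dict:
    """Invert the 0x033A table inv[bone] = slot into {slot: bone}."""
    bone_count, slot_count = payload[0], payload[1]
    inv = payload[2 : 2 + bone_count]
    if len(inv) != bone_count:
        raise ValueError("payload shorter than boneCount")
    mapping = {slot: bone for bone, slot in enumerate(inv) if slot != 0xFF}
    if len(mapping) != slot_count:
        raise ValueError("slotCount does not match the number of skinned bones")
    return mapping

def resolve_bone(local_index: int, packet_palette: list, slot_map: dict) -> int:
    """Two-hop resolve: packet-local index -> slot -> skeleton bone."""
    slot = packet_palette[local_index]   # hop 1: descriptor palette
    return slot_map[slot]                # hop 2: 0x033A table
```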

**Why it matters:** skipping `0x033A` and using the descriptor palette value directly as a
bone id binds, e.g., Rhino's head vertices to a *calf twist bone* — "move the foot bone and
the knee moves". With the full two-hop resolve, every region binds correctly: head→`Bone_Head`,
feet→toe bones, hands→finger bones, across the whole roster.

> Note: the slot ordering in `0x033A` is an authored, hierarchical grouping (Hips, left
> leg, right leg, spine, head, left arm, right arm, …) and deliberately **skips** some
> bones (e.g. `Bone_Thigh_L` whose deformation is handled by its twist bones). A geometric
> "nearest-bone" reconstruction gets ~95% of this right but mis-binds twist/helper bones
> and mechanical rigs — `0x033A` is the exact authored data, so we use it.

Per-vertex **weights** are u8 (0–255); normalise by their sum.
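
For example, normalising by the sum could be written as below. Returning all zeros for a vertex whose weights sum to zero is an assumption; `normalise_weights` is a hypothetical name.

```python
def normalise_weights(raw: list) -> list:
    """Scale u8 weights so they sum to 1.0."""
    total = sum(raw)
    if total == 0:
        return [0.0] * len(raw)   # assumption: leave an unweighted vertex at zero
    return [w / total for w in raw]
```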

 

## 6. Textures — GCM `0x0195` → DDS

 

A texture is a `0x0026` asset whose single `0x0195` child is a **dump of the PS3 RSX (GCM)
texture-setup registers** followed by raw compressed pixels.

### 6.1 Header (big-endian)

```
… [u32 dataSize] [u8 fmt] [u8 mipCount] 0x02 0x00|0x01 … 0xAAE4 [u16 W] [u16 H] …
```

* `fmt`: `0x86`=DXT1(BC1), `0x87`=DXT3(BC2), `0x88`=DXT5(BC3), `0x85`/`0xA5`=A8R8G8B8.
* `0x02` = 2D texture. The `0xAAE4` register marker is immediately followed by the **full**
  W and H.
* Pixels are **linear** (compressed textures are never RSX-swizzled) and already in PC byte
  order — they drop straight into a DDS with no swap/deswizzle.
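
As a rough sketch, the format byte can be mapped with a table and the size read after the marker. Searching for the first `AA E4` byte pair is a simplification that could hit a false positive in a real header; `read_dimensions` is a hypothetical name.

```python
import struct

# Format byte -> pixel format, as listed above.
GCM_FORMATS = {0x86: "DXT1", 0x87: "DXT3", 0x88: "DXT5", 0x85: "A8R8G8B8", 0xA5: "A8R8G8B8"}

def read_dimensions(header: bytes):
    """Return (W, H) read right after the first 0xAAE4 marker, or None."""
    pos = header.find(b"\xAA\xE4")
    if pos < 0 or pos + 6 > len(header):
        return None
    return struct.unpack_from(">2H", header, pos + 2)
```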

Which mip levels are actually present **varies** per texture (full chain / top `[Hi]` mips
only / a lower band — the rest streams from elsewhere). The tool scans start-level ×
run-length and accepts the run that leaves a sane (~120–160 byte) register header,
preferring the largest image.
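
One way to picture that scan, under stated assumptions: `chunk_len` is the full `0x0195` payload length, the mip sizes follow the usual block-compression formula (8 bytes per 4×4 block for DXT1, 16 for DXT3/DXT5), and for a given start level the longest fitting run is tried first. The tie-breaking and the function names are hypothetical.

```python
BLOCK_BYTES = {"DXT1": 8, "DXT3": 16, "DXT5": 16}

def mip_size(fmt: str, w: int, h: int, level: int) -> int:
    """Byte size of one mip level."""
    w, h = max(1, w >> level), max(1, h >> level)
    if fmt in BLOCK_BYTES:
        return ((w + 3) // 4) * ((h + 3) // 4) * BLOCK_BYTES[fmt]
    return w * h * 4                      # A8R8G8B8

def find_mip_run(fmt, w, h, mip_count, chunk_len, hdr_min=120, hdr_max=160):
    """Return (start_level, run_length, header_len) or None."""
    for start in range(mip_count):                      # largest image first
        for count in range(mip_count - start, 0, -1):   # longest run first (assumption)
            pixels = sum(mip_size(fmt, w, h, start + k) for k in range(count))
            header_len = chunk_len - pixels
            if hdr_min <= header_len <= hdr_max:
                return start, count, header_len
    return None
```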

 

### 6.2 Rebuilding the DDS

 

Wrap the pixels in a standard 128-byte DDS header: **4-byte magic + 124-byte `DDS_HEADER`**.
The struct ends with `caps, caps2, caps3, caps4, reserved2` — **five** trailing u32s. An
early version omitted `reserved2`, producing a 124-byte file header that mis-aligned every
DDS by 4 bytes (corrupt in viewers/Blender). Pixel format is `FourCC` (`DXT1/3/5`) or an
`A8R8G8B8` mask for uncompressed.
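
A minimal sketch of the 128-byte header for the FourCC case, using the standard DDS flag values. `dds_header` and its parameters are hypothetical names, and the uncompressed `A8R8G8B8` variant is left out.

```python
import struct

DDSD_CAPS, DDSD_HEIGHT, DDSD_WIDTH = 0x1, 0x2, 0x4
DDSD_PIXELFORMAT, DDSD_MIPMAPCOUNT, DDSD_LINEARSIZE = 0x1000, 0x20000, 0x80000
DDPF_FOURCC = 0x4
DDSCAPS_COMPLEX, DDSCAPS_TEXTURE, DDSCAPS_MIPMAP = 0x8, 0x1000, 0x400000

def dds_header(fourcc: bytes, width: int, height: int, mips: int, top_size: int) -> bytes:
    """Build magic + DDS_HEADER (little-endian) for a DXT1/3/5 texture."""
    flags = DDSD_CAPS | DDSD_HEIGHT | DDSD_WIDTH | DDSD_PIXELFORMAT | DDSD_LINEARSIZE
    caps = DDSCAPS_TEXTURE
    if mips > 1:
        flags |= DDSD_MIPMAPCOUNT
        caps |= DDSCAPS_COMPLEX | DDSCAPS_MIPMAP
    # DDS_PIXELFORMAT: size, flags, fourCC, bit count, R/G/B/A masks (32 bytes).
    pixel_format = struct.pack("<2I4s5I", 32, DDPF_FOURCC, fourcc, 0, 0, 0, 0, 0)
    header = struct.pack("<7I", 124, flags, height, width, top_size, 0, mips)
    header += b"\0" * 44                              # dwReserved1[11]
    header += pixel_format
    header += struct.pack("<5I", caps, 0, 0, 0, 0)    # caps, caps2, caps3, caps4, reserved2
    return b"DDS " + header
```

Writing `dds_header(...) + pixels` then gives a file whose pixel data starts at byte 128.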

 

### 6.3 Normal maps (DXT5nm)

 

`_N` normal maps look orange because they're **DXT5nm-packed**: X in the **alpha** channel,
Y in **green**, Z reconstructed in-shader.
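
To view or convert such a map outside the game, Z can be rebuilt per pixel with the usual unit-length reconstruction. This is a sketch of that common approach, not something read from the game's shaders; `dxt5nm_to_normal` is a hypothetical name.

```python
import math

def dxt5nm_to_normal(alpha: int, green: int) -> tuple:
    """Rebuild a unit normal from the DXT5nm alpha (X) and green (Y) bytes."""
    x = alpha / 255.0 * 2.0 - 1.0
    y = green / 255.0 * 2.0 - 1.0
    # Clamp so rounding error never produces a negative value under the root.
    z = math.sqrt(max(0.0, 1.0 - x * x - y * y))
    return x, y, z
```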

 

### 6.4 Material → texture binding

 

Each `0x0331` material lists the **full source paths** of its textures
(e.g. `…\Rhino_D[Hi].dds`), so binding is exact — no name guessing. Diffuse `_D`/`_DA` →
`baseColorTexture`, normal `_N` → `normalTexture`, specular `_S` →
`KHR_materials_specular.specularColorTexture`.

**Submesh → material gotcha (no clean index exists — this is a heuristic):** a model often
has **more** `0x0324` material blocks than submeshes. There is no decoded per-submesh
material id in the file (`0x0322` only groups material *blocks* by a shared hash; the
`0x0327` `u32[1]` "group key" is a coarse render-sort class, not a material id; the `0x032E`
push buffer holds the real draw→texture order but its texture references are GPU addresses,
not asset ids). So binding is reconstructed two ways, chosen by whether the model has
**VFX/rim passes**:

* **VFX present (e.g. Rhino, Iguana):** real surface materials are interleaved with rim
  passes (named `VFXMat_*`, carrying only effect textures, no `_D`), which shifts block
  indices so an index-based map mis-binds (e.g. puts the chain texture on the head/feet).
  Here the `0x0327` material-group key (`u32[1]`) is used: submeshes sharing a key share a
  material, and the Nth distinct group (in submesh order) maps to the Nth base material (one
  with a `_D`/`_DA` diffuse), so chains→`Handcuff`, eyes→`*_Eyes`, body→the skin atlas. This
  holds even when there are more base materials than groups (Iguana ships an unused
  `IguanaScars`).
* **No VFX (e.g. RobotMaker/Smythe):** all material blocks are real and several share one
  skin atlas (`Head`/`Eyes`/`Dents` → one `_D`, `Body`/`Bracer`/`shoes` → another). Here the
  group key does **not** track the texture (head and body submeshes share a key), so instead
  each submesh is bound to the nearest base material at-or-before its index (skipping the
  no-`_D` passes). Since the shared atlases make adjacent materials texturally equivalent,
  this lands the right diffuse.
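
The two strategies can be sketched as below. The input shapes are hypothetical: `materials` is a list of dicts with a `has_diffuse` flag (true when the block has a `_D`/`_DA` texture), and `submesh_keys` holds each submesh's `u32[1]` group key. Clamping when groups outnumber base materials, and falling back to the first base material when none precedes a submesh, are assumptions; at least one base material is expected.

```python
def bind_by_group(submesh_keys: list, materials: list) -> list:
    """VFX case: Nth distinct group key (in submesh order) -> Nth base material."""
    base = [i for i, m in enumerate(materials) if m["has_diffuse"]]
    order = []
    for key in submesh_keys:
        if key not in order:
            order.append(key)
    # Clamp if there are more groups than base materials (an assumption).
    return [base[min(order.index(k), len(base) - 1)] for k in submesh_keys]

def bind_by_index(n_submeshes: int, materials: list) -> list:
    """No-VFX case: nearest base material at or before the submesh index."""
    base = [i for i, m in enumerate(materials) if m["has_diffuse"]]
    result = []
    for s in range(n_submeshes):
        before = [i for i in base if i <= s]
        result.append(before[-1] if before else base[0])
    return result
```

Both functions return, per submesh, an index into the list of `0x0324` material blocks.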
 

Tools/Instructions:

 

1. Use unpkz.py on a .pkz file.

USAGE: python unpkz.py BossRhino.pkz BossAlistair.pkz      (multiple at once)

It prints the header fields and confirms the decompressed size matches the size declared in the archive.

2. (OPTIONAL) extract_ext.py produces a plain per-model OBJ (LOD0, UVs), mainly used for a quick geometry check.

USAGE: python extract_ext.py BossRhino.ext                 (all models)
       python extract_ext.py BossRhino.ext Rhino           (only models whose name contains "Rhino")

3. export_gltf.py produces a rigged + textured glb (skeleton, hierarchy, inverse-bind, JOINTS/WEIGHTS via the `0x033A` resolve, embedded diffuse/normal/specular PNG). This is the main script.

USAGE: python export_gltf.py BossRhino.ext
       python export_gltf.py BossAlistair.ext Lizard       (name filter)

4. (OPTIONAL) extract_textures.py writes a per-texture dds + png (GCM decode, DXT).

USAGE: python extract_textures.py BossRhino.ext

Requirements: Python 3 + numpy + pillow  (pillow only needed for the textures; without it you still get the rigged mesh, just untextured)

 

FEEL FREE TO TEST IT AND GIVE ME FEEDBACK, I'M ALREADY WORKING ON BUGS/FIXES

Edited by Hazza12555
  • Members
Posted

Whilst on the topic, I thought I'd decode the audio too (Streams2.dat).

Everything is **big-endian**, like the rest of the PS3 build.

### 8.1 Container layout

 

| Region | Contents |
|--------|----------|
| `0x000000 .. 0x0B0000` | **TOC** — a serialized Beenox asset DB (binary index; a few UTF-16-BE subtitle strings near the end, e.g. `//ALISTAIRSMYTHE//We don't have the time…`) |
| `0x0B0000 .. EOF` | **audio payload** — the 9645 `.wem` streams |

Each `.wem` starts on a **64 KiB boundary** and is followed by `0xBA` filler up to the next
boundary. A stream's real length is its RIFF size field (`u32` at `RIFX`+4) **+ 8**. The
codec is Wwise "Custom Vorbis" (`fmt` tag `0xFFFF`); **vgmstream** decodes it natively (no
external codebooks needed).
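
Even without the TOC, the payload layout above is enough to carve the streams by walking the 64 KiB boundaries. This is a rough sketch of that idea, not `extract_streams2.py` itself; `scan_wems` is a hypothetical name.

```python
import struct

ALIGN = 0x10000  # 64 KiB

def scan_wems(data: bytes, payload_start: int = 0x0B0000):
    """Yield (offset, length) for each RIFX stream found on a 64 KiB boundary."""
    pos = payload_start
    while pos + 8 <= len(data):
        if data[pos:pos + 4] == b"RIFX":
            riff_size = struct.unpack_from(">I", data, pos + 4)[0]
            length = riff_size + 8            # RIFF size field + 8-byte header
            yield pos, length
            # Skip the stream and its 0xBA filler up to the next boundary.
            pos += (length + ALIGN - 1) // ALIGN * ALIGN
        else:
            pos += ALIGN
```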

 

### 8.2 TOC record — id ↔ stream

 

The payload is **not** addressed by a plain offset table. Each stream has a **32-byte**
record in the TOC (some carry an extra tagged property, e.g. key `0x11FC`, which lengthens
the record):

```
u32 id            ─ 32-bit Wwise short-id  (the canonical .wem filename)
u32 0
u32 a             ─ small param (packet/seek related; not needed)
u32 b             ─ param (not needed)
u32 size          ─ RIFX total size = riff_size + 8
u32 0
u32 0  [+ extra tagged property words]
u32 end_offset    ─ absolute file offset of (block_start + align4(size))
```

The key field is **`end_offset`**: it equals `block_start + align4(size)`, so it is a
**physical pointer** that ties each record to exactly one payload block. Records are stored
**sorted by id, in the same order as the payload blocks**. Several sound-banks are
concatenated, though, so the global id order has a few inversions.
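
For a plain 32-byte record, recovering the block start from `end_offset` could look like this. Records that carry an extra tagged property are longer and not handled; `parse_toc_record` is a hypothetical name.

```python
import struct

def align4(n: int) -> int:
    """Round n up to a multiple of 4."""
    return (n + 3) & ~3

def parse_toc_record(rec: bytes) -> dict:
    """Decode one plain 32-byte TOC record (big-endian)."""
    wem_id, _z0, _a, _b, size, _z1, _z2, end_offset = struct.unpack(">8I", rec[:32])
    return {
        "id": wem_id,
        "size": size,
        # end_offset = block_start + align4(size), so solve for block_start.
        "offset": end_offset - align4(size),
    }
```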

 

USAGE: extract_streams2.py writes the `.wem` files plus `streams2_manifest.csv`.

`streams2_manifest.csv` columns: `index, file_offset, wem_size, wwise_id_dec,
wwise_id_hex, filename, naming`.

Play `.wem` directly with foobar2000 + the vgmstream component.

 

extract_streams2.py
