Members Hazza12555 Posted March 14, 2025

I want to extract assets from The Amazing Spider-Man 1 and 2. All the data files are in .pkz format, which I don't know how to extract, and I assume the models are in there too. If anyone knows how to extract these, that would be great! I've uploaded some sample .pkz files from the game for reference: https://gofile.io/d/o11oKY
Engineers shak-otay Posted March 14, 2025

If it's for PC try this. (A decompressor for pkz, no extraction of single files, so dunno what that means exactly.)
Engineers h3x3r Posted March 14, 2025

Here's the decompression code. It creates a single file. I'm not sure where the valid offsets are...

###################################
get BaseFileName basename
comtype zlib_noerror

get FileMagic uint32
get DataBaseOffset uint32
get ChunkSize uint32
get Unknown_0 uint32
get Files uint32
get TotalComSize uint32
get TotalDecSize uint32

goto DataBaseOffset
for i = 0 < Files
    savepos ChunkOffset
    getdstring ChunkData ChunkSize
    string FileName p= "%s.dec" BaseFileName
    append 0
    clog FileName ChunkOffset ChunkSize ChunkSize
next i
Members Hazza12555 (Author) Posted March 15, 2025

> On 3/14/2025 at 12:54 PM, shak-otay said:
> If it's for PC try this. (A decompressor for pkz, no extraction of single files, so dunno what that means exactly.)

I tried both of the BMS scripts listed here, and both gave me a decomp file. I'm unsure what to do next.
Members Hazza12555 (Author) Posted March 15, 2025

Edit: I used the decompressor above, which gave me .decomp files with .header files. I then put those into Ravioli Game Tools, which gave me multiple .wem files and .dds texture files. I converted the .wem files to .wav and they seem to be sfx/efforts. As for the .dds texture files, some seem to be okay and others seem to be corrupted or something like that (I'm new to model/texture extraction). Still no model files found though, so if anyone could help, that would be great!

I linked a zip with the .wav files and .dds texture files for anyone wanting to look, along with the .decomp file that I extracted them from. https://gofile.io/d/8jKyD9
Members Hazza12555 (Author) Posted March 17, 2025

Has anybody else checked this? I tried reverse engineering it but couldn't get anything...
Engineers shak-otay Posted March 17, 2025

For the corrupted dds files, you'll need to find out the source of the problem: is it the bms script or is it Ravioli Game Tools? Here's an example of two 32 kB dds files, one ok, the other with a defect (maybe wrong decompression size?).

Edited March 17, 2025 by shak-otay
Members Hazza12555 (Author) Posted March 18, 2025

> On 3/17/2025 at 10:36 PM, shak-otay said:
> For the corrupted dds files - you'll need to find out the source of the problem: is it the bms script or is it Ravioli Game Tools? Here's an example of two 32 kB dds files, one ok, the other with defect (maybe wrong decompression size?).

I have no idea why the .dds texture files are partly corrupted. It could be the decompressor, but I'm unsure. To verify that it's not Ravioli Game Tools, I also used Dragon Unpacker. Ravioli Game Tools found 175 dds texture files and Dragon Unpacker found 122. (I tried this with both decompressors posted here and the results were the same.)

I believe the game was made by Beenox using the Goliath Engine, so this could be a pointer to how to extract it. Opening the .decomp in HxD reveals many strings that I assume are file names, such as OscorpGasMask_S[Hi], with [Hi] meaning a HiRez texture, as HxD also shows the string HiRezWindowsEFIGSR. (I have attached the .decomp and header file along with the dds textures from both Ravioli Game Tools and Dragon Unpacker.)

As of now I'm unaware of any tool that can scrape or extract the models directly, so any help on this project is appreciated!

Attachment: decomp basement pkz.zip
Engineers h3x3r Posted March 20, 2025

I checked the single decompressed file against the individually decompressed files and they are the same size, so I believe the decompression is fine. Maybe the chunks are stacked in some order, but that's just a guess... I also noticed after decompression that some files have pointers to data which doesn't actually exist. See below.
Members Hazza12555 (Author) Posted March 20, 2025

> 11 hours ago, h3x3r said:
> I checked both single decompressed file vs individual decompressed files and they are same in size. So I believe that decompression is fine. Maybe that chunks are stacked in some order. But it's just guess... Also i noticed after decompression that some files have pointer on data which actually doesn't exists. See below

Damn, I hope we can crack this.
Engineers h3x3r Posted March 21, 2025

I found out something. The last chunk is not decompressed and I don't know why... Here is the struct of the compressed pkz. There is also one table, but I'm not sure what it is for.

uint32 Sign;
uint32 DataBaseOffset;
uint32 ChunkSize;
uint32 Unknown;
uint32 Files;
uint32 TotalComSize;
uint32 TotalDecSize;

struct {
    uint32 TotalDecompressedSize;
} Table[Files] <optimize=false>;

FSeek(DataBaseOffset);

struct {
    struct {
        byte ChunkData[ChunkSize];
    } Chunk[Files] <optimize=false>;
} Chunks;

Edited March 21, 2025 by h3x3r
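For readers who prefer Python over QuickBMS, the struct above can be walked roughly as follows. This is only a sketch under stated assumptions: the function name `unpack_pkz` and the `endian` parameter are made up for illustration, the field order follows h3x3r's template, and reading the table as a cumulative decompressed size (and truncating each chunk to it) follows the solution post further down the thread. Slicing by offset also tolerates a final slot that is shorter than `ChunkSize`, which may or may not be related to the last-chunk problem mentioned above.

```python
import struct
import zlib


def unpack_pkz(data: bytes, endian: str = ">") -> bytes:
    """Decompress a .pkz blob into one contiguous buffer.

    endian: ">" for big-endian (PS3, per the solution post); "<" is an
    untested guess for the PC files.
    """
    # Seven u32 header fields, in the order of h3x3r's template.
    (_sign, base_offset, chunk_size, _unknown,
     count, _total_com, total_dec) = struct.unpack_from(endian + "7I", data, 0)

    # Table of `count` u32s, assumed to hold the cumulative decompressed
    # size at the end of each chunk.
    table = struct.unpack_from(f"{endian}{count}I", data, 28)

    out = bytearray()
    previous_end = 0
    for i, cumulative_end in enumerate(table):
        start = base_offset + i * chunk_size
        slot = data[start:start + chunk_size]  # a short last slot is fine
        # decompressobj stops at the end of the zlib stream and ignores
        # whatever padding fills the rest of the slot.
        raw = zlib.decompressobj().decompress(slot)
        out += raw[:cumulative_end - previous_end]
        previous_end = cumulative_end

    if len(out) != total_dec:
        raise ValueError(f"expected {total_dec} bytes, got {len(out)}")
    return bytes(out)
```

Typical use would be `open("BossRhino.ext", "wb").write(unpack_pkz(open("BossRhino.pkz", "rb").read()))`. Note that the base offset and chunk size are both reported as `0x8000` in this thread, so swapping those two header fields would not change the result for these files.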
Members Hazza12555 (Author) Posted March 21, 2025

> 2 hours ago, h3x3r said:
> Find out something. The last chunk is not decompressed and I don't know why... Here is struct of compressed pkz. There is also one table but not sure for what is.
>
> uint32 Sign;
> uint32 DataBaseOffset;
> uint32 ChunkSize;
> uint32 Unknown;
> uint32 Files;
> uint32 TotalComSize;
> uint32 TotalDecSize;
>
> struct {
>     uint32 TotalDecompressedSize;
> } Table[Files] <optimize=false>;
>
> FSeek(DataBaseOffset);
>
> struct {
>     struct {
>         byte ChunkData[ChunkSize];
>     } Chunk[Files] <optimize=false>;
> } Chunks;

Hmm, this is strange. I knew the Beenox engine was meant to be tricky, and the only way to get models right now is through Ninja Ripper. I really hope someone smarter than I am can look into this, because I'm unsure where to start.
Members Hazza12555 (Author) Posted March 21, 2025

In case it's useful, I have some other .pkz files from the PS3 version of the game, to see if anything differs from the PC version. If someone could check this, it would be appreciated!

Attachment: ps3 version.zip
Members Hazza12555 (Author) Posted May 1, 2025

I've been digging around this topic and found that chrrox's QuickBMS script for Spider-Man: Edge of Time also works for the .pkz files in TASM 1: https://web.archive.org/web/20220310025845/https://forum.xentax.com/viewtopic.php?f=10&t=7828&

It outputs a decompressed .ext file.

I then found that id-daemon has an exe that works with chrrox's script to extract the contents of the Edge of Time .ext files: https://web.archive.org/web/20230429100932fw_/https://zenhax.com/viewtopic.php?t=4379

However, with TASM 1 the exe does nothing. If anyone could figure out how to extract the contents of the new .ext files using id-daemon's .ext extractor, then this topic can finally be closed!

I'll link a zip file containing an example .ext and the id-daemon exe: https://gofile.io/d/NBDC54
Members KuWuniss6 Posted May 23

The model has roughly 6,000 to 7,000 floating-point vertices. It doesn't seem difficult to process. The data starts right after the sequence of eight AA bytes.
Members Solution Hazza12555 (Author) Posted yesterday at 01:56 AM

TASM-TOOLS V1

The data examined is the **PS3 build** (tag *"HiRez PS3 EU"*, RSX GS version 6.66.5109, built May 2012). Everything is **big-endian**. The same structures appear in *Edge of Time*, but TASM uses a newer geometry encoding that older rippers (e.g. id-daemon's `SpidermanEoT.exe`) mis-parse.

## 1. Container: `.pkz` → `.ext`

A `.pkz` is a zlib-chunked blob with a 28-byte big-endian header:

| Offset | Type | Field |
|-------:|------|-------|
| 0 | u32 | magic `BABEB1B0` |
| 4 | u32 | chunk size (always `0x8000`) |
| 8 | u32 | base offset of first chunk (`0x8000`) |
| 12 | u32 | unknown / flags |
| 16 | u32 | chunk count |
| 20 | u32 | total compressed size |
| 24 | u32 | total decompressed size |

After the header is a table of `chunkCount` u32s = the **cumulative** decompressed size at the end of each chunk. The payload is a sequence of independent zlib streams, each occupying a `0x8000`-byte slot in the file. Decompress each, and **truncate every chunk's output to its table-derived size** (the last block of a stream is padded; the cumulative table is the source of truth).

## 2. Chunk tree (the `.ext`)

The `.ext` is a tree of typed chunks. Every chunk has a **12-byte big-endian header**:

```
u32 type
u16 a
u16 b
u32 payloadSize
```

followed by `payloadSize` bytes of payload, which may itself contain child chunks. Parse recursively by walking `pos += 12 + payloadSize`. The root chunk's `payloadSize` bounds the whole file.

Chunk types referenced in this guide:

| Type | Meaning |
|------|---------|
| `0x0005` | Skeleton container (one `0x138D` wrapper per skinned model) |
| `0x0026` | Named asset blob (model, texture, …). First child `0x138E` carries the name |
| `0x0195` | GCM texture (register dump + pixels), child of a `0x0026` texture asset |
| `0x0322` | Material-group → submesh map |
| `0x0324` | Material block (contains a `0x0331` material + `0x0338` render-state) |
| `0x0325` | LOD block: either a **descriptor** block (`0x0327`s) or a **geometry** block (`0x1388`/`0x1389`) |
| `0x0326` | LOD header: `[u32 numDescriptors][…][u32 lodIndex]…` |
| `0x0327` | Submesh descriptor (vertex format, packet records, bone-slot palette) |
| `0x0331` | Material (lists its textures' full source paths) |
| `0x0338` | Per-submesh RSX render-state (GCM push-buffer fragment) |
| **`0x033A`** | **Skinning palette: bone → matrix-slot inverse table** (the key to rigging) |
| `0x0384` | Skeleton data (names, local TRS, bind matrices) |
| `0x0385` | Bone name-hash table + bind matrices |
| `0x0389` | Per-bone local bind TRS |
| `0x038B` | Bone name string table |
| `0x032E` | RSX GPU command/push buffer (not needed for extraction) |
| `0x1388` | Vertex buffer |
| `0x1389` | Index buffer |
| `0x138E` | Name header (asset name at +24/+36) |

A model is a `0x0026` asset whose children include `0x0325` blocks. Within a model the `0x0325` blocks pair up: **descriptor** blocks (holding `0x0327` submesh descriptors) and **geometry** blocks (holding one `0x1388` vertex buffer + one `0x1389` index buffer). Index 0 of each list is **LOD0** (the highest detail; only LOD0 is exported).

## 3. Geometry (models)

### 3.1 Submesh descriptor: `0x0327`

* `header[2]` = **palette length** (`P`). The palette is the **last `P` u32s** of the chunk.
* `header[6]` = **vertex-format flags**
* Each **packet record** is `[vtxOff, nVerts, vtxBytes, idxOff, idxBytes<<16 | idxCount, baseVtx, 0, 0]`. A submesh is split into packets because the PS3 **SPU skinning** path processes a bounded batch of vertices (and a bounded bone palette) at a time.
* The **palette** maps a packet's *local* bone indices to **skinning slots**, *not* bone ids

### 3.2 Vertex-format flags (`header[6]`)

| Bit | Meaning |
|----:|---------|
| `0x0004` | has tangent stream + a padded u8 stream |
| `0x0010` | has vertex colour |
| `0x1000` | **skinned** (has bone indices + weights) |
| bits 5–6 | UV-set count |

### 3.3 Vertex buffer layout: **Structure-of-Arrays in quads of 4**

Within a packet the streams are **not** interleaved per vertex. They are stored **stream-by-stream**, and each stream is itself tiled in **quads of 4 vertices** (the SPU processes 4 lanes at a time). Stream order:

```
position   (3×f32)                ─ nVerts×12 bytes
skin       (4×u8 idx + 4×u8 wgt)  ─ nVerts×8   (only if skinned)
normal     (packed u32)           ─ nVerts×4
tangent    (packed u32)           ─ nVerts×4   (only if flag 0x4)  + pad16 of a u8 stream
colour     (4×u8)                 ─ nVerts×4   (only if flag 0x10)
uv0        (2×s16)                ─ nVerts×4
uv1        (2×s16)                ─ nVerts×4   (per extra UV set)
```

Within the **skin** stream, the quad tiling means the 8 bytes per vertex (32 bytes per quad of 4 verts) are laid out **influence-major**: `idx0[v0..v3], idx1[v0..v3], idx2[v0..v3], idx3[v0..v3]`, then the same for weights. So for vertex `v` in quad `q`, the four bone influences are bytes `idx[v], idx[4+v], idx[8+v], idx[12+v]` of that quad's 16-byte index block (weights likewise, in the 16-byte weight block). UVs are **s16 / 1024**.

### 3.4 UVs

These textures use a **top-left (V-down) origin**, the same as glTF, so UVs are used **as-is** for glTF. (OBJ is bottom-left, so the OBJ exporter flips `v → 1-v`; copying that flip into the glTF path "splatters" the atlas: an eye UV island lands on a hand, etc.)

### 3.6 Normals

I couldn't decode the packed u32 vertex normals (4×s8 / 10_10_10_2 / b8 all correlate ≤ 0.53 with geometry). I instead computed **area-weighted smooth normals from geometry**, welded across coincident positions (`np.unique(round(P,4))`) so shading stays smooth across UV seams.
Winding is inconsistent across submeshes, so materials are written **`doubleSided: true`** rather than trying to fix face orientation, which also removes the "see-through / inside-out" look. (This is the part that needs improving, so if anybody is good with UVs, feel free!)

## 4. Skeleton: `0x0005` → `0x138D` → `0x0384`

One `0x138D` wrapper per skinned model; its `0x138E` name equals the model name. Inside the `0x0384`:

| Child | Contents |
|-------|----------|
| `0x038B` | Bone **name** string table (NUL-separated; entry 0 = skeleton root, e.g. `Char_Rhino`) |
| `0x0389` | Per-bone **local bind TRS**: 10×f32 = translate[3], quat[4] (xyzw), scale[3] |
| `0x0385` | `u32[2] = (boneCount N, 4)`, then an `N`-entry name-hash table, then in the **tail** `N` records of **148 bytes** each |
| `0x033A` | (model-level, see §5) skinning-slot palette |

Each **148-byte** `0x0385` record (37 f32) = `[ local bind mat4 | inverse-bind mat4 | 5 f32 ]`. The record base is `payloadEnd − N×148`. Matrices are **row-major / row-vector** (`worldᵢ = localᵢ · worldₚₐᵣₑₙₜ`).

`0x038B`, `0x0389` and `0x0385` are all in the **same bone order** (verified by matching local translations). The `0x0385` name-hash table is `N` sorted `(nameHash, boneIdx)` pairs for runtime name lookup; it is **not** a skinning order.

### 4.1 Recovering the parent hierarchy

The file doesn't store parent indices explicitly. They are recovered exactly from the matrices:

```
worldᵢ        = inverse(invBindᵢ)
parentWorldᵢ  = inverse(localᵢ) · worldᵢ             (row-vector convention)
parent(i)     = the j whose worldⱼ == parentWorldᵢ   (entry 0 is the root)
```

Verified bind-pose round-trip `world · invBind == I` to ~1e-6 for every skeleton (Rhino 98, Lizard 107, RobotMaker 88, Iguana/Vermin 112, Spider-Man 100, …).

## 5. Rigging: the two-hop skin resolve (`0x033A`)

**This was the hard part.** A per-vertex skin index is **not** a bone id, and it is **not** a direct index into the descriptor's palette.
It resolves in **two hops**:

```
per-vertex skin index
        │  (0x0327 descriptor palette, per submesh)
        ▼
   skinning SLOT
        │  (0x033A model palette)
        ▼
   skeleton BONE
```

* **Hop 1**: the descriptor's tail palette turns a *packet-local* index into a **slot** in the model's GPU skinning-matrix palette (the order in which bone matrices are uploaded for a draw). A packet only references a small window of slots, which is why the index is local and small.
* **Hop 2**: the **`0x033A`** chunk (model-level, sibling of `0x0324`/`0x0325`) maps slots to skeleton bones. It is stored as an **inverse table**:

```
u8  boneCount        (== skeleton N)
u8  slotCount        (== number of skinned bones)
boneCount bytes:     inv[bone] = slot   (0xFF = bone is not skinned)
0xAA padding
```

Invert it to get `{slot → bone}`. The `boneCount`/`slotCount` header can be cross-checked against the skeleton, so the alignment is unambiguous.

**Why it matters:** skipping `0x033A` and using the descriptor palette value directly as a bone id binds, e.g., Rhino's head vertices to a *calf twist bone* ("move the foot bone and the knee moves"). With the full two-hop resolve, every region binds correctly: head→`Bone_Head`, feet→toe bones, hands→finger bones, across the whole roster.

> Note: the slot ordering in `0x033A` is an authored, hierarchical grouping (Hips, left
> leg, right leg, spine, head, left arm, right arm, …) and deliberately **skips** some
> bones (e.g. `Bone_Thigh_L` whose deformation is handled by its twist bones). A geometric
> "nearest-bone" reconstruction gets ~95% of this right but mis-binds twist/helper bones
> and mechanical rigs. `0x033A` is the exact authored data, so we use it.

Per-vertex **weights** are u8 (0–255); normalise by their sum.

## 6. Textures: GCM `0x0195` → DDS

A texture is a `0x0026` asset whose single `0x0195` child is a **dump of the PS3 RSX (GCM) texture-setup registers** followed by raw compressed pixels.
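Since the end product of this section is a DDS file, here is a rough Python sketch of the 128-byte DDS container header that §6.2 below describes, for the block-compressed formats only. The function name `build_dds_header` and the `GCM_TO_FOURCC` mapping table are illustrative names, not taken from the released tools; the `fmt` values come from §6.1, and the flag constants are the standard DDS ones.

```python
import struct

# Standard DDS flag values.
DDSD_CAPS, DDSD_HEIGHT, DDSD_WIDTH = 0x1, 0x2, 0x4
DDSD_PIXELFORMAT, DDSD_MIPMAPCOUNT, DDSD_LINEARSIZE = 0x1000, 0x20000, 0x80000
DDPF_FOURCC = 0x4
DDSCAPS_COMPLEX, DDSCAPS_TEXTURE, DDSCAPS_MIPMAP = 0x8, 0x1000, 0x400000

# GCM fmt byte -> FourCC, as listed in section 6.1 (compressed formats only).
GCM_TO_FOURCC = {0x86: b"DXT1", 0x87: b"DXT3", 0x88: b"DXT5"}
BLOCK_BYTES = {b"DXT1": 8, b"DXT3": 16, b"DXT5": 16}


def build_dds_header(width: int, height: int, mip_count: int, fourcc: bytes) -> bytes:
    """Return the 4-byte magic plus the 124-byte DDS_HEADER (128 bytes total)."""
    blocks = max(1, (width + 3) // 4) * max(1, (height + 3) // 4)
    linear_size = blocks * BLOCK_BYTES[fourcc]  # byte size of the top mip level

    flags = DDSD_CAPS | DDSD_HEIGHT | DDSD_WIDTH | DDSD_PIXELFORMAT | DDSD_LINEARSIZE
    caps = DDSCAPS_TEXTURE
    if mip_count > 1:
        flags |= DDSD_MIPMAPCOUNT
        caps |= DDSCAPS_COMPLEX | DDSCAPS_MIPMAP

    # DDS_PIXELFORMAT: size, flags, FourCC, bit count and four masks (32 bytes).
    pixel_format = struct.pack("<2I4s5I", 32, DDPF_FOURCC, fourcc, 0, 0, 0, 0, 0)

    header = struct.pack("<7I", 124, flags, height, width, linear_size, 0, mip_count)
    header += bytes(44)                              # reserved1[11]
    header += pixel_format
    header += struct.pack("<5I", caps, 0, 0, 0, 0)   # caps, caps2..4, reserved2
    return b"DDS " + header
```

A texture would then be written as `build_dds_header(W, H, mipCount, GCM_TO_FOURCC[fmt]) + pixels`. The uncompressed `A8R8G8B8` case needs RGB masks instead of a FourCC and is left out of this sketch.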
### 6.1 Header (big-endian)

```
… [u32 dataSize] [u8 fmt] [u8 mipCount] 0x02 0x00|0x01 … 0xAAE4 [u16 W] [u16 H] …
```

* `fmt`: `0x86`=DXT1(BC1), `0x87`=DXT3(BC2), `0x88`=DXT5(BC3), `0x85`/`0xA5`=A8R8G8B8.
* `0x02` = 2D texture. The `0xAAE4` register marker is immediately followed by the **full** W and H.
* Pixels are **linear** (compressed textures are never RSX-swizzled) and already in PC byte order, so they drop straight into a DDS with no swap/deswizzle.

Which mip levels are actually present **varies** per texture (full chain / top `[Hi]` mips only / a lower band, with the rest streaming from elsewhere). The tool scans start-level × run-length and accepts the run that leaves a sane (~120–160 byte) register header, preferring the largest image.

### 6.2 Rebuilding the DDS

Wrap the pixels in a standard 128-byte DDS header: **4-byte magic + 124-byte `DDS_HEADER`**. The struct ends with `caps, caps2, caps3, caps4, reserved2`, i.e. **five** trailing u32s. An early version omitted `reserved2`, producing a 124-byte file header that mis-aligned every DDS by 4 bytes (corrupt in viewers/Blender). Pixel format is `FourCC` (`DXT1/3/5`) or an `A8R8G8B8` mask for uncompressed.

### 6.3 Normal maps (DXT5nm)

`_N` normal maps look orange because they're **DXT5nm-packed**: X in the **alpha** channel, Y in **green**, Z reconstructed in-shader.

### 6.4 Material → texture binding

Each `0x0331` material lists the **full source paths** of its textures (e.g. `…\Rhino_D[Hi].dds`), so binding is exact, with no name guessing. Diffuse `_D`/`_DA` → `baseColorTexture`, normal `_N` → `normalTexture`, specular `_S` → `KHR_materials_specular.specularColorTexture`.

**Submesh → material gotcha (no clean index exists, so this is a heuristic):** a model often has **more** `0x0324` material blocks than submeshes.
There is no decoded per-submesh material id in the file (`0x0322` only groups material *blocks* by a shared hash; the `0x0327` `u32[1]` "group key" is a coarse render-sort class, not a material id; the `0x032E` push buffer holds the real draw→texture order but its texture references are GPU addresses, not asset ids). So binding is reconstructed two ways, chosen by whether the model has **VFX/rim passes**:

* **VFX present (e.g. Rhino, Iguana):** real surface materials are interleaved with rim passes (named `VFXMat_*`, carrying only effect textures, no `_D`), which shifts block indices so an index-based map mis-binds (e.g. puts the chain texture on the head/feet). Here the `0x0327` material-group key (`u32[1]`) is used: submeshes sharing a key share a material, and the Nth distinct group (in submesh order) maps to the Nth base material (one with a `_D`/`_DA` diffuse), so chains→`Handcuff`, eyes→`*_Eyes`, body→the skin atlas. This holds even when there are more base materials than groups (Iguana ships an unused `IguanaScars`).
* **No VFX (e.g. RobotMaker/Smythe):** all material blocks are real and several share one skin atlas (`Head`/`Eyes`/`Dents` → one `_D`, `Body`/`Bracer`/`shoes` → another). Here the group key does **not** track the texture (head and body submeshes share a key), so instead each submesh is bound to the nearest base material at-or-before its index (skipping the no-`_D` passes). Since the shared atlases make adjacent materials texturally equivalent, this lands the right diffuse.

Tools/Instructions:

1. Use unpkz.py on a .pkz file.
   USAGE:
   python unpkz.py BossRhino.pkz BossAlistair.pkz (multiple at once)
   It prints the header fields and confirms the decompressed size matches the size declared in the archive.
2. (OPTIONAL) extract_ext.py produces a plain per-model OBJ (LOD0, UVs), mainly used for a quick geometry check.
   USAGE:
   python extract_ext.py BossRhino.ext (all models)
   python extract_ext.py BossRhino.ext Rhino (only models whose name contains "Rhino")
3. export_gltf.py produces a rigged + textured glb (skeleton, hierarchy, inverse-bind, JOINTS/WEIGHTS via the `0x033A` resolve, embedded diffuse/normal/specular PNG). *Main Script*
   USAGE:
   python export_gltf.py BossRhino.ext
   python export_gltf.py BossAlistair.ext Lizard (name filter)
4. (OPTIONAL) extract_textures.py writes a per-texture dds + png (GCM decode, DXT).
   USAGE:
   python extract_textures.py BossRhino.ext

Requirements: Python 3 + numpy + pillow (pillow is only needed for the textures; without it you still get the rigged mesh, just untextured)

FEEL FREE TO TEST IT AND GIVE ME FEEDBACK, I'M ALREADY WORKING ON BUGS/FIXES

Edited 18 hours ago by Hazza12555
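For anyone who wants to see the §5 two-hop skin resolve in isolation, it can be sketched in a few lines of Python. This is not the code from export_gltf.py: the helper names `invert_033a` and `resolve_vertex` are hypothetical, and the sketch assumes the `0x033A` payload starts directly with the `boneCount`/`slotCount` bytes and that the descriptor palette has already been read as a list of slot numbers.

```python
def invert_033a(payload: bytes) -> dict[int, int]:
    """Turn the 0x033A inverse table (bone -> slot) into {slot: bone}."""
    bone_count = payload[0]
    slot_count = payload[1]
    inverse = payload[2:2 + bone_count]  # inverse[bone] = slot, 0xFF = unskinned
    slot_to_bone = {slot: bone for bone, slot in enumerate(inverse) if slot != 0xFF}
    if len(slot_to_bone) != slot_count:
        raise ValueError("slotCount does not match the number of skinned bones")
    return slot_to_bone


def resolve_vertex(indices, weights, descriptor_palette, slot_to_bone):
    """Map one vertex's packet-local indices to (bone, normalised weight) pairs."""
    total = sum(weights)
    if total == 0:
        return []
    influences = []
    for local_index, weight in zip(indices, weights):
        if weight == 0:
            continue  # unused influence
        slot = descriptor_palette[local_index]   # hop 1: 0x0327 tail palette
        bone = slot_to_bone[slot]                # hop 2: 0x033A model palette
        influences.append((bone, weight / total))
    return influences
```

For example, with a four-bone skeleton where bone 1 is unskinned, `invert_033a(bytes([4, 3, 0, 0xFF, 2, 1]))` gives `{0: 0, 2: 2, 1: 3}`, and a vertex with indices `(0, 1, 0, 0)` and weights `(192, 64, 0, 0)` against a descriptor palette `[2, 1]` resolves to bones 2 and 3 with weights 0.75 and 0.25.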
Members Hazza12555 (Author) Posted 19 hours ago

Whilst on the topic, I thought I'd decode the audio too (Streams2.dat).

Everything is **big-endian**, like the rest of the PS3 build.

### 8.1 Container layout

| Region | Contents |
|--------|----------|
| `0x000000 .. 0x0B0000` | **TOC**: a serialized Beenox asset DB (binary index; a few UTF-16-BE subtitle strings near the end, e.g. `//ALISTAIRSMYTHE//We don't have the time…`) |
| `0x0B0000 .. EOF` | **audio payload**: the 9645 `.wem` streams |

Each `.wem` starts on a **64 KiB boundary** and is followed by `0xBA` filler up to the next boundary. A stream's real length is its RIFF size field (`u32` at `RIFX`+4) **+ 8**. The codec is Wwise "Custom Vorbis" (`fmt` tag `0xFFFF`); **vgmstream** decodes it natively (no external codebooks needed).

### 8.2 TOC record: id ↔ stream

The payload is **not** addressed by a plain offset table. Each stream has a **32-byte** record in the TOC (some carry an extra tagged property, e.g. key `0x11FC`, which lengthens the record):

```
u32 id           ─ 32-bit Wwise short-id (the canonical .wem filename)
u32 0
u32 a            ─ small param (packet/seek related; not needed)
u32 b            ─ param (not needed)
u32 size         ─ RIFX total size = riff_size + 8
u32 0
u32 0
[+ extra tagged property words]
u32 end_offset   ─ absolute file offset of (block_start + align4(size))
```

The key field is **`end_offset`**: it equals `block_start + align4(size)`, so it is a **physical pointer** that ties each record to exactly one payload block. Records are stored **sorted by id, in the same order as the payload blocks**, but several sound-banks are concatenated, so the global id order has a few inversions.

USAGE: extract_streams2.py writes the `.wem` files + streams2_manifest.csv

`streams2_manifest.csv` columns: `index, file_offset, wem_size, wwise_id_dec, wwise_id_hex, filename, naming`.

Play `.wem` directly with foobar2000 + the vgmstream component.

extract_streams2.py
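The payload layout from §8.1 is simple enough to walk without the TOC. The sketch below is not extract_streams2.py: `iter_wems` is a hypothetical helper that only uses the facts stated above (streams start on 64 KiB boundaries, begin with `RIFX`, and are `riff_size + 8` bytes long), and skipping a boundary that does not start with `RIFX` is an assumption. It does not recover the Wwise ids, which need the TOC records from §8.2.

```python
import struct


def iter_wems(data: bytes, payload_start: int = 0x0B0000, align: int = 0x10000):
    """Yield (file_offset, wem_bytes) for each RIFX stream in the payload."""
    pos = payload_start
    while pos + 8 <= len(data):
        if data[pos:pos + 4] != b"RIFX":
            pos += align  # assumption: nothing useful here, try the next boundary
            continue
        (riff_size,) = struct.unpack_from(">I", data, pos + 4)  # big-endian RIFF size
        total = riff_size + 8          # RIFX tag + size field + body
        yield pos, data[pos:pos + total]
        # Skip the 0xBA filler by rounding up to the next aligned boundary.
        pos = -(-(pos + total) // align) * align
```

Each yielded blob can be written out as `<offset>.wem` and played with vgmstream; the default `payload_start` and `align` are the values reported for Streams2.dat in this post.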
Members Hazza12555 (Author) Posted 18 hours ago

TASM TOOLS V2

Fixed bugs with textures not applying correctly and with missing models.

Attachment: TASM TOOLS v2.rar