Glacier 1 *.STR file format

wssdude posted a topic in Tutorials

(Little preamble:) I'm unsure whether I should post each archive format separately... I'll start with this one, as I have it best described at the moment. I have reversed a bunch of other audio-related formats of Glacier 1 games, so I plan to slowly put them all here. Take this as an appetizer 😛

Stream files (*.STR) of Glacier 1 games can be read on their own; they contain all of the required data. For the best support, this file should actually be read before any scenes when someone wants to do anything audio-related. This is unlike older Glacier 1 games, which had streams.wav.

Data in the file can be separated into the following sections. Some have clear indices, some can only be inferred implicitly:

- header
- block of WAV data (also contains LIP-encoded segments in some data; the exact structure of these is currently unknown...)
- block of WAV headers (a different format than the headers in *.WHD files; it is much simpler and more concise)
- file name table (file names match those in *.WHD and *.SND files, but there are some extras contained within this file, so it is not a full subset!)
- records table (its start is marked in the header along with the records count)

The block of WAV data seems to be aligned on a 0x100 boundary (which coincidentally also seems to be the size of the header and the offset to the block of WAV data...). The rest of the file does not seem to have any specific alignment.

Any WAV data may be encoded in LIP segments, which have variable length. The header of a LIP chunk seems to have a size of 0xF00 or 0x1000 (with the first header containing the 'LIP ' magic). Each record seems to contain a field which can be checked to see whether the data contains LIP segments, without the need to rely on comparing the magic of each data block. For distance-based records, you will have to look into the master record to see if LIP encoding is used. There may be multiple LIP segments in the data block, but only the first one has the magic in its first four bytes. Due to the variable data length, we have to find out the right size of the LIP segment before parsing.
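To make the alignment and magic observations a bit more concrete, here is a minimal sketch of two helpers. The names `alignUp`, `startsWithLipMagic`, `kDataAlignment` and `kLipMagic` are made up for illustration and are not taken from any existing tool; the constants simply mirror the values mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed constants, taken from the observations in the post.
constexpr std::uint64_t kDataAlignment = 0x100;            // WAV data block alignment
constexpr char          kLipMagic[4]   = {'L', 'I', 'P', ' '};

// Rounds `value` up to the next multiple of `alignment` (must be a power of two).
constexpr std::uint64_t alignUp(std::uint64_t value, std::uint64_t alignment)
{
  return (value + alignment - 1) & ~(alignment - 1);
}

// Returns true when a data block begins with the 'LIP ' magic.
// Only the first LIP segment of a block carries the magic, so this cannot
// be used to find later segments.
inline bool startsWithLipMagic(const std::uint8_t* data, std::size_t size)
{
  assert(data != nullptr || size == 0);
  return size >= sizeof(kLipMagic) && std::memcmp(data, kLipMagic, sizeof(kLipMagic)) == 0;
}
```

In practice the per-record flag is probably the better signal; the magic check is only useful as a sanity check on the first segment.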
A LIP segment seems to appear roughly every 4 seconds of audio, but the naive formula of `average byte rate * 4` only roughly approximates where a LIP segment is. Therefore, some guesswork has to be done on the algorithm side to extract the data properly. If anyone could help with reversing these LIP segments, it would be great! They do not necessarily seem to correspond to speech.

The current detection method for LIP segments relies on the fact that the archive is aligned, that we know roughly where the offset should be, and that we can calculate the exact size of each data block (next block offset - current block offset). An additional observation is that nearly all LIP segments seem to have around half of their data filled with zeroes.

We can also notice that when we subtract the real data size, aligned on a 0x100 boundary, from the whole data block size, we get the number of bytes belonging to LIP segments. From this size we can then calculate the number of LIP segments in the data block.

There may be only one such segment (the calculated size of all LIP segments is <= 0x1000), which does not require us to do any magic: we just have to skip past the header and read the real data right after it. Note that the size of the block may be 0xF00 and not 0x1000, so checking with <= and skipping whatever offset you get is probably the best course of action until the segments are a bit better understood.

If there are more segments, we can proceed with calculating the segment size (as the data is interleaved in the way described above). As mentioned before, we roughly know when each of the LIP segments appears in the audio file (roughly every 4 seconds, with the last block most of the time left unaligned and smaller). We should try to pattern match a buffer of size 0x780 filled with zeroes, masking each found offset with ~0xFFF (which will left-align it on 0x1000) and taking the offset closest to the one we predicted.
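The size bookkeeping and the zero-run search described above could be sketched roughly as follows. This is only an illustration under a few assumptions: the function names are hypothetical, offsets are relative to the buffer passed in (whether the 0x1000 alignment applies to file offsets or block offsets is left to the caller), and the segment count simply rounds the LIP byte total up to whole 0x1000 units, which is a guess that covers both the 0xF00 and 0x1000 cases.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

constexpr std::uint64_t kLipSegmentMax = 0x1000; // LIP header size seen so far: 0xF00 or 0x1000
constexpr std::size_t   kZeroRun       = 0x780;  // length of the zero-filled pattern to look for

// Bytes of a data block that belong to LIP segments:
// whole block size minus the real data size aligned on 0x100.
std::uint64_t lipBytesInBlock(std::uint64_t blockSize, std::uint64_t dataSize)
{
  const std::uint64_t alignedData = (dataSize + 0xFF) & ~std::uint64_t{0xFF};
  return blockSize > alignedData ? blockSize - alignedData : 0;
}

// Assumed rule: every started 0x1000 unit counts as one segment.
std::uint64_t lipSegmentCount(std::uint64_t lipBytes)
{
  return (lipBytes + kLipSegmentMax - 1) / kLipSegmentMax;
}

// Scans for runs of kZeroRun zero bytes, left-aligns each hit on 0x1000
// and returns the aligned offset closest to `predicted` (if any run exists).
std::optional<std::size_t> findLipSegment(const std::vector<std::uint8_t>& block, std::size_t predicted)
{
  assert(predicted <= block.size());
  const auto distance = [predicted](std::size_t offset) {
    return offset > predicted ? offset - predicted : predicted - offset;
  };

  std::optional<std::size_t> best;
  std::size_t zeros = 0;
  for (std::size_t i = 0; i < block.size(); ++i)
  {
    zeros = (block[i] == 0) ? zeros + 1 : 0;
    if (zeros < kZeroRun)
      continue;

    const std::size_t runStart  = i + 1 - kZeroRun;            // start of the zero window ending at i
    const std::size_t candidate = runStart & ~std::size_t{0xFFF};
    if (!best || distance(candidate) < distance(*best))
      best = candidate;
  }
  return best;
}
```

The `predicted` offset would come from something like `average byte rate * 4` per segment; since that estimate is only approximate, the search takes the nearest aligned hit instead of trusting it directly.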
Once the LIP segment offset is found, we read the minimum of "data block bytes left to read" and this "found LIP segment offset", skip 0x1000 bytes to get the "divider offset" for the encoded block, and copy each part of the segment into its own buffer. In the end, we are left with complete LIP data and complete WAV data.

The block of WAV data is organized so that all non-distance-based entries are at the beginning and all distance-based entries are at the end. There is no clear block of LIP data; it seems to be mixed randomly in between all of the entries, so there is no reliable distinction within the block.

Distance-based entries point to the same data offset, and there are always exactly three such pointers (2 defined in *.WHD, which also have their copy in the *.STR file, and 1 defined only in the *.STR file). The number of entries pointing to the same data offset is never anything other than 1 (no duplicates) or 3 (distance-based entry: 1 for master and 2 for near/far data). The third entry we mentioned is *.STR-only; it is the true data definition used by the sound graph. If LIP data is present, the master record has the appropriate flag set. Note that the master record should not really be used for other things, as its parameters are not always exactly the same, and the correct ones are located directly in the entry.

TODO - add information about distance-based records structure, it is also interleaved...

Due to all this, the recommended way to get to the actual data is to pre-calculate all individual WAV data block sizes and resolve the LIP segment sizes for each record, along with detecting which records are distance-based.

TODO - add parsing process used by Glacier 1 Audio Tool, which seems to have correct export

Note that the format information of the data, along with data sizes, offsets, names, etc., is all the same as what one can find in the equivalent records in *.WHD files. So there is no need to reference *.WHD files for any sort of information when extracting *.STR files (unlike older Glacier 1 games).
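The pre-calculation step recommended above might look something like the sketch below. `EntryView`, `BlockInfo` and `computeBlocks` are hypothetical names, and `dataEndOffset` (where the block of WAV data ends) is an assumed input that the caller would have to derive, for example from the smallest WAV header offset. The sketch only groups records by data offset; it does not resolve LIP segments.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>
#include <vector>

// Hypothetical minimal view of a record: only the fields needed here.
struct EntryView
{
  std::uint64_t dataOffset;
  std::uint64_t dataSize;
};

struct BlockInfo
{
  std::uint64_t blockSize  = 0; // next block offset - current block offset
  std::size_t   references = 0; // number of records pointing at this offset

  // Per the observations above, an offset is shared by exactly 3 records
  // when the data is distance-based and by exactly 1 otherwise.
  bool distanceBased() const { return references == 3; }
};

// Groups records by data offset and derives each block's full size.
std::map<std::uint64_t, BlockInfo> computeBlocks(const std::vector<EntryView>& entries,
                                                 std::uint64_t dataEndOffset)
{
  std::map<std::uint64_t, BlockInfo> blocks; // ordered by offset
  for (const EntryView& entry : entries)
    ++blocks[entry.dataOffset].references;

  for (auto it = blocks.begin(); it != blocks.end(); ++it)
  {
    const auto next = std::next(it);
    const std::uint64_t end = (next != blocks.end()) ? next->first : dataEndOffset;
    assert(end >= it->first);
    it->second.blockSize = end - it->first;
  }
  return blocks;
}
```

With the block sizes known, the LIP byte count for a record follows from subtracting its 0x100-aligned data size from the block size, as described earlier.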
Below are simple C++ headers which should help anyone interested to get started with the file format, I hope! V1 is for Hitman: Blood Money, V2 is for Kane & Lynch: Dead Men and Mini Ninjas, and V3 is for Kane & Lynch 2: Dog Days (TODO - missing information+header!).

    //
    // Created by Andrej Redeky.
    // SPDX-License-Identifier: Unlicense
    //
    // Extended format information: https://reshax.com/topic/27-glacier-1-str-file-format
    //

    #pragma once

    #include <cstdint>

    enum class STR_LanguageID_v1 : uint32_t
    {
      Default = 0,
      English = 1,
      German = 2,
      French = 3,
      Spanish = 4,
      Italian = 5,
      Dutch = 6
    };

    struct STR_Header_v1
    {
      char id[0xC] = {'I', 'O', 'I', 'S', 'N', 'D', 'S', 'T', 'R', 'E', 'A', 'M'}; // always "IOISNDSTREAM"
      uint8_t unkC[0x4]; // always seems to be a sequence 09 00 00 00
      uint32_t offsetToEntryTable = 0; // points at the STR_Footer, right after string table ends
      uint32_t entriesCount = 0; // same as number of STR_Data entries in STR_Footer
      uint32_t dataBeginOffset = 0x100; // offset to beginning of data probably, but it is like this even for PC_Eng.str which does not have such size and has no data...
      uint8_t unk1C[0x8]; // always seems to be a sequence 00 00 00 00 01 00 00 00
      STR_LanguageID_v1 languageId = STR_LanguageID_v1::Default; // specifies which language data is contained within the archive
    };

    enum class STR_DataFormat_v1 : uint32_t
    {
      INVALID = 0x00,
      PCM_S16 = 0x02,
      IMA_ADPCM = 0x03,
      OGG_VORBIS = 0x04,
      DISTANCE_BASED_MASTER = 0x11
    };

    // beware that this is really 3 different headers, as there is no padding... didn't know how to name things so left it like this for now..
    struct STR_DataHeader_v1
    {
      // PCM_S16, IMA_ADPCM, OGG_VORBIS and DISTANCE_BASED_MASTER have following bytes
      STR_DataFormat_v1 format; // specifies how data should be read
      uint32_t samplesCount; // samples count
      uint32_t channels; // number of channels
      uint32_t sampleRate; // sample rate
      uint32_t bitsPerSample; // bits per sample

      // all PCM_S16, IMA_ADPCM and DISTANCE_BASED_MASTER have following bytes on top
      uint32_t blockAlign; // block alignment

      // all IMA_ADPCM have following bytes on top
      uint32_t samplesPerBlock; // samples per block
    };

    struct STR_Entry_v1
    {
      uint64_t id; // probably some ID, is less than total entries count, does not match its index
      uint64_t dataOffset; // offset to beginning of data, beware of the distance-based records which alias the same offset!
      uint64_t dataSize; // data size
      uint64_t dataHeaderOffset; // offset to table containing header
      uint32_t dataHeaderSize; // size of STR_DataHeader_v1 (unused fields from the structure are left out)
      uint32_t unk24; // unknown number
      uint64_t fileNameLength; // length of filename in string table
      uint64_t fileNameOffset; // offset to filename in string table
      uint32_t hasLIP; // 0x04 when LIP data is present for current entry, 0x00 otherwise
      uint32_t unk3C; // unknown number
      uint64_t distanceBasedRecordOrder; // if 0, entry is not distance-based, otherwise denotes data order of individual records in data block (or is simply non-zero for master record)
    };

    enum class STR_LanguageID_v2 : uint32_t
    {
      Default = 0,
      English = 1,
      German = 2,
      French = 3,
      Spanish = 4,
      Italian = 5,
      Dutch = 6
    };

    struct STR_Header_v2
    {
      char id[0xC] = {'I', 'O', 'I', 'S', 'N', 'D', 'S', 'T', 'R', 'E', 'A', 'M'}; // always "IOISNDSTREAM"
      uint8_t unkC[0xC]; // always seems to be a sequence 09 00 00 00 XX XX YY YY 00 00 00 00 where XX XX changes with language and game and YY YY is same for a game (Kane & Lynch: Dead Men has this sequence E1 46, Mini Ninjas has this sequence 4C 4A)
      uint32_t offsetToEntryTable = 0; // points at the STR_Footer, right after string table ends
      uint32_t entriesCount = 0; // same as number of STR_Data entries in STR_Footer
      uint32_t dataBeginOffset = 0x100; // offset to beginning of data probably, but it is like this even for PC_Eng.str which does not have such size and has no data...
      uint8_t unk24[0x8]; // always seems to be a sequence 00 00 00 00 01 00 00 00
      STR_LanguageID_v2 languageId = STR_LanguageID_v2::Default; // specifies which language data is contained within the archive
      uint8_t unk30[0x8]; // always some sequence 38 XX XX XX XX XX XX XX where XX is same for a game (Kane & Lynch: Dead Men has this sequence 00 A1 01 18 EE 90 7C, Mini Ninjas has this sequence 00 00 00 00 00 00 00)
    };

    enum class STR_DataFormat_v2 : uint32_t
    {
      INVALID = 0x00,
      PCM_S16 = 0x02,
      IMA_ADPCM = 0x03,
      OGG_VORBIS = 0x04,
      UNKNOWN_MASTER = 0x1A
    };

    // beware that this is really 2 different headers, as there is no padding... didn't know how to name things so left it like this for now..
    struct STR_DataHeader_v2
    {
      // PCM_S16, IMA_ADPCM, OGG_VORBIS and UNKNOWN_MASTER have following bytes
      STR_DataFormat_v2 format; // specifies how data should be read
      uint32_t samplesCount; // samples count
      uint32_t channels; // number of channels
      uint32_t sampleRate; // sample rate
      uint32_t bitsPerSample; // bits per sample
      uint32_t unk14 = 0;
      uint32_t unk18 = 0;
      uint32_t blockAlign; // block alignment

      // all IMA_ADPCM have following bytes on top
      uint32_t samplesPerBlock; // samples per block
    };

    struct STR_Entry_v2
    {
      uint64_t id; // probably some ID, is less than total entries count, does not match its index
      uint64_t dataOffset; // offset to beginning of data, beware of the distance-based records which alias the same offset!
      uint64_t dataSize; // data size
      uint64_t dataHeaderOffset; // offset to table containing header
      uint32_t dataHeaderSize; // size of STR_DataHeader_v2 (unused fields from the structure are left out)
      uint32_t unk24; // unknown number
      uint64_t fileNameLength; // length of filename in string table
      uint64_t fileNameOffset; // offset to filename in string table
      uint32_t hasLIP; // 0x04 when LIP data is present for current entry, 0x00 otherwise
      uint32_t unk3C; // unknown number
      uint64_t unk40; // OLD INFO: if 0, entry is not distance-based, otherwise denotes data order of individual records in data block (or is simply non-zero for master record)
    };
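As a starting point for using these headers, here is a small sketch that reads a V1 header from a byte buffer and checks the magic. `HeaderV1` is a compact stand-in that repeats the field order and sizes of `STR_Header_v1` so the snippet compiles on its own, and `parseHeaderV1` is a made-up name. It assumes a little-endian host (matching the byte sequences quoted in the comments above) and C++17 for `std::optional`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Stand-in for STR_Header_v1 with the same field order and sizes.
struct HeaderV1
{
  char          id[0xC];            // "IOISNDSTREAM"
  std::uint8_t  unkC[0x4];
  std::uint32_t offsetToEntryTable;
  std::uint32_t entriesCount;
  std::uint32_t dataBeginOffset;
  std::uint8_t  unk1C[0x8];
  std::uint32_t languageId;
};

// The layout relies on there being no compiler-inserted padding.
static_assert(sizeof(HeaderV1) == 0x28, "unexpected padding in HeaderV1");

// Copies the header out of the start of the file and validates the magic.
// Returns std::nullopt for truncated input or a wrong magic.
std::optional<HeaderV1> parseHeaderV1(const std::vector<std::uint8_t>& file)
{
  if (file.size() < sizeof(HeaderV1))
    return std::nullopt;

  HeaderV1 header;
  std::memcpy(&header, file.data(), sizeof(header)); // memcpy avoids unaligned/aliasing reads

  if (std::memcmp(header.id, "IOISNDSTREAM", sizeof(header.id)) != 0)
    return std::nullopt;

  return header;
}
```

The records table could be read the same way, starting at `offsetToEntryTable` and copying `entriesCount` entries of 0x48 bytes each, which is the size the `STR_Entry_v1` fields add up to.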
- October 29, 2023
- 3 replies
