New World .datasheet file format

April 11, 2025

Localization

This topic was archived here:

https://web.archive.org/web/20230000000000fw_/https://www.zenhax.com/viewtopic.php?t=15506

Samael, posted Fri Jul 09, 2021 2:42 pm (65122)

Hi !

I'm having trouble understanding these files, which contain tables of data. So far I have worked out this:

Code:

@offset 0x18   4B   data size
@offset 0x38   4B   header size
@offset 0x44   4B   number of columns
@offset 0x48   4B   number of lines
@offset 0x5c   start of the header data

The header data contains offsets relative to the beginning of the data block (at offset 0x3c + header size).
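
As a rough illustration of the offsets listed above, here is a minimal Python sketch that reads those four fields. It assumes the values are little-endian unsigned 32-bit integers (which is how the parsers posted later in this thread read them); `read_header_fields`, `FIELD_OFFSETS` and the key names are made up for this example.

```python
import struct

# Offsets as reported in the post above. Little-endian u32 is an assumption.
FIELD_OFFSETS = {
    "data_size": 0x18,
    "header_size": 0x38,
    "column_count": 0x44,
    "row_count": 0x48,
}
HEADER_DATA_START = 0x5C


def read_header_fields(blob: bytes) -> dict:
    """Return the four known header fields plus the derived data block offset."""
    if len(blob) < HEADER_DATA_START:
        raise ValueError("file too short to hold the fixed header")
    fields = {
        name: struct.unpack_from("<I", blob, offset)[0]
        for name, offset in FIELD_OFFSETS.items()
    }
    # Offsets stored in the header data are relative to 0x3c + header size.
    fields["data_block_start"] = 0x3C + fields["header_size"]
    return fields
```

Calling `read_header_fields(open(path, "rb").read())` on a datasheet should give the column and row counts needed to walk the rest of the file.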

Code:

4B   some kind of ID
4B   offset of what I believe is the name of the table

Then each column title is represented on 12B as follows:
4B   1 or 2, I believe that 2 means that the column is not used
4B   some kind of ID
4B   offset of the end of text, the beginning is the previous offset

The last entry takes only 8B because there's no ID:
4B   1 or 2
4B   last offset

Directly after are the offsets of the actual cell data and that's what I don't understand.
On a very easy file where all the columns are used (1) the data is indicated as follows :
4B start offset
4B end offset

But things get more complicated when some columns are unset. Here are some of the smallest files, in case you're interested in making sense of them.

https://mega.nz/file/4XxijCTJ#za049-ZPhni5UHjuJrAbFob3eiDWPeyNCKpl0X4pI7M

togogo, posted Tue Jul 20, 2021 8:35 am (65268)

I'm not sure if you've made progress since then but I have found that headers can also be marked 3.

Looking through the cell data, a 1 in the column means that the data is found by the offsets in the data tables like you said. But from what I've found a 2 represents a float in that cell and a 3 represents an integer in that cell.
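
A small Python sketch of that idea: the same four bytes are read differently depending on the column's type marker. The little-endian layout and the unsigned reading for type 3 are assumptions, and `decode_value` is a hypothetical helper name.

```python
import struct


def decode_value(raw: bytes, column_type: int):
    """Interpret a 4-byte cell value according to the column type marker."""
    if len(raw) != 4:
        raise ValueError("expected exactly 4 bytes")
    if column_type == 1:
        # Type 1: the value is an offset into the text data.
        return struct.unpack("<I", raw)[0]
    if column_type == 2:
        # Type 2: single-precision float stored directly in the cell.
        return struct.unpack("<f", raw)[0]
    if column_type == 3:
        # Type 3: integer stored directly in the cell (unsigned is assumed).
        return struct.unpack("<I", raw)[0]
    raise ValueError(f"unknown column type {column_type}")
```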

badmp3, posted Tue Jul 20, 2021 11:12 am (65271)

Has anyone got an update on this? The New World Closed Beta just started and I hit the same block; the datasheets are all scrambled up...

togogo, posted Tue Jul 20, 2021 10:11 pm (65283)

badmp3 wrote:

Has anyone got an update on this? The New World Closed Beta just started and I hit the same block; the datasheets are all scrambled up...

The data is not scrambled, it is saved in a way described above. Using the original post and the information provided about the data types you should be able to figure out how it is formatted to extract the data.

badmp3, posted Wed Jul 21, 2021 9:28 am (65288)

togogo wrote:

badmp3 wrote:

Has anyone got an update on this? The New World Closed Beta just started and I hit the same block; the datasheets are all scrambled up...

The data is not scrambled, it is saved in a way described above. Using the original post and the information provided about the data types you should be able to figure out how it is formatted to extract the data.

And the message you're saying is scrambled also... why can't this be easy like the Pak Unpacker script for QuickBMS?

newworld411, posted Wed Jul 21, 2021 6:14 pm (65291)

What editor/tool can you use to open these datasheet files? Does anybody know? I'd appreciate an answer!

togogo, posted Sat Jul 24, 2021 8:51 am (65330)

newworld411 wrote:

What editor/tool can you use to open these datasheet files? Does anybody know? I'd appreciate an answer!

You can use HxD to read the binaries. As far as I know there are no posted scripts/tools that turn these files into a readable format, so you would have to write your own.
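
If a hex editor is not at hand, a few lines of Python give a comparable view of the first bytes of a file. This is only a convenience sketch; `hexdump` and its default of 0x60 bytes are choices made for this example.

```python
def hexdump(data: bytes, start: int = 0, length: int = 0x60, width: int = 16) -> str:
    """Format `length` bytes from `start` as offset, hex bytes and ASCII columns."""
    pad = width * 3 - 1  # width of a full hex column, used to align the text column
    lines = []
    chunk = data[start:start + length]
    for i in range(0, len(chunk), width):
        row = chunk[i:i + width]
        hex_part = " ".join(f"{b:02X}" for b in row)
        # Non-printable bytes are shown as dots, as hex editors usually do.
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in row)
        lines.append(f"{start + i:08X}  {hex_part:<{pad}}  {text}")
    return "\n".join(lines)
```

For example, `print(hexdump(open(path, "rb").read()))` shows the fixed header region discussed in the first post.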

togogo, posted Sat Jul 24, 2021 9:02 am (65332)

badmp3 wrote:

togogo wrote:

badmp3 wrote:

Has anyone got an update on this? The New World Closed Beta just started and I hit the same block; the datasheets are all scrambled up...

The data is not scrambled, it is saved in a way described above. Using the original post and the information provided about the data types you should be able to figure out how it is formatted to extract the data.

And the message you're saying is scrambled also... why can't this be easy like the Pak Unpacker script for QuickBMS?

Why can't this be easy? Firstly, it's a new game, and secondly, you can't expect everyone to hand everything to you. The information in this thread is enough to get started on parsing these files with your own scripts. QuickBMS is not used for reading binaries, it is for unpacking files. It's like using a screwdriver on a nail: it's not the right tool. You need to learn the basics if you cannot understand how to even start.

Ethal, posted Sun Jul 25, 2021 6:35 pm (65354)

Well, I'm clearly bumping my head on this one as well.

As soon as one thing makes sense, another doesn't :twisted:

I will keep trying.

badmp3, posted Mon Jul 26, 2021 12:48 pm (65367)

Ethal wrote:

Well, I'm clearly bumping my head on this one as well.

As soon as one thing makes sense, another doesn't :twisted:

I will keep trying.

Make a similar thread on the New World subreddit and tell people we have a way to unpack but are stalled at the database stuff...

Someone with a programmer's skill set there would probably help out with this whole binary stuff.

Lord Vaako, posted Sun Aug 01, 2021 9:08 pm (65453)

I'm not sure about the floats but I think I'm close ;-) Can someone look at the attached examples to see if the converted data looks reasonable?

retriton, posted Mon Aug 02, 2021 7:38 am (65455)

@Lord Vaako that does look pretty close! Seems workable to me at least.

Does anyone have any idea what these files are: https://i.imgur.com/b4FR1oK.png

They look like database files but they are completely unreadable (unlike the datasheet files)

Vag, posted Mon Aug 02, 2021 12:25 pm (65460)

@Lord Vaako, you should be close. Row 29, Quest Battle Embrace. Here are 2 screenshots from 2 other databases. Maybe they can help.

Samael, posted Tue Aug 03, 2021 7:53 pm (65482)

It looks pretty close to me. How did you extract the values whose column type is 1? I didn't get how the two offsets work. The type 2 column is indeed a float and the value is directly accessible in the header, so there's no need to extract it.

Lord Vaako, posted Wed Aug 04, 2021 9:54 pm (65509)

Samael wrote:

It looks pretty close to me. How did you extract the values whose column type is 1? I didn't get how the two offsets work. The type 2 column is indeed a float and the value is directly accessible in the header, so there's no need to extract it.

I used just one (start); strings are separated / terminated by zeros.
The "strings block" starts at 0x3c + header size.

I start parsing the header at 0x3c + 0x24

cols * 12bytes (offset to column_name, column type, unknown)

then

rows * cols * 8bytes (depending on the column type, it is a float / int value or an offset)
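
The layout described above can be sketched in Python roughly as follows. This is a minimal sketch under several assumptions: the file holds a single table, all values are little-endian, the header size sits at 0x38 and the column/row counts at 0x44/0x48 (as in the first post), and any type other than 1 or 2 is returned as a raw integer. `parse_datasheet` and `read_cstring` are hypothetical names.

```python
import struct

STRINGS_BASE = 0x3C          # the strings block starts at 0x3c + header size
COLUMNS_START = 0x3C + 0x24  # first column record, as described above


def read_cstring(blob: bytes, offset: int) -> str:
    """Read a zero-terminated string starting at an absolute file offset."""
    end = blob.index(b"\x00", offset)
    return blob[offset:end].decode("utf-8", errors="replace")


def parse_datasheet(blob: bytes):
    """Return (column_names, rows) for a single-table datasheet."""
    header_size, = struct.unpack_from("<I", blob, 0x38)
    col_count, row_count = struct.unpack_from("<II", blob, 0x44)
    strings_start = STRINGS_BASE + header_size

    names, types = [], []
    for i in range(col_count):
        # 12 bytes per column: name offset, type, unknown (not read here).
        name_off, col_type = struct.unpack_from("<II", blob, COLUMNS_START + 12 * i)
        names.append(read_cstring(blob, strings_start + name_off))
        types.append(col_type)

    cells_start = COLUMNS_START + 12 * col_count
    rows = []
    for r in range(row_count):
        row = []
        for c in range(col_count):
            # 8 bytes per cell; only the first 4 (the value) are read here.
            pos = cells_start + 8 * (r * col_count + c)
            if types[c] == 1:
                off, = struct.unpack_from("<I", blob, pos)
                row.append(read_cstring(blob, strings_start + off))
            elif types[c] == 2:
                row.append(struct.unpack_from("<f", blob, pos)[0])
            else:
                # Interpretation of other types is assumed: raw 32-bit integer.
                row.append(struct.unpack_from("<I", blob, pos)[0])
        rows.append(row)
    return names, rows
```

Only the start offset of each string is used; the zero terminator marks where it ends, so no end offset is needed.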

Kattoor, posted Sun Aug 08, 2021 3:55 pm (65615)

I'm stuck at extracting the data from the actual cells.
Attached is a small datasheet (had to append the .txt extension for it to upload).

Headers:

Code:

72 55 23 CC 
1C 00 00 00    (FactionType)
01 00 00 00

A6 CC 1D C7
28 00 00 00    (DisplayName)
01 00 00 00

F4 1E 8B 92
34 00 00 00    (DisplayDescription)
01 00 00 00

FD 4A 83 B2
40 00 00 00    (ForegroundColorIndex)
02 00 00 00

57 E1 64 86
55 00 00 00    (ForegroundCrestIndex)
02 00 00 00

5D A5 C6 E8 
6A 00 00 00    (BackgroundColorIndex)
02 00 00 00    2

F7 0E 21 DC 
7F 00 00 00    (BackgroundCrestIndex)
02 00 00 00    2

The data starts at 0x190.
For type 1 columns I can just read 12 bytes of data.

First column, FactionType:

Code:

94 00 00 00 
94 00 00 00 
99 00 00 00

0x190 + 0x94 to 0x190 + 0x99 reads 'None'. This is correct. I'm not sure why 0x94 (the start offset) is here two times..?

Second column, DisplayName:

Code:

99 00 00 00
AA 00 00 00
AA 00 00 00

0x190 + 0x99 to 0x190 + 0xaa reads '@ui_tooltip_none'. This is also correct. I'm not sure why 0xaa (the end offset) is here two times..?

So for column 1 there was a duplicate pointer for the start offset, and now for column 2 there is a duplicate pointer for the end offset? What am I missing?

Column 3 is even weirder:

Code:

AB 00 00 00
00 00 00 00 
AB 00 00 00

Why is the start offset == the end offset? Why is there a 00 00 00 00 pointer?

Now for the last 4 columns (these are type 2 columns) I have 5*4 bytes left:

Code:

00 00 00 00 
AB 00 00 00
00 00 00 00 
AB 00 00 00 
00 00 00 00

Can't really make sense of these pointers / this data..

I'm probably misinterpreting the data. Could someone help me please?

Sir Kane, posted Sun Aug 08, 2021 8:41 pm (65628)

Code:

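#include <cstdint>
#include <cstdio>
#include <cstdlib>

// byte_t is not defined in the listing as posted; a plain byte alias is assumed here.
typedef uint8_t byte_t;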


enum eType : uint32_t
{
   Type_String = 1,
   Type_Float,
   Type_Bool,
};


struct SStringValue
{
   uint32_t   hash;
   uint32_t   offset;
};


union SCellValue
{
   float      floatValue;
   uint32_t   offset;
   bool      boolValue;
};

struct SCell
{
   uint32_t   stringOffset;
   SCellValue   value;
};

struct SColumnHeader
{
   SStringValue   name;
   eType         type;
};

struct STable
{
   SStringValue   outputName;
   uint32_t      columnCount;
   uint32_t      rowCount;
   SColumnHeader*   pColumnHeaders;
   SCell*         pCells;
};

struct SDataSheet
{
   static constexpr uint32_t MagicVal = 12;
   uint32_t magic;
   SStringValue   datasheetName;
   SStringValue   dataTypeName;
   uint32_t      tableCount;
   uint32_t      stringDataSize;
   uint32_t*      pTableEnds;
   byte_t*         pTables;
   char*         pStrings;
};

inline STable* GetTable(SDataSheet* pDataSheet, uint32_t index)
{
   if (index == 0)
   {
      return (STable*)pDataSheet->pTables;
   }
   else
   {
      return (STable*)(pDataSheet->pTables + pDataSheet->pTableEnds[index - 1]);
   }
}

inline const STable* GetTable(const SDataSheet* pDataSheet, uint32_t index)
{
   if (index == 0)
   {
      return (const STable*)pDataSheet->pTables;
   }
   else
   {
      return (const STable*)(pDataSheet->pTables + pDataSheet->pTableEnds[index - 1]);
   }
}

const SDataSheet* LocateDataSheet(void* pData)
{
   SDataSheet* pDataSheet = (SDataSheet*)pData;
   byte_t* pCur = (byte_t*)(pDataSheet + 1);

   if (pDataSheet->magic != SDataSheet::MagicVal)
   {
      return nullptr;
   }

   pDataSheet->pTableEnds = (uint32_t*)pCur;
   if (pDataSheet->tableCount != 0)
   {
      pCur += sizeof(uint32_t) * pDataSheet->tableCount;
      pDataSheet->pTables = pCur;

      for (uint32_t i = 0; i < pDataSheet->tableCount; ++i)
      {
         STable* pTable = GetTable(pDataSheet, i);
         pCur += sizeof(STable);
         pTable->pColumnHeaders = (SColumnHeader*)(pCur);
         pCur += sizeof(SColumnHeader) * pTable->columnCount;
         pTable->pCells = (SCell*)(pCur);
         pCur += sizeof(SCell) * pTable->columnCount * pTable->rowCount;
      }
      }
      pDataSheet->pStrings = (char*)pCur;
      return pDataSheet;
   }
   else
   {
      return nullptr;
   }
}

void* LoadToMem(const char* pPath)
{
   //FILE* pFile = fopen(pPath, "rb");
   FILE* pFile;
   if (fopen_s(&pFile, pPath, "rb") != 0)
   {
      return nullptr;
   }
   if (pFile == nullptr)
   {
      return nullptr;
   }
   fseek(pFile, 0, SEEK_END);
   long size = ftell(pFile);
   fseek(pFile, 0, SEEK_SET);
   void* pBuffer = malloc(size_t(size));
   if (pBuffer == nullptr) // check the allocation, not the file handle
   {
      fclose(pFile);
      return nullptr;
   }
   if (fread(pBuffer, size_t(size), 1, pFile) != 1)
   {
      free(pBuffer);
      fclose(pFile);
      return nullptr;
   }
   fclose(pFile);
   return pBuffer;
}

void Test()
{
   void* pData = LoadToMem("javelindata_perks.datasheet");
   if (pData == nullptr)
   {
      return;
   }
   const SDataSheet* pDataSheet = LocateDataSheet(pData);
   if (pDataSheet != nullptr)
   {
      for (uint32_t i = 0; i < pDataSheet->tableCount; ++i)
      {
         const STable* pTable = GetTable(pDataSheet, i);

         const SColumnHeader* pHeaders = pTable->pColumnHeaders;
         for (uint32_t j = 0; j < pTable->columnCount; ++j)
         {
            if (j > 0)
            {
               printf(",");
            }
            printf("%s", pDataSheet->pStrings + pHeaders[j].name.offset);
         }
         printf("\n");
         const SCell* pCells = pTable->pCells;
         for (uint32_t j = 0; j < pTable->rowCount; ++j)
         {
            const SCell* pRow = pCells + (j * pTable->columnCount);
            for (uint32_t k = 0; k < pTable->columnCount; ++k)
            {
               if (k > 0)
               {
                  printf(",");
               }
               switch (pHeaders[k].type)
               {
               case Type_String:
               {
                  printf("%s", pDataSheet->pStrings + pRow[k].value.offset);
                  break;
               }
               case Type_Float:
               {
                  printf("%G", pRow[k].value.floatValue);
                  break;
               }
               case Type_Bool:
               {
                  printf("%s", pRow[k].value.boolValue ? "true" : "false");
                  break;
               }
               }
            }
            printf("\n");
         }

      }
   }
   free(pData);
}
int main(int argc, const char*const*argv)
{
   Test();
   return 0;
}
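
For readers who do not use C++, the table lookup in `GetTable` and `LocateDataSheet` can be sketched in Python. This is only a sketch under assumptions: the listing overlays its structs on the file, so the offsets used here (table count at byte 20, a 56-byte `SDataSheet`) follow from the struct layout on a 64-bit build, and treating the last table end as the start of the strings block is an inference from the "0x3c + header size" rule for single-table files. `table_offsets` is a hypothetical name.

```python
import struct


def table_offsets(blob: bytes, header_size: int = 56):
    """Return (table_start_offsets, strings_start) as absolute file offsets."""
    table_count, = struct.unpack_from("<I", blob, 20)
    if table_count == 0:
        raise ValueError("datasheet contains no tables")
    # One u32 end offset per table follows the fixed header.
    ends = struct.unpack_from(f"<{table_count}I", blob, header_size)
    tables_base = header_size + 4 * table_count
    # Table 0 starts at the base; table i starts where table i-1 ends.
    starts = [tables_base] + [tables_base + end for end in ends[:-1]]
    # Assumption: the strings block begins where the last table ends.
    strings_start = tables_base + ends[-1]
    return starts, strings_start
```

With a single table this gives a table start of 0x3c, which matches the offsets in the first post.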

TrainMan, posted Mon Aug 09, 2021 5:07 am (65634)

Kattoor wrote:

Can't really make sense of these pointers / this data..

I'm probably misinterpreting the data. Could someone help me please?

I think you're off by 4 bytes when reading the columns, and you only need to read 8 bytes when reading the row/cell data.

Lord Vaako wrote:

rows * cols * 8bytes (depending on the column type, it is a float / int value or an offset)

How are you checking if a cell is empty? For type 1 columns it seems like the start and end offsets are both the same, although not always, and that's what's throwing me off at the moment.
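
To illustrate the 8-byte reading, here is a small Python sketch that regroups the 14 dwords quoted in Kattoor's post into pairs. It assumes those dwords are contiguous and little-endian; `split_cells` is a hypothetical helper.

```python
import struct

# The 14 dwords from Kattoor's post, in the order they were quoted.
ROW_BYTES = bytes.fromhex(
    "94000000 94000000 99000000 99000000 AA000000 AA000000 AB000000"
    "00000000 AB000000 00000000 AB000000 00000000 AB000000 00000000"
)


def split_cells(raw: bytes):
    """Split a run of cell data into 8-byte cells of two dwords each."""
    return list(struct.iter_unpack("<II", raw))


cells = split_cells(ROW_BYTES)
for first, second in cells:
    print(f"0x{first:02X} 0x{second:02X}")
```

Grouped this way the 56 bytes split evenly into seven cells, one per column. The three type 1 cells carry the same offset in both halves, and the four type 2 cells pair 0xAB with a zero value.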

Kattoor, posted Mon Aug 09, 2021 9:25 am (65635)

Sir Kane wrote:

Code:

#include <cstdint>
#include <cstdio>
#include <cstdlib>
...

Thank you!!

For anyone else like me without a background in C, here's my working Node.js version for datasheet parsing:

https://gist.github.com/Kattoor/50155a2 ... 9def622b27

Soller, posted Mon Aug 09, 2021 9:51 am (65638)

Rust Version:

Code:

use binread::BinReaderExt;
use binread::{
 derive_binread,
 io::{Cursor, Read, Seek, SeekFrom},
 BinRead, BinResult, NullString, ReadOptions,
};
// use serde::*;
use serde_json::Result as JsonResult;
use std::collections::HashMap;

fn get_string<R, BR, A>(reader: &mut R, ro: &ReadOptions, args: A) -> BinResult<BR>
where
 R: Read + Seek,
 BR: BinRead<Args = A>,
 A: Copy + 'static,
{
 let _pos = reader.seek(SeekFrom::Start(ro.offset))?;
 BR::read_options(reader, &ro, args)
}

#[derive_binread]
#[derive(Debug, Clone, Copy)]
pub struct DatasheetHeader {
 revision: u32,
 unknown1: u32,
 unique_id_offset: u32,
 unknown2: u32,
 type_offset: u32,
 row_number: u32,
 plain_text_length: u32,
 unknown3: u32,
 unknown4: u32,
 unknown5: u32,
 unknown6: u32,
 unknown7: u32,
 unknown8: u32,
 unknown9: u32,
 #[br(temp)]
 _plain_text_offset: u32,
 #[br(calc = 60 + _plain_text_offset)]
 plain_text_offset: u32,
 header_sig: u32,
 unknown10: u32,
 pub columns: u32,
 pub rows: u32,
 unknown11: u32,
 unknown12: u32,
 unknown13: u32,
 unknown14: u32,
}

#[derive_binread]
#[derive(Debug, Clone)]
#[br(import(data_offset: u32))]
pub struct DatasheetColumn {
 unknown15: u32,
 #[br(temp)]
 _column_name_offset: u32,
 #[br(calc = data_offset + _column_name_offset)]
 column_name_offset: u32,
 column_type: u32,
 #[br(parse_with = get_string, offset=column_name_offset as u64)]
 #[br(restore_position)]
 pub column_name: NullString,
}

#[derive_binread]
#[derive(Debug, Clone)]
#[br(import(data_offset: u32))]
pub struct DatasheetRow {
 #[br(temp)]
 _row_value_offset: u32,
 #[br(calc = data_offset + _row_value_offset)]
 row_value_offset: u32,
 row_value_or_something: u32,
 #[br(parse_with = get_string, offset=row_value_offset as u64)]
 #[br(restore_position)]
 pub value: NullString,
}

#[derive(Debug, BinRead, Clone)]
#[br(import(column_count: u32, data_offset: u32))]
#[br(assert(row.len() as u32 == column_count))]
pub struct DatasheetRows {
 #[br(count = column_count)]
 #[br(args(data_offset))]
 pub row: Vec<DatasheetRow>,
}

#[derive(Debug, BinRead)]
#[br(assert(columns.len() as u32 == header.columns))]
#[br(assert(rows.len() as u32 == header.rows))]
pub struct Datasheet {
 pub header: DatasheetHeader,
 #[br(args(header.plain_text_offset))]
 #[br(count = header.columns)]
 pub columns: Vec<DatasheetColumn>,
 #[br(count = header.rows)]
 #[br(args(header.columns, header.plain_text_offset))]
 pub rows: Vec<DatasheetRows>,
}

#[allow(dead_code)]
pub struct DatasheetParser {
 pub datasheet: Datasheet,
}

#[allow(dead_code)]
impl DatasheetParser {
 pub fn to_json(&self) -> JsonResult<String> {
  let _json = serde_json::to_string_pretty(&self.get_data())?;
  Ok(_json)
 }
 pub fn to_xml(&self) -> anyhow::Result<String> {
  let xml = quick_xml::se::to_string(&self.get_data()).unwrap();
  Ok(xml)
 }

 pub fn get_data(&self) -> Vec<HashMap<String, String>> {
  let columns: Vec<String> = self
   .datasheet
   .columns
   .iter()
   .map(|c| c.column_name.clone().into_string())
   .collect();
  let mut combined: Vec<HashMap<String, String>> = Vec::new();
  for n in &self.datasheet.rows {
   let row_data: Vec<String> = n
    .row
    .iter()
    .map(|v| v.value.clone().into_string())
    .collect();
   let data: HashMap<_, _> = columns
    .clone()
    .into_iter()
    .zip(row_data.into_iter())
    .collect();
   combined.push(data)
  }
  combined
 }
 pub fn new(file: Vec<u8>) -> DatasheetParser {
  DatasheetParser {
   datasheet: Cursor::new(file).read_le().unwrap(),
  }
 }
}

Typescript:

Code:

/* eslint-disable unicorn/filename-case */
import { Parser } from "binary-parser";

let textOffset = 0;
const DataSheetColumn = Parser.start()
  .uint32le("unknown")
  .uint32le("column_name_offset")
  .uint32le("type");

const DataSheetData = Parser.start().array(null, {
  type: Parser.start().array(null, {
    type: Parser.start()
      .uint32le("value_offset")
      .uint32le("offset_or_something"),
    formatter: (val) => {
      return val.map((v) => v.value_offset);
    },
    length: (items) => items.column_count,
  }),
  length: (items) => items.row_count,
});

const DataSheetRow = Parser.start()
  .saveOffset("offset", { formatter: (val) => val - textOffset })
  .string("value", { zeroTerminated: true });

const GetString = Parser.start().string(null, { zeroTerminated: true });

const DataSheetHeader = new Parser()
  .uint32le("revision")
  .uint32le("unknown1")
  .uint32le("unique_id_offset")
  .uint32le("unknown2")
  .uint32le("type_offset")
  .uint32le("row_number")
  .uint32le("plain_text_length")
  .uint32le("unknown3")
  .uint32le("unknown4")
  .uint32le("unknown5")
  .uint32le("unknown6")
  .uint32le("unknown7")
  .uint32le("unknown8")
  .uint32le("unknown9")
  .uint32le("plain_text_offset", {
    formatter: (x) => {
      const offs = (x as number) + 60;
      textOffset = offs;
      return offs;
    },
  })
  .uint32be("header_sig", {
    formatter: (val) => `0x${val.toString(16).toUpperCase()}`,
  })
  .uint32le("unknown10")
  .uint32le("column_count")
  .uint32le("row_count")
  .uint32le("unknown11")
  .uint32le("unknown12")
  .uint32le("unknown13")
  .uint32le("unknown14")
  .pointer("unique_id", {
    type: GetString,
    offset: (items) => {
      return items.plain_text_offset + items.unique_id_offset;
    },
  })
  .pointer("data_type", {
    type: GetString,
    offset: (items) => {
      return items.plain_text_offset + items.type_offset;
    },
  })
  .array("column_names", {
    type: DataSheetColumn,
    length: (items) => {
      return items.column_count;
    },
    formatter: (items) => {
      return items.map((item) => item.column_name_offset);
    },
  })
  .nest("rows", {
    type: DataSheetData,
  })
  .array("plain_text", {
    type: DataSheetRow,
    readUntil: "eof",
    zeroTerminated: true,
    key: "offset",
  });

export function parseDataSheet(data: Buffer) {
  return new Parser().nest(null, { type: DataSheetHeader }).parse(data);
}

And I've attached a https://kaitai.io/ struct file as well. Have fun

badmp3, posted Mon Aug 09, 2021 8:02 pm (65658)

del

Pako, posted Wed Aug 11, 2021 1:58 am (65674)

For completeness' sake, here's a parser written in Haskell. Other than bytestring, attoparsec, and attoparsec-binary it doesn't depend on anything. I've also included a small usage example where I turn the parsed data into a CSV file.

Code:

module DatasheetParser
       ( parseDatasheet
       , Datasheet (..)
       , Table (..)
       , Row (..)
       , Cell (..)
       , Column (columnName)
       , ColumnType
       ) where

import qualified Data.ByteString as BS
import qualified Data.Attoparsec.ByteString as A
import qualified Data.Attoparsec.Binary as B
import Data.Word
import Control.Monad (forM)
import GHC.Float (castWord32ToFloat)

getString :: BS.ByteString -> Int -> BS.ByteString
getString dat offset = BS.takeWhile (/= 0) $ BS.drop (offset) dat

data DatasheetHeader 
       = DatasheetHeader { headerRevision        :: Word32
                         , headerUniqueIdOffset  :: Word32
                         , headerTypeOffset      :: Word32
                         , headerRowNumber       :: Word32 
                         , headerPlainTextLen    :: Word32
                         , headerPlainTextOffset :: Word32
                         , headerSig             :: Word32
                         , headerColumnCount     :: Word32
                         , headerRowCount        :: Word32
                         } deriving (Show)

data DatasheetHeaderStrings 
       = DatasheetHeaderStrings { headerUniqueId :: BS.ByteString
                                , headerType     :: BS.ByteString 
                                } deriving (Show)

data ColumnType = TString
                | TFloat
                | TBool 
                deriving (Show)

data Column 
       = Column { columnName :: BS.ByteString 
                , columnType :: ColumnType
                }
              deriving (Show)

data Cell = CString BS.ByteString 
          | CFloat Float
          | CBool Bool
          deriving (Show)

newtype Row = Row [Cell]
       deriving (Show)
newtype Table = Table [Row]
       deriving (Show)

data Datasheet = Datasheet DatasheetHeader DatasheetHeaderStrings [Column] Table
       deriving (Show)

datasheetHeaderParser :: A.Parser DatasheetHeader 
datasheetHeaderParser = do
       revision             <- B.anyWord32le
       _                    <- A.take 4
       uniqueIdOffset       <- B.anyWord32le
       _                    <- A.take 4
       typeOffset           <- B.anyWord32le
       rowNumber            <- B.anyWord32le
       plainTextLength      <- B.anyWord32le
       _                    <- A.take 28
       plainTextOffset      <- B.anyWord32le >>= return . (+ 60)
       hSig                 <- B.anyWord32be
       _                    <- A.take 4
       columnCount          <- B.anyWord32le
       rowCount             <- B.anyWord32le
       _                    <- A.take 16
       return 
              $ DatasheetHeader
                revision
                uniqueIdOffset
                typeOffset
                rowNumber
                plainTextLength
                plainTextOffset
                hSig
                columnCount
                rowCount

datasheetHeaderStrings :: DatasheetHeader -> BS.ByteString -> DatasheetHeaderStrings
datasheetHeaderStrings h d 
       = DatasheetHeaderStrings
         (getString d (fromIntegral $ headerUniqueIdOffset h))
         (getString d (fromIntegral $ headerTypeOffset h))

parseColumn :: BS.ByteString -> A.Parser Column
parseColumn s = do
       _                    <- A.take 4
       columnNameOffset     <- B.anyWord32le
       columnType           <- B.anyWord32le
       return 
              $ Column
                (getString s (fromIntegral columnNameOffset))
                (case fromIntegral columnType of
                       1 -> TString
                       2 -> TFloat
                       3 -> TBool
                       _ -> error $ show columnType)

parseCell :: BS.ByteString -> ColumnType -> A.Parser Cell
parseCell s t = do
       so    <- B.anyWord32le
       value <- B.anyWord32le
       return (case t of
              TString -> CString (getString s $ fromIntegral value)
              TFloat  -> CFloat $ castWord32ToFloat value
              TBool   -> CBool (if (fromIntegral value) == 0 then False else True))

datasheetParser :: BS.ByteString -> A.Parser Datasheet
datasheetParser d = do
       header               <- datasheetHeaderParser
       let plen             = fromIntegral $ headerPlainTextLen header
           strings          = BS.drop (BS.length d - plen) d
           headerStrings    = datasheetHeaderStrings header strings
       columns              <- forM [1..fromIntegral $ headerColumnCount header] (const $ parseColumn strings)
       rows                 <- forM [1..fromIntegral $ headerRowCount header] (const (forM columns ((parseCell strings) . columnType) >>= return . Row))
       return 
              $ Datasheet
                header
                headerStrings
                columns
                (Table rows)

parseDatasheet :: BS.ByteString -> Maybe Datasheet
parseDatasheet d = case A.parse (datasheetParser d) d of 
       A.Done _ x -> Just x
       _          -> Nothing

Code:

module Main (main) where

import DatasheetParser
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BSL
import qualified Data.Csv as C
import Data.String (fromString)

fromCell :: Cell -> BS.ByteString
fromCell c  = case c of
    CString s -> s
    CFloat f  -> fromString $ show f
    CBool b   -> fromString $ show b

sheetToCsv :: Datasheet -> BS.ByteString
sheetToCsv (Datasheet _ _ columns (Table rows)) 
    = let 
        colnames = map (columnName) columns
        r = map (\(Row cells) -> map fromCell cells) rows
        in BSL.toStrict $ C.encode ([colnames] ++ r)

main :: IO ()
main = do
    f <- BS.readFile "sample/javelindata_crafting.datasheet"
    let sheet = parseDatasheet f
    case sheet of
        Just s -> BS.writeFile "out.csv" (sheetToCsv s)
        Nothing -> print "oops"
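
The same CSV step can be sketched in Python with the standard `csv` module. This is a sketch, not a port of the Haskell above: it assumes the cells have already been decoded into Python strings, floats and bools, and `rows_to_csv` and `format_cell` are hypothetical names.

```python
import csv
import io


def format_cell(value) -> str:
    """Render one decoded cell as text (bools as True/False, floats via str)."""
    return value if isinstance(value, str) else str(value)


def rows_to_csv(column_names, rows) -> str:
    """Serialise a header row plus data rows to CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(column_names)
    for row in rows:
        writer.writerow([format_cell(v) for v in row])
    return buf.getvalue()
```

Letting the `csv` module handle quoting matters here because string cells can contain commas.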

badmp3, posted Sat Aug 14, 2021 8:48 pm (65728)

fixed
