I'm working on a level editing tool that saves its data as XML.
This is ideal during development, as it's painless to make small changes to the data format, and it works nicely with tree-like data.
The downside, though, is that the XML files are rather bloated, mostly due to duplicated tag and attribute names, and also because numeric data stored as text takes significantly more space than native datatypes. A small level could easily end up as 1 MB or more. I want to get these sizes down significantly, especially if the system is to be used for a game on the iPhone or other devices with relatively limited memory.
The optimal solution, for memory and performance, would be to convert the XML to a binary level format. But I don't want to do this. I want to keep the format fairly flexible. XML makes it very easy to add new attributes to objects, and to give them a default value if an old version of the data is loaded. So I want to stick with the hierarchy of nodes, with attributes as name-value pairs.
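To make the "default value for old data" point concrete, here is a minimal sketch in Python. The `enemy` element, the `ENEMY_DEFAULTS` schema, and the `health` attribute are all hypothetical names made up for illustration; they are not taken from the actual editor.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema: attribute name -> (converter, default value).
ENEMY_DEFAULTS = {
    "x": (float, 0.0),
    "y": (float, 0.0),
    "health": (int, 100),  # imagined as added in a later editor version
}

def load_attributes(element, schema):
    """Read every attribute named in the schema, falling back to its default."""
    values = {}
    for name, (convert, default) in schema.items():
        raw = element.get(name)  # None when the attribute is absent
        values[name] = default if raw is None else convert(raw)
    return values

# An "old" file that predates the health attribute still loads cleanly.
old_level = ET.fromstring('<level><enemy x="1.5" y="2.0"/></level>')
enemy = load_attributes(old_level.find("enemy"), ENEMY_DEFAULTS)
print(enemy)
```

Any compact replacement format would need to preserve this lookup-by-name behaviour to stay as forgiving as the XML version.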
But I need to store this in a more compact format, to remove the massive duplication of tag/attribute names, and perhaps also to give attributes native types so that, for example, floating-point data is stored as 4 bytes per float rather than as a text string.
Google and Wikipedia reveal that 'binary XML' is hardly a new problem; it has been solved a number of times already. Does anyone here have experience with any of the existing systems or standards? Are any ideal for games use, with a free, lightweight, cross-platform parser/loader library (C/C++) available?
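As a rough illustration of that idea (not any existing binary XML standard), the sketch below interns each tag and attribute name once in a name table and writes typed values with Python's `struct` module. The record layout, the `TYPE_*` codes, and the tuple-based node representation are assumptions invented for this example; a real file would also have to store the name table itself, which is left out here.

```python
import struct

TYPE_INT, TYPE_FLOAT, TYPE_STRING = 0, 1, 2  # hypothetical type codes

class NameTable:
    """Stores each tag/attribute name once and hands out small integer ids."""
    def __init__(self):
        self.names = []
        self.index = {}

    def intern(self, name):
        if name not in self.index:
            self.index[name] = len(self.names)
            self.names.append(name)
        return self.index[name]

def encode_node(node, table, out):
    """Append one (tag, attrs, children) node to the bytearray `out`."""
    tag, attrs, children = node
    out.extend(struct.pack("<HHH", table.intern(tag), len(attrs), len(children)))
    for name, value in attrs.items():
        name_id = table.intern(name)
        if isinstance(value, float):
            out.extend(struct.pack("<HBf", name_id, TYPE_FLOAT, value))  # 4-byte float
        elif isinstance(value, int):
            out.extend(struct.pack("<HBi", name_id, TYPE_INT, value))
        else:
            data = value.encode("utf-8")
            out.extend(struct.pack("<HBH", name_id, TYPE_STRING, len(data)) + data)
    for child in children:
        encode_node(child, table, out)

def decode_node(data, offset, names):
    """Read one node starting at `offset`; return (node, next_offset)."""
    tag_id, n_attrs, n_children = struct.unpack_from("<HHH", data, offset)
    offset += 6
    attrs = {}
    for _ in range(n_attrs):
        name_id, type_id = struct.unpack_from("<HB", data, offset)
        offset += 3
        if type_id == TYPE_FLOAT:
            (value,) = struct.unpack_from("<f", data, offset)
            offset += 4
        elif type_id == TYPE_INT:
            (value,) = struct.unpack_from("<i", data, offset)
            offset += 4
        else:
            (length,) = struct.unpack_from("<H", data, offset)
            offset += 2
            value = bytes(data[offset:offset + length]).decode("utf-8")
            offset += length
        attrs[names[name_id]] = value
    children = []
    for _ in range(n_children):
        child, offset = decode_node(data, offset, names)
        children.append(child)
    return (names[tag_id], attrs, children), offset

level = ("level", {"name": "intro"}, [
    ("enemy", {"x": 1.5, "y": -2.0, "health": 100}, []),
    ("enemy", {"x": 8.25, "y": 0.5, "health": 50}, []),
])
table = NameTable()
body = bytearray()
encode_node(level, table, body)
decoded, _ = decode_node(body, 0, table.names)
print(len(body), decoded == level)
```

Because attributes are still stored as name-value pairs, a loader can keep supplying defaults for names that an older file does not contain. Note that 4-byte floats lose precision compared with the text form, so values only round-trip exactly when they are representable in single precision.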
Or should I reinvent this wheel myself?
Or am I better off forgetting the ideal, and just compressing my raw .xml data (it should pack well with zip-like compression), and just taking the memory/performance hit on-load?
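For comparison, the "just compress it" route needs very little code. The sketch below uses Python's standard `zlib` module on a synthetic level; the generated `enemy` elements are made-up test data, so the sizes it prints say nothing about real level files.

```python
import zlib

# Synthetic, highly repetitive XML standing in for a real level file.
xml_text = "<level>" + "".join(
    f'<enemy x="{i * 1.5}" y="{i * 0.25}" health="100"/>' for i in range(1000)
) + "</level>"

raw = xml_text.encode("utf-8")
packed = zlib.compress(raw, 9)      # 9 = best compression, slowest
restored = zlib.decompress(packed)  # what the game would do at load time

print(len(raw), len(packed))
```

This only shrinks the file on disk. After decompression the loader still has to hold and parse the full XML text, which is the memory and performance hit mentioned above.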
Answer
We used binary XML heavily for Superman Returns: The Videogame. We're talking thousands and thousands of files. It worked OK, but honestly didn't seem worth the effort. It ate up a noticeable fraction of our loading time, and the "flexibility" of XML didn't scale up. After a while, our data files had too many weird identifiers, external references that needed to be kept in sync, and other strange requirements for them to really be feasibly human-edited any more.
Also, XML is really a markup format, and not a data format. It's optimized for a lot of text with occasional tags. It isn't great for fully structured data. It wasn't my call, but if it had been and I knew then what I know now, I probably would have done JSON or YAML. They're both terse enough to not require compaction, and are optimized for representing data, not text.