# git friendly file format (`g3f`)

A flat file format that can encode literally anything, while being plain text (`utf-8` tho) and very git friendly.

**What does git friendly mean?** Changes are done in place, resulting in visually pleasing (and useful) diffs when generated by VCS programs such as `git`.

## The spec

Before we start with the (more or less) formal specification, there are some design principles that went into designing `g3f`:

- Easy to write by hand: a human should easily be able to write a data file, without much effort or boilerplate. It should also be possible to edit generated files without being swamped with boilerplate (or indentation!)
- Flat structure: a file should not allow for nested structures in the file itself. Nesting adds complexity and makes files harder to edit by hand. It also adds complexity at the parse level and makes graphs more difficult
- VCS friendly: a file change should only touch the parts of the data section that were changed.

Now... `g3f` files are strongly typed. This means that every file has a schema section in its header defining what data types exist and how they are laid out.

The file extension for a `g3f` file is `.g3f` by default; however, this implementation is not opinionated about that.

Another important point: it should be considered part of the spec that writes are done in-place to existing data. No data should be overwritten unless the application using the `g3f` library explicitly asks for it!

### Header

At the top of every `g3f` file is a header. It contains the spec version the file was made with as well as the implementation `ID` and version. It looks something like this:

```g3f
{header:builtin/header}
{spec "1.0.0"}
{impl "g3f-reference"}
{impl_version "0.8.5"}

{data}
```

A few notes here:

- `g3f` is a flat format. Declaring a new top-level block (e.g. `{data}`) ends the `{header}` block.
- A block can enforce a schema (e.g. here we enforce that all required fields from `builtin/header` are present)
- Nodes always have a single data value. Supported types are
  - string (`"1.0.0"`)
  - int (`42`)
  - float (`13.37`)
  - bool (`true`|`false`)
  - list<...> (`[ ... ]` - Elements are not comma-separated!)
  - ref (`some_id` - not quoted!)
  - type (`<...>` refers to some type information)
  - schema (`<schema>` as a literal)
- `#` is a line comment. There are no block comments

### Schemas

As previously mentioned, `g3f` is a strongly typed file format. Schemas are IDs that can be referenced by other IDs. Because `g3f` is completely flat, it's impossible to have a `{schemas}` block in which to define schemas. Instead, inside the header it's possible to use the `<schema>` type marker to pre-declare schema data which will later be defined by blocks.

```g3f
{header}
{node <schema>}
{links <schema>}

{node}
{id <int>}
{links <list<int>>}

{link}
{id }
{in }
{out }
```

### Defining data

Using these schemas is then easy enough. You don't have to use schemas, however, if you want your file format to be completely dynamic and terrible.

```g3f
{<>:node}
{id 0}
{links [ 1 ]}

{<>:node}
{id 1}
{links [ 0 ]}
```

Note that `<>` in the name position of a block refers to an anonymous block without a name of its own. This file would be deserialised as a list of nodes, each without a name.

When building graph structures, it is possible to have loops. `g3f` allows this.

Also of note: when using blocks that are named, in a flat structure, deserialisation happens as a map `name => { data }`!

### Some thoughts on deserialisation

(not specifically part of the spec - to be expanded!)

Deserialised into C code this would look like the following:

```C
#include <stdint.h>

struct node_t {
    int32_t  id;
    int32_t *links; /* ids of the linked nodes */
};

/* The parser hands back a flat list of nodes. */
struct node_t nodes[] = {
    { .id = 0, /* ... */ },
    { .id = 1, /* ... */ },
};
```

Because `g3f` has no hierarchical structure, and there are no in-file references between the two nodes, the deserialiser returns a list of nodes. Building a graph in memory is then your responsibility.
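
As a rough illustration of what "your responsibility" could mean in practice, here is a minimal sketch that resolves integer link ids against a flat node list. It is not part of the spec or of any `g3f` library: the `n_links` field and the helpers `find_node` and `follow_link` are hypothetical names made up for this example, and the struct layout is only assumed to resemble what a parser might return.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flat node as it might come out of a parser (assumed layout).
 * n_links is an addition for this sketch: C needs the list length. */
struct node_t {
    int32_t  id;
    int32_t *links;   /* ids of the linked nodes */
    size_t   n_links; /* number of entries in links */
};

/* Linear search for a node by id; returns NULL if no node has that id. */
static struct node_t *find_node(struct node_t *nodes, size_t n, int32_t id)
{
    for (size_t i = 0; i < n; i++) {
        if (nodes[i].id == id)
            return &nodes[i];
    }
    return NULL;
}

/* Resolve the idx-th link of `from` to a node pointer.
 * Returns NULL if idx is out of range or the link points at a missing id. */
static struct node_t *follow_link(struct node_t *nodes, size_t n,
                                  const struct node_t *from, size_t idx)
{
    assert(from != NULL);
    if (idx >= from->n_links)
        return NULL;
    return find_node(nodes, n, from->links[idx]);
}
```

With the two-node file from above, following the first link of node `0` yields node `1`, and following that node's first link leads back to node `0`, so loops need no special handling. The lookup is linear per link, which is fine for small files; a larger graph would probably want an index from id to node.
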

However, `g3f` can handle a few scenarios for you. Imagine we used references, instead of integers, for links:

```g3f
{node}
{id <int>}
{links <list<ref>>}
```

What does this change? Well, let's look at a data section:

```g3f
{node_0:node}
{id 0}
{links [ node_1 ]}

{node_1:node}
{id 1}
{links [ node_0 ]}
```

In this case, `g3f` will deserialise into a list with a single node, which is `node_0` because it is considered the root node of the graph.

### Upgradability

Applications might add new fields to their schemas and data sections. In binary encoders such as protobuf, code is specifically generated for an exchange format and also includes forwards-compatible markers to allow for schema changes.

`g3f` needs none of that! Because data state inside the parser is dynamic and type checking is only done against the schema in a file, the code using the parser library can gracefully handle both keys it doesn't expect and expected keys that are missing.

New keys can be added the same way they would be in a dynamic file. Keys that are present despite not being expected can simply be ignored. The spec makes explicit note of writes and re-writes being done in-place, meaning that changes are always local to the keys that are changed. If an update ignores certain keys, it doesn't matter whether they were ignored because they were unimportant or because they were unknown to the application.
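
To make the idea of tolerant reads and local writes a little more concrete, here is a small sketch of how application code might treat a parsed block as a dynamic list of key/value pairs. Everything in it is an assumption for illustration: `field_t`, `get_int_or` and `set_int` are hypothetical names, only integer values are modelled, and a real `g3f` library may expose parsed data quite differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One key/value pair of a parsed block (hypothetical representation).
 * Only int values are modelled to keep the sketch short. */
struct field_t {
    const char *key;
    int32_t     value;
};

/* Read the value stored under `key`, or `fallback` if the block has no
 * such key. Keys the caller never asks about are simply ignored. */
static int32_t get_int_or(const struct field_t *fields, size_t n,
                          const char *key, int32_t fallback)
{
    assert(key != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(fields[i].key, key) == 0)
            return fields[i].value;
    }
    return fallback;
}

/* Overwrite the value under `key` in place and leave every other field
 * untouched. Returns 1 if the key was found, 0 otherwise. */
static int set_int(struct field_t *fields, size_t n,
                   const char *key, int32_t value)
{
    assert(key != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(fields[i].key, key) == 0) {
            fields[i].value = value;
            return 1;
        }
    }
    return 0;
}
```

An older application that only knows `id` can call `get_int_or(block, n, "id", -1)` and never notice that a newer writer added more keys. An update through `set_int` changes one value and nothing else, which is the property that keeps unknown keys intact across a read-modify-write cycle.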