author | Katharina Fey <kookie@spacekookie.de> | 2020-08-21 14:48:04 +0200 |
---|---|---|
committer | Katharina Fey <kookie@spacekookie.de> | 2020-08-21 14:48:04 +0200 |
commit | 9f00deb9fcb35f43c2511a0ea562eed33271c723 (patch) | |
tree | b24e4080384ae205ae8b7a5c43716757cb20427b | |
parent | b4b9515fd969cb6445c8a93b7eb1ec8ebe529949 (diff) | |
parent | f28abd489cc7ebd9f2d14f584052a93607d78985 (diff) |
Add 'g3f/' from commit 'f28abd489cc7ebd9f2d14f584052a93607d78985'
git-subtree-dir: g3f
git-subtree-mainline: b4b9515fd969cb6445c8a93b7eb1ec8ebe529949
git-subtree-split: f28abd489cc7ebd9f2d14f584052a93607d78985
-rw-r--r-- | g3f/README.md | 189 |
1 files changed, 189 insertions, 0 deletions
diff --git a/g3f/README.md b/g3f/README.md
new file mode 100644
index 0000000..1d92cd7
--- /dev/null
+++ b/g3f/README.md
@@ -0,0 +1,189 @@
+# git friendly file format (`g3f`)
+
+A flat file format that can encode literally anything,
+while being plain text (`utf-8` tho) and very git friendly.
+
+**What does git friendly mean?**
+
+Changes are done in place, resulting in visually pleasing
+(and useful) diffs that are generated by VCS programs such as `git`.
+
+## The spec
+
+Before we start with the (more or less) formal specification,
+there are some design principles that went into designing `g3f`:
+
+- Easy to write by hand: a human should easily be able to write
+  a data file, without much effort or boilerplate.  It should also
+  be possible to edit generated files without being swamped with
+  boilerplate (or indentation!)
+- Flat structure: a file should not allow for nested structures
+  in the file itself.  Nesting adds complexity and makes it harder
+  to edit by hand.  It also adds complexity at the parse level
+  and makes graphs more difficult
+- VCS friendly: a file change should only touch parts of the
+  data section that were changed.
+
+Now...
+
+`g3f` files are strongly typed.
+This means that every file has a schema section in its header
+defining what data types exist and how they are laid out.
+
+The file extension for a `g3f` file is `.g3f` by default,
+however this implementation is not opinionated on that.
+
+Another important point:
+it should be considered part of the spec that writes are
+done in-place to existing data.
+No data should be overwritten if not explicitly desired
+by the application using the `g3f` library!
+
+### Header
+
+At the top of every `g3f` file is a header.
+It contains the spec version the file was made with
+as well as the implementation `ID` and version.
+
+It looks something like this:
+
+```g3f
+{header:builtin/header}
+{spec "1.0.0"}
+{impl "g3f-reference"}
+{impl_version "0.8.5"}
+
+{data}
+```
+
+A few notes here:
+
+- `g3f` is a flat format.  When declaring a new top-level block
+  (e.g. `{data}`) this ends the `{header}` block.
+- A block can enforce a schema (e.g. here we enforce that all
+  required fields from `builtin/header` are present)
+- Nodes always have a single data value.  Supported types are
+  - string (`"1.0.0"`)
+  - int (`42`)
+  - float (`13.37`)
+  - bool (`true`|`false`)
+  - list<...> (`[ ... ]` - Elements are not comma-separated!)
+  - ref (`some_id` - not quoted!)
+  - type (`<...>` refers to some type information)
+  - schema (`<schema>` as a literal)
+- `#` is a line comment.  There are no block-comments
+
+### Schemas
+
+As previously mentioned `g3f` is a strongly typed file format.
+Schemas are IDs that can be referenced by other IDs.
+Because `g3f` is completely flat, it's impossible to have a `{schemas}`
+block in which to define schemas.
+Instead inside the header it's possible to use the `<schema>` type marker
+to pre-declare schema data which will later be defined by blocks.
+
+```g3f
+{header}
+{node <schema>}
+{link <schema>}
+
+{node}
+{id <int>}
+{links <list<int>>}
+
+{link}
+{id <int>}
+{in <int>}
+{out <int>}
+```
+
+### Defining data
+
+Then using these schemas is easy enough.
+You don't have to use schemas however,
+if you want your file format to be completely dynamic and terrible.
+
+```g3f
+{<>:node}
+{id 0}
+{links [ 1 ]}
+
+{<>:node}
+{id 1}
+{links [ 0 ]}
+```
+
+Note that `<>` in the name position of a block refers to an anonymous block without a name of its own.
+Deserialisation of this file would happen as a list of nodes, each without a name.
+
+When building graph structures, it is possible to have loops.
+`g3f` allows this.
+
+Also of note: when using blocks that are named, in a flat structure,
+deserialisation happens as a map `name => { data }`!
+
+### Some thoughts on deserialisation
+
+(not specifically part of the spec - to be expanded!)
+
+Deserialised into C code this would look like the following:
+
+```C
+struct node_t {
+    int32_t id;       /* value of the `id` field */
+    int32_t *links;   /* ids of the linked nodes */
+};
+
+struct node_t nodes[] = { { ... }, { ... } };
+```
+
+Because `g3f` has no hierarchical structure, and there are no in-file references between the two nodes,
+the deserialiser returns a list of nodes.
+Building a graph in memory is then your responsibility.
+However, `g3f` can handle a few scenarios for you.
+
+Imagine we used references, instead of integers, for links:
+
+```g3f
+{node}
+{id <int>}
+{links <list<ref>>}
+```
+
+What does this change?  Well, let's look at a data section:
+
+```g3f
+{node_0:node}
+{id 0}
+{links [ node_1 ]}
+
+{node_1:node}
+{id 1}
+{links [ node_0 ]}
+```
+
+In this case, `g3f` will deserialise into a list with a single node,
+which is `node_0` because it is considered the root-node for the graph.
+
+### Upgradability
+
+Applications might add new fields to their schemas and data sections.
+In binary encoders such as protobuf, code is specifically generated for
+an exchange format and also includes forwards-compatible markers to
+allow for schema changes.
+
+`g3f` needs none of that!
+Because data state inside the parser is dynamic and type checking
+is only done against the schema in a file,
+if the code using the parser library doesn't expect certain
+data keys or expects others to be there that aren't present,
+this can be gracefully handled.
+
+New keys can be added the same way they would be in a dynamic file.
+Keys that are present despite not being expected can simply be ignored.
+The spec makes explicit note of writes and re-writes being done
+in-place,
+meaning that changes are always local to the keys that are changed.
+If an update ignores certain keys, it doesn't matter if they were
+ignored because they were not important or unknown to the application.
+
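The README notes that, when links are plain integers, the deserialiser hands back a flat list of nodes and building the graph in memory is left to the caller. The following is a minimal sketch of what that step could look like in C. It is not part of the commit or of any `g3f` library: the `link_count` field and the `find_node` and `follow_link` helpers are hypothetical names introduced here for illustration, and the struct only loosely mirrors the `node_t` shown in the README.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory form of one deserialised `node` block.
 * `link_count` is an assumption; the README's struct has no length field. */
struct node_t {
    int32_t id;             /* value of the `id` field */
    const int32_t *links;   /* ids listed in the `links` field */
    size_t link_count;      /* number of entries in `links` */
};

/* Linear search of the flat node list by id; returns NULL if no node matches. */
const struct node_t *find_node(const struct node_t *nodes, size_t count,
                               int32_t id)
{
    for (size_t i = 0; i < count; i++) {
        if (nodes[i].id == id) {
            return &nodes[i];
        }
    }
    return NULL;
}

/* Resolve the idx-th link of `from` to the node it names.
 * Returns NULL when idx is out of range or the id is dangling. */
const struct node_t *follow_link(const struct node_t *nodes, size_t count,
                                 const struct node_t *from, size_t idx)
{
    assert(from != NULL);
    if (idx >= from->link_count) {
        return NULL;
    }
    return find_node(nodes, count, from->links[idx]);
}
```

Because links are resolved on demand rather than eagerly, loops such as node 0 linking to node 1 and back need no special handling. A lookup table keyed by id would be the obvious replacement for the linear search once files grow large.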