author Katharina Fey <kookie@spacekookie.de> 2020-08-21 14:48:04 +0200
committer Katharina Fey <kookie@spacekookie.de> 2020-08-21 14:48:04 +0200
commit 9f00deb9fcb35f43c2511a0ea562eed33271c723
tree b24e4080384ae205ae8b7a5c43716757cb20427b
parent b4b9515fd969cb6445c8a93b7eb1ec8ebe529949
parent f28abd489cc7ebd9f2d14f584052a93607d78985
Add 'g3f/' from commit 'f28abd489cc7ebd9f2d14f584052a93607d78985'
git-subtree-dir: g3f
git-subtree-mainline: b4b9515fd969cb6445c8a93b7eb1ec8ebe529949
git-subtree-split: f28abd489cc7ebd9f2d14f584052a93607d78985
-rw-r--r-- g3f/README.md 189
1 files changed, 189 insertions, 0 deletions
diff --git a/g3f/README.md b/g3f/README.md
new file mode 100644
index 0000000..1d92cd7
--- /dev/null
+++ b/g3f/README.md
@@ -0,0 +1,189 @@
+# git friendly file format (`g3f`)
+
+A flat file format that can encode arbitrary data,
+while being plain text (UTF-8) and very git friendly.
+
+**What does git friendly mean?**
+
+Changes are done in place, resulting in visually pleasing
+(and useful) diffs that are generated by VCS programs such as `git`.
+
+## The spec
+
+Before we start with the (more or less) formal specification,
+here are the design principles that went into `g3f`:
+
+- Easy to write by hand: a human should easily be able to write
+ a data file, without much effort or boilerplate. It should also
+ be possible to edit generated files without being swamped with
+ boilerplate (or indentation!)
+- Flat structure: a file should not allow for nested structures
+ in the file itself. Nesting makes files harder to edit by hand,
+ adds complexity at the parse level,
+ and makes graphs more difficult to represent
+- VCS friendly: a file change should only touch parts of the
+ data section that were changed.
+
+Now...
+
+`g3f` files are strongly typed.
+This means that every file has a schema section in its header
+defining what data types exist and how they are laid out.
+
+The file extension for a `g3f` file is `.g3f` by default,
+however this implementation is not opinionated on that.
+
+Another important point:
+it should be considered part of the spec that writes are
+done in-place to existing data.
+No data should be overwritten unless the application
+using the `g3f` library explicitly asks for it!
+
+### Header
+
+At the top of every `g3f` file is a header.
+It contains the spec version the file was made with
+as well as the implementation `ID` and version.
+
+It looks something like this:
+
+```g3f
+{header:builtin/header}
+{spec "1.0.0"}
+{impl "g3f-reference"}
+{impl_version "0.8.5"}
+
+{data}
+```
+
+A few notes here:
+
+- `g3f` is a flat format. Declaring a new top-level block
+ (e.g. `{data}`) ends the `{header}` block.
+- A block can enforce a schema (i.e. here we enforce that all
+ required fields from `builtin/header` are present)
+- Nodes always have a single data value. Supported types are
+ - string (`"1.0.0"`)
+ - int (`42`)
+ - float (`13.37`)
+ - bool (`true`|`false`)
+ - list<...> (`[ ... ]` - Elements are not comma-separated!)
+ - ref (`some_id` - not quoted!)
+ - type (`<...>` refers to some type information)
+ - schema (`<schema>` as a literal)
+- `#` starts a line comment. There are no block comments
+
+### Schemas
+
+As previously mentioned, `g3f` is a strongly typed file format.
+Schemas have IDs that other blocks can reference.
+Because `g3f` is completely flat, it's impossible to have a `{schemas}`
+block that contains the schema definitions.
+Instead, the header can use the `<schema>` type marker
+to pre-declare schemas, which are then defined by later blocks.
+
+```g3f
+{header}
+{node <schema>}
+{link <schema>}
+
+{node}
+{id <int>}
+{links <list<int>>}
+
+{link}
+{id <int>}
+{in <int>}
+{out <int>}
+```
+
+### Defining data
+
+Using these schemas is then easy enough.
+You don't have to use schemas, however,
+if you want your file format to be completely dynamic and terrible.
+
+```g3f
+{<>:node}
+{id 0}
+{links [ 1 ]}
+
+{<>:node}
+{id 1}
+{links [ 0 ]}
+```
+
+Note that `<>` in the name position of a block refers to an anonymous block without a name of its own.
+Deserialisation of this file would happen as a list of nodes, each without a name.
+
+When building graph structures, it is possible to have loops.
+`g3f` allows this.
+
+Also of note: named blocks in a flat structure
+are deserialised as a map `name => { data }`!
+
+### Some thoughts on deserialisation
+
+(not specifically part of the spec - to be expanded!)
+
+Deserialised into C code this would look like the following:
+
+```C
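+#include <stdint.h>
+
+struct node_t {
+    int32_t id;
+    int32_t *links; /* ids of the linked nodes */
+};
+
+/* One entry per anonymous `{<>:node}` block, in file order. */
+struct node_t nodes[] = { { .id = 0 /* ... */ }, { .id = 1 /* ... */ } };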
+```
+
+Because `g3f` has no hierarchical structure, and there are no in-file references between the two nodes,
+the deserialiser returns a list of nodes.
+Building a graph in memory is then your responsibility.
+However, `g3f` can handle a few scenarios for you.
+
+Imagine we used references, instead of integers, for links:
+
+```g3f
+{node}
+{id <int>}
+{links <list<ref>>}
+```
+
+What does this change? Well, let's look at a data section:
+
+```g3f
+{node_0:node}
+{id 0}
+{links [ node_1 ]}
+
+{node_1:node}
+{id 1}
+{links [ node_0 ]}
+```
+
+In this case, `g3f` will deserialise into a list with a single node,
+which is `node_0` because it is considered the root-node for the graph.
+
+### Upgradability
+
+Applications might add new fields to their schemas and data sections.
+In binary encoders such as protobuf, code is specifically generated for
+an exchange format and also includes forwards-compatible markers to
+allow for schema changes.
+
+`g3f` needs none of that!
+Data state inside the parser is dynamic, and type checking
+is only done against the schema in the file itself.
+If the code using the parser library doesn't expect certain
+data keys, or expects keys that aren't present,
+it can handle this gracefully.
+
+New keys can be added the same way they would be in a dynamic file.
+Keys that are present despite not being expected can simply be ignored.
+The spec makes explicit note of writes and re-writes being done
+in-place,
+meaning that changes are always local to the keys that are changed.
+If an update ignores certain keys, it doesn't matter whether they were
+ignored because they were unimportant or because the application didn't know them.
+
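
The README added above describes the block and node syntax only informally, so here is a minimal Python sketch of how a reader might parse it. This is not the reference implementation: the names `parse`, `parse_value`, `BLOCK_RE` and `NODE_RE`, and the `("type", ...)` / `("ref", ...)` tuples, are all assumptions made for illustration. It does not enforce schemas, and it naively treats any `#` as the start of a comment and splits list elements on whitespace.

```python
import re

# Hypothetical helpers; the README does not define a parser API.
BLOCK_RE = re.compile(r"^\{([^\s{}]+)\}$")   # {name} or {name:schema}
NODE_RE = re.compile(r"^\{(\w+)\s+(.+)\}$")  # {key value}
INT_RE = re.compile(r"-?\d+")
FLOAT_RE = re.compile(r"-?\d+\.\d+")


def parse_value(text):
    """Map a literal to a Python value, following the README's type list."""
    text = text.strip()
    if len(text) >= 2 and text.startswith('"') and text.endswith('"'):
        return text[1:-1]                       # string
    if text in ("true", "false"):
        return text == "true"                   # bool
    if text.startswith("[") and text.endswith("]"):
        # List elements are whitespace-separated, not comma-separated.
        return [parse_value(item) for item in text[1:-1].split()]
    if text.startswith("<") and text.endswith(">"):
        return ("type", text[1:-1])             # type marker such as <int>
    if INT_RE.fullmatch(text):
        return int(text)
    if FLOAT_RE.fullmatch(text):
        return float(text)
    return ("ref", text)                        # unquoted identifier


def parse(source):
    """Return (named, anonymous): a dict of named blocks, a list of `<>` blocks."""
    named, anonymous, current = {}, [], None
    for raw in source.splitlines():
        line = raw.split("#", 1)[0].strip()     # drop line comments (naive)
        if not line:
            continue
        block = BLOCK_RE.match(line)
        if block:
            # A new top-level block implicitly ends the previous one.
            name, _, schema = block.group(1).partition(":")
            current = {"schema": schema or None, "fields": {}}
            if name == "<>":
                anonymous.append(current)
            else:
                named[name] = current
            continue
        node = NODE_RE.match(line)
        if node is None or current is None:
            raise ValueError(f"unexpected line: {raw!r}")
        current["fields"][node.group(1)] = parse_value(node.group(2))
    return named, anonymous
```

Calling `parse` on the anonymous `{<>:node}` example yields two entries in the list, while named blocks such as `{header:builtin/header}` end up in the dict keyed by name, mirroring the list-versus-map behaviour the README describes.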