From 89d2605fe096b7ff483b0f7acb5e4f29c8c6e98f Mon Sep 17 00:00:00 2001
From: Katharina Fey <kookie@spacekookie.de>
Date: Thu, 28 Feb 2019 18:02:15 +0100
Subject: Adding the initial spec draft

---
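Note for reviewers: the README added by this patch describes how flat blocks
deserialise (anonymous `<>` blocks become a list, named blocks become a map
`name => { data }`). The following is a minimal sketch of that behaviour, not
the reference implementation. The names `parse_value` and `parse_g3f` are
hypothetical, schema names after `:` are read but not enforced, and refs and
`<type>` markers are simply kept as strings.

```python
import re

BLOCK = re.compile(r"^\{([^\s{}]+)\}$")        # {name} or {name:schema}
NODE = re.compile(r"^\{([^\s{}]+)\s+(.+)\}$")  # {key value}


def parse_value(text):
    """Convert one g3f value literal into a Python object (simplified)."""
    text = text.strip()
    if text.startswith("[") and text.endswith("]"):
        # List elements are whitespace-separated, not comma-separated.
        return [parse_value(item) for item in text[1:-1].split()]
    if text.startswith('"') and text.endswith('"'):
        return text[1:-1]
    if text in ("true", "false"):
        return text == "true"
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass
    return text  # refs and <type> markers are kept as plain strings


def parse_g3f(source):
    """Return (named, anonymous): a dict of named blocks, a list of unnamed ones."""
    named, anonymous, current = {}, [], None
    for raw in source.splitlines():
        line = raw.split("#", 1)[0].strip()  # naive: ignores '#' inside strings
        if not line:
            continue
        block = BLOCK.match(line)
        if block:
            # The schema part after ':' is parsed but not checked in this sketch.
            name, _, _schema = block.group(1).partition(":")
            current = {}
            if name == "<>":
                anonymous.append(current)
            else:
                named[name] = current
            continue
        node = NODE.match(line)
        if node and current is not None:
            current[node.group(1)] = parse_value(node.group(2))
    return named, anonymous
```

Calling `parse_g3f` on the two anonymous `{<>:node}` blocks from the README
gives an empty map and a two-element list. Resolving refs into an in-memory
graph is deliberately left out.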
 README.md | 183 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 183 insertions(+)
 create mode 100644 README.md
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..8e78484
--- /dev/null
+++ b/README.md
@@ -0,0 +1,183 @@
+# git friendly file format (`g3f`)
+
+A flat file format that can encode literally anything,
+while being plain text (`utf-8` tho) and very git friendly.
+
+**What does git friendly mean?**
+
+Changes are done in place, resulting in visually pleasing
+(and useful) diffs that are generated by VCS programs such as `git`.
+
+## The spec
+
+Before we start with the (more or less) formal specification,
+there are some design principles that went into `g3f`:
+
+- Easy to write by hand: a human should easily be able to write
+  a data file, without much effort or boilerplate. It should also
+  be possible to edit generated files without being swamped with
+  boilerplate (or indentation!)
+- Flat structure: a file should not allow for nested structures
+  in the file itself. This adds complexity and makes it harder
+  to edit by hand. It also adds complexity at the parse level
+  and makes graphs more difficult
+- VCS friendly: a file change should only touch parts of the
+  data section that were changed.
+
+Now...
+
+`g3f` files are strongly typed.
+This means that every file has a schema section in its header
+defining what data types exist and how they are laid out.
+
+The file extention for a `g3f` file is `.g3f` by default, 
+however this implementation is not opinionated on that.
+
+### Header
+
+At the top of every `g3f` file is a header.
+It contains the spec version the file was made with
+as well as the implementation `ID` and version.
+
+It looks something like this:
+
+```g3f
+{header:builtin/header}
+{spec "1.0.0"}
+{impl "g3f-reference"}
+{impl_version "0.8.5"}
+
+{schemas} 
+# ...
+```
+
+A few notes here:
+
+- `g3f` is a flat format. When declaring a new top-level block (i.e. `{schemas}`) this ends the `{header}` block.
+- A block can enforce a schema (i.e. here we enforce that all required fields from `builtin/header` are present)
+- Nodes always have a single data value. Supported types are
+  - string (`"1.0.0"`)
+  - int (`42`)
+  - float (`13.37`)
+  - bool (`true`|`false`)
+  - list<...> (`[ ... ]` - Elements are not comma-separated!)
+  - ref (`some_id` - not quoted!)
+  - type (`<...>` refers to some type information
+  - NULL (`<>` which is an empty type/name marker)
+- `#` is a line comment. There are no block-comments
+
+### Schemas
+
+As previously mentioned `g3f` is a strongly typed file format.
+Schemas are IDs that can be referenced by other IDs.
+But because `g3f` is completely flat, it's impossible to define
+schema blocks inside the `{schema}` block itself.
+
+Instead it uses the `NULL` markers to define the existence of schemas.
+Schemas are then later defined in-line with the rest of the data. 
+
+```g3f
+{schemas}
+{node <>}
+{link <>}
+
+{node}
+{id <int>}
+{links <list<int>>}
+
+{link}
+{id <int>}
+{in <int>}
+{out <int>}
+```
+
+### Defining data
+
+Then using these schemas is easy enough.
+You don't have to use schemas however,
+if you want your file format to be completely dynamic and terrible.
+
+```g3f
+{<>:node}
+{id 0}
+{links [ 1 ]:}
+
+{<>:node}
+{id 1}
+{links [ 0 ]}
+```
+
+Note that `<>` in the name position of a block refers to an anonymous block without a name of its own.
+Deserialisation of this file would happen as a list of nodes, each without a name.
+
+When building graph structures, it is possible to have loops.
+`g3f` allows this.
+
+Also of note: when using blocks that are named, in a flat structure,
+deserialisation happens as a map `name => { data }`!
+
+### Some thoughts on deserialisation
+
+(not specifically part of the spec - to be expanded!)
+
+Deserialised into C code this would look like the following:
+
+```C
+struct node_t {
+  int32_t id;
+  int32_t *links;
+};
+
+struct node_t * nodes = [ node_t { ... }, node_t { ... } ];
+```
+
+Because `g3f` has no hirarchy structure, and there's no in-file format references between the two nodes,
+the deserialised returns a list of nodes.
+Building a graph in memory is then your responsibility.
+However, `g3f` can handle a few scenarios for you.
+
+Imagine we used references, instead of integers, for links:
+
+```g3f
+{node}
+{id <int>}
+{links <list<ref>>}
+```
+
+What does this change? Well, let's look at a data section:
+
+```g3f
+{node_0:node}
+{id 0}
+{links [ node_1 ]}
+
+{node_1:node}
+{id 1}
+{links [ node_0 ]}
+```
+
+In this case, `g3f` will deserialise into a list with a single node, 
+which is `node_0` because it is considered the root-node for the graph.
+
+### Upgradability
+
+Applications might add new fields to their schemas and data sections.
+In binary encoders such as protobuf, code is specifically generated for
+an exchange format and also includes forward-compatible markers to
+allow for schema changes.
+
+`g3f` needs none of that!
+Because data state inside the parser is dynamic and type checking
+is only done against the schema in a file,
+if the code using the parser library doesn't expect certain
+data keys or expects others to be there that aren't present,
+this can be gracefully handled.
+
+New keys can be added the same way they would be in a dynamic file.
+Keys that are present despite not being expected can simply be ignored.
+The spec makes explicit note of writes and re-writes being done
+in-place,
+meaning that changes are always local to the keys that are changed.
+If an update ignores certain keys, it doesn't matter if they were
+ignored because they were not important or unknown to the application.
+
-- 
cgit v1.2.3


From f28abd489cc7ebd9f2d14f584052a93607d78985 Mon Sep 17 00:00:00 2001
From: Katharina Fey <kookie@spacekookie.de>
Date: Thu, 28 Feb 2019 20:33:22 +0100
Subject: Adjusting the way that schemas work

---
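Note for reviewers: with this change, schemas are pre-declared in the header
via the `<schema>` marker and defined later by blocks of the same name. Below
is a minimal sketch of a consistency check a parser could run. It assumes the
file has already been parsed into a dict mapping block names to their
key/value nodes (a hypothetical in-memory shape), and `undefined_schemas` is
an illustrative name. The README does not say whether a missing definition is
an error, so treating it as one is an assumption.

```python
SCHEMA_MARKER = "<schema>"


def undefined_schemas(blocks):
    """Return names pre-declared in the header that no later block defines."""
    header = blocks.get("header", {})
    declared = [key for key, value in header.items() if value == SCHEMA_MARKER]
    return [name for name in declared if name not in blocks]


# Hypothetical parsed file: "edge" is declared but never defined.
example = {
    "header": {"spec": "1.0.0", "node": "<schema>", "edge": "<schema>"},
    "node": {"id": "<int>", "links": "<list<int>>"},
}
missing = undefined_schemas(example)
```

Here `missing` holds the declared names that lack a defining block, which a
library could report before attempting any type checks.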
 README.md | 36 +++++++++++++++++++++---------------
 1 file changed, 21 insertions(+), 15 deletions(-)

diff --git a/README.md b/README.md
index 8e78484..1d92cd7 100644
--- a/README.md
+++ b/README.md
@@ -33,6 +33,12 @@ defining what data types exist and how they are layed out.
 The file extention for a `g3f` file is `.g3f` by default, 
 however this implementation is not opinionated on that.
 
+Another important point:
+it should be considered part of the spec that writes are
+done in-place to existing data.
+No data should be overwritten if not explicitly desired
+by the application using the `g3f` library!
+
 ### Header
 
 At the top of every `g3f` file is a header.
@@ -47,14 +53,15 @@ It looks something like this
 {impl "g3f-reference"}
 {impl_version "0.8.5"}
 
-{schemas} 
-# ...
+{data}
 ```
 
 A few notes here:
 
-- `g3f` is a flat format. When declaring a new top-level block (i.e. `{schemas}`) this ends the `{header}` block.
-- A block can enforce a schema (i.e. here we enforce that all required fields from `builtin/header` are present)
+- `g3f` is a flat format. When declaring a new top-level block
+  (i.e. `{data}`) this ends the `{header}` block.
+- A block can enforce a schema (i.e. here we enforce that all 
+  required fields from `builtin/header` are present)
 - Nodes always have a single data value. Supported types are
   - string (`"1.0.0"`)
   - int (`42`)
@@ -63,23 +70,22 @@ A few notes here:
   - list<...> (`[ ... ]` - Elements are not comma-separated!)
   - ref (`some_id` - not quoted!)
   - type (`<...>` refers to some type information
-  - NULL (`<>` which is an empty type/name marker)
+  - schema (`<schema>` as a literal)
 - `#` is a line comment. There are no block-comments
 
 ### Schemas
 
 As previously mentioned `g3f` is a strongly typed file format.
 Schemas are IDs that can be referenced by other IDs.
-But because `g3f` is completely flat, it's impossible to define
-schema blocks inside the `{schema}` block itself.
-
-Instead it uses the `NULL` markers to define the existence of schemas.
-Schemas are then later defined in-line with the rest of the data. 
+Because `g3f` is completely flat, it's impossible to have a `{schemas}`
+block in which to define schemas.
+Instead, inside the header, it's possible to use the `<schema>` type marker
+to pre-declare schemas which will later be defined by blocks.
 
 ```g3f
-{schemas}
-{node <>}
-{link <>}
+{header}
+{node <schema>}
+{link <schema>}
 
 {node}
 {id <int>}
@@ -100,7 +106,7 @@ if you want your file format to be completely dynamic and terrible.
 ```g3f
 {<>:node}
 {id 0}
-{links [ 1 ]:}
+{links [ 1 ]}
 
 {<>:node}
 {id 1}
@@ -131,7 +137,7 @@ struct node_t {
 struct node_t * nodes = [ node_t { ... }, node_t { ... } ];
 ```
 
-Because `g3f` has no hirarchy structure, and there's no in-file format references between the two nodes,
+Because `g3f` has no hierarchical structure, and there are no in-file format references between the two nodes,
 the deserialised returns a list of nodes.
 Building a graph in memory is then your responsibility.
 However, `g3f` can handle a few scenarios for you.
-- 
cgit v1.2.3