src/5-levels-of-data.md (6 additions & 5 deletions)
@@ -3,18 +3,19 @@
Not all data are created equal.
There are notable differences in how much you can do with data and how much effort it takes.
The more reusable data is, the easier it will be to use it as a developer, researcher or other type of data user.
+Reusability is about being able to transform, sort, query, serialize, modify, render and audit data without requiring too much work.
_This list is inspired by Tim Berners-Lee's [5-star open data](https://5stardata.info/en/)_.
## Level 0: proprietary data
If you don't give others the _rights_ to read, use or modify your data, its reusability is zero.
-That's why it's important to have a _licence_ that allow others to use your data.
+That's why it's important to have a _license_ that allows others to use your data.
A good choice for a permissive option is the [Open Database License](https://opendatacommons.org/licenses/odbl/summary/).
Creative Commons licenses are also good options to clearly communicate _if_, and if so then _how_, your data is permitted to be re-used.
-It's also important to use _open formats_ (such as `CSV`, `JSON` or `PNG`), intead of _proprietary formats_ (tied to specific vendors, such as `PSD` or `RAR`).
+It's also important to use _open formats_ (such as `CSV`, `JSON` or `PNG`), instead of _proprietary formats_ (tied to specific vendors, such as `PSD` or `RAR`).
## Level 1: unstructured data
@@ -51,15 +52,15 @@ If we want predictability, we need to make it _type-safe_.
## Level 3: type-safe data
-_Examples: SQL + DB SCHEMA, JSON + JSON schema, XSD + XML, RDF + SHACL, In-memory data in type-safe programming langauges_
+_Examples: SQL + DB SCHEMA, JSON + JSON schema, XSD + XML, RDF + SHACL, In-memory data in type-safe programming languages_
Type-safe data means that every value of the data has an explicit datatype.
It is _strongly typed_ and has a clear _schema_ that describes which properties you can expect in a Resource.
This means that someone re-using type-safe data can know for certain that it conforms to a specification, a set of rules.
The shape of the data is _predictable_.
This predictability means that developers can safely re-use it in their system without worrying about missing fields or datatype errors.
-Lots of software has _internal_ type safety, especially if you use type-safe programming langauges like Typescript, Kotlin or Rust.
+Lots of software has _internal_ type safety, especially if you use type-safe programming languages like TypeScript, Kotlin or Rust.
However, when the data _leaves the system_, a lot of type-related information is lost.
Even if this schema-related information is described, the schema itself is often not machine-readable.
The best way to have type-safe data is to describe the schema in a machine-readable format.
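
To make the idea of a machine-readable schema concrete, here is a minimal sketch in Python. It is not how Atomic Data, JSON Schema or SHACL work internally. The `PERSON_SCHEMA` mapping and the `validate` helper are hypothetical names invented for this illustration, and the schema format (property name to expected type) is a deliberate simplification.

```python
import json

# Hypothetical schema for illustration: maps each property name to the Python
# type its value must have. Real systems would use JSON Schema, SHACL, etc.
PERSON_SCHEMA = {"name": str, "age": int}


def validate(resource: dict, schema: dict) -> list[str]:
    """Return a list of errors; an empty list means the resource conforms."""
    errors = []
    for prop, expected in schema.items():
        if prop not in resource:
            errors.append(f"missing property: {prop}")
        elif type(resource[prop]) is not expected:
            errors.append(
                f"{prop}: expected {expected.__name__}, "
                f"got {type(resource[prop]).__name__}"
            )
    return errors


# Data arriving from outside the system as plain JSON text.
good = json.loads('{"name": "Ada", "age": 36}')
bad = json.loads('{"name": "Ada", "age": "36"}')

print(validate(good, PERSON_SCHEMA))  # []
print(validate(bad, PERSON_SCHEMA))   # ['age: expected int, got str']
```

Because the schema is itself data, a consumer can run this check at the boundary of their system instead of trusting prose documentation.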
@@ -79,7 +80,7 @@ In Atomic Data, the Properties themselves (the links in the keys in JSON-AD) des
## Level 4: browsable data
-_Examples: Atomic Data, propertly hosted RDF_
+_Examples: Atomic Data, properly hosted RDF_
If your data is _connected_ to other pieces of machine-readable data, it becomes browsable, similar to how websites link to each other.
This effectively creates a _web of data_, and allows for a whole new way to think about the internet.
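
As a rough illustration of what "browsable" means in practice, the sketch below follows links from one resource to the next. Everything in it is an assumption made for the example: the `example.com` URLs, the property names and the `follow` helper are hypothetical, and an in-memory dictionary stands in for fetching each URL over HTTP.

```python
# Hypothetical resources keyed by URL; in a real setup each URL would be
# fetched over HTTP and return machine-readable data.
WEB_OF_DATA = {
    "https://example.com/people/ada": {
        "name": "Ada",
        "employer": "https://example.com/orgs/acme",
    },
    "https://example.com/orgs/acme": {
        "name": "Acme",
        "country": "https://example.com/countries/nl",
    },
    "https://example.com/countries/nl": {"name": "Netherlands"},
}


def follow(start: str, path: list[str]) -> object:
    """Walk from a starting resource along a list of property names.

    Whenever a value is the URL of a known resource, it is resolved to
    that resource before the next step.
    """
    current = WEB_OF_DATA[start]
    for prop in path:
        value = current[prop]
        current = WEB_OF_DATA.get(value, value) if isinstance(value, str) else value
    return current


# From a person, via their employer, to the name of the employer's country.
print(follow("https://example.com/people/ada", ["employer", "country", "name"]))
# Netherlands
```

The point is that a client needs no prior knowledge of the organization or country resources: the links in the data tell it where to look next.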