I just had a really nice brainstorm with @AlexMikhalev about importing atomic data #89. He basically said: instead of focusing on writing importers, make sure other projects can more easily export data.
Exporting Atomic Data is quite a big task for most projects, because they need to make sure that all their subjects resolve. That means implementing the Accept header, making sure the routing matches...
But a large part of the advantages of Atomic Data are related to its schema. Even if a data source does not have resolvable URLs, being able to map its properties to Atomic Properties would still be beneficial.
Design goals
- Easy way to publish data as atomic data
- Can be represented statically
- Ideally can be achieved in a templating language
- Simpler, yet more powerful than RSS
- Deal with versions, maybe?
Use cases
Let's consider some use cases:
Importing blog posts / sharing a feed
Some author hosts a blog. They don't want to implement the entire atomic data protocol, but they do want to add some simple mapping. They share a list of their blog posts, described as Atomic Data resources at a pre-defined URL.
Some reader sees this blog, and sees a subscribe (atomic) button. Their atomic server creates a subscription for this URL. The server fetches the URL and parses the data. It creates copies of the articles on the server, and prevents duplication.
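As a sketch of how that deduplication could work (the importing server's URL and resource layout here are assumptions, not part of any spec): the server could key each imported copy on the item's sourceUrl, so re-fetching the feed updates existing copies instead of creating new ones.

```json
{
  "@id": "https://my-atomic-server.example/imported/someId123",
  "https://atomicdata.dev/properties/sourceUrl": "https://example.com/blogs/someId123",
  "https://atomicdata.dev/properties/description": "Hello this is my blog!"
}
```

On the next fetch, an incoming item with the same sourceUrl would update this copy rather than create a duplicate.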
Conversion target for Atomizer #89 - convert some data source to atomic data, without hosting individual resources
We want to build an importer / transformer tool that converts various data sources to atomic data. The output of this conversion tool can be many things. I first thought: let's aim for Commits. However, that can be quite a dependency: the client needs to sign Commits, which means it needs a private key and signature logic; it needs to send the data somewhere; and it needs to know the URL of the server where the data will be stored. That's quite a hefty contract.
If the exporter could simply create one JSON file containing all resources, it would not have to implement any routing. It could simply create this one JSON file upon request.
The next system (e.g. atomic server) could then easily convert that data into fully hosted atomic data. The server could mint the URLs / subjects.
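A minimal sketch of that two-step flow (the localId property and all URLs below are illustrative assumptions, not settled names). The exporter emits one static JSON file with locally identified items:

```json
{
  "https://atomicdata.dev/properties/items": [
    {
      "https://atomicdata.dev/properties/localId": "post-123",
      "https://atomicdata.dev/properties/description": "Hello this is my blog!"
    }
  ]
}
```

The importing server could then mint a subject per item, e.g. turning post-123 into:

```json
{
  "@id": "https://my-atomic-server.example/imported/post-123",
  "https://atomicdata.dev/properties/description": "Hello this is my blog!"
}
```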
Compared to RSS / Atom feeds
- JSON over XML
- Extensible, not fixed to one document data model like RSS
- Type safe, because atomic data
Name suggestions
- Atom feeds (lol jk - we should avoid confusion as much as possible, even though there will probably always be some)
- Atomic Data Publishing Protocol - ADPub
- Atomic lists
- Atomic Data Feed (ADF)
- Atomic Simple Syndication (federation) format @AlexMikhalev
- Place federation format @AlexMikhalev
Challenges
- How do we correctly identify content (and prevent unintended data duplication) without requiring the server to fully implement Atomic Data?
Implementation ideas
Add a new local-id property, require this in EADEP resources
- Server can host an EADEP resource somewhere, e.g. https://example.com/blogs/eadep.jsonad
- This resource has some metadata about the items hosted here, such as when it was updated
- The items (in this case, blogs) are nested resources without @id fields, but they do have internal identifiers: local-id. These should not change over time. They do not need to resolve as URLs. They are scoped to the parent - not globally.
```json
{
  "@id": "https://example.com/blogs/adpub",
  "https://atomicdata.dev/properties/updatedAt": 160179249,
  "https://atomicdata.dev/properties/items": [
    {
      "https://atomicdata.dev/properties/sourceUrl": "https://example.com/blogs/someId123",
      "https://atomicdata.dev/properties/description": "Hello this is my blog!"
    }
  ]
}
```

Some thoughts / doubts about this approach:
- What kind of properties should we recommend (or even require) for the bottom-level resource? Or should this be entirely free for all?
- Should we require presence of the HTTP Accept header application/ad+json?
Local Identifiers
- Should often (or always?) be deterministic, to prevent duplicate imports.
- URLs are still the best here, but if not available, choose some domain-specific deterministic concept. E.g. for a vcard we pick the mobile phone number; for files, maybe the content hash.
- Do not need to resolve, contrary to @id.
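For instance (values made up, and the exact property URL is an assumption), deterministic local-ids for the two examples above could look like this:

```json
[
  { "https://atomicdata.dev/properties/localId": "tel:+31612345678" },
  { "https://atomicdata.dev/properties/localId": "sha256-9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08" }
]
```

The first keys a vcard on its mobile phone number; the second keys a file on its content hash. Re-importing the same source then yields the same local-ids, so nothing gets duplicated.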
Versioning
- It would be useful if data creators could specify (optionally) if things have changed since a previous export. That would limit performance impact for clients, too.
- Either a timestamp or a vector clock would work well here
- Should we combine this with localId, so we get one big string? Increases complexity for data suppliers, but prevents naming conflicts if there is no localId present.
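Two hedged sketches of what this could look like (property URLs are assumptions). Keeping updatedAt per item as a separate property:

```json
{
  "https://atomicdata.dev/properties/localId": "post-123",
  "https://atomicdata.dev/properties/updatedAt": 1601792490
}
```

Alternatively, folding the version into the identifier as one big string, e.g. "post-123@1601792490", gives every item a usable key even without a separate localId, at the cost of extra string-format rules for data suppliers.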
Global vs local ID?
- Either the data creator or the importer needs to be responsible for preventing naming conflicts.
- If the data creator is responsible, we could have malicious creators who try to overwrite data from external sources. But we can prevent this by checking sources in importers, of course.
to be continued