Transient storages #11
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,336 @@ | ||||||
| --- | ||||||
| Title: transient storage host functions | ||||||
| Number: 0 | ||||||
| Status: Proposed | ||||||
| Authors: | ||||||
| - cheme | ||||||
| Created: 2023-04-xx | ||||||
| Category: Runtime Environment | ||||||
| Requires: | ||||||
| Replaces: | ||||||
| --- | ||||||
|
|
||||||
|
|
||||||
| Current Implementation: https://github.com/cheme/substrate/tree/transient | ||||||
|
|
||||||
|
|
||||||
| ## Summary | ||||||
|
|
||||||
| This PPP defines new host functions to keep track of transient data across the processing of a block. | ||||||
|
|
||||||
| It is derived, with modifications, from the initial write-up at https://github.com/paritytech/substrate/issues/12577, credit to Gavin Wood. | ||||||
|
|
||||||
| It exposes two new sets of host functions: blob storage, an API around byte storage (similar to Rust `Vec<u8>`), and ordered key-value storage (similar to Rust `BTreeMap<Vec<u8>, Vec<u8>>`). | ||||||
|
|
||||||
| Each storage structure is defined and addressed by a name; the number of structures is limited only by the block weight. | ||||||
| Each structure can be hashed or reduced to a single Merkle root so that it can be stored in state. | ||||||
| Each structure can define whether it should be sent to client storage. | ||||||
|
|
||||||
| ## Motivation | ||||||
|
|
||||||
| From https://github.com/paritytech/substrate/issues/5396: | ||||||
| "Right now we abuse storage for intra-block data such as block number, parent hash and block author as well as various housekeeping information and flags like whether we set the uncles Authorship::DidSetUncles. | ||||||
|
|
||||||
| When initially writing, this incurs an extra trie lookup, which is slow. Instead there should be another host API, which works exactly like set_storage/get_storage but has no trie backing, so it never tries to lookup the value in the trie, nor does it write the value at the end of the block." Credits Gavin Wood. | ||||||
|
|
||||||
| Additionally, content such as events is often used in a log-based (append-only) manner, possibly with a larger size than usual content. | ||||||
|
|
||||||
| Hashing through a host function involves passing all the data at once, which is not memory efficient. | ||||||
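
To illustrate the memory concern, here is a minimal sketch of one-shot versus progressive hashing. Everything in it is hypothetical: `StreamingHasher` and `hash_all` are illustrative names, not host functions from this PPP, and 64-bit FNV-1a stands in for a real cryptographic hash only because it fits in a few lines.

```rust
/// Hypothetical streaming hasher. FNV-1a (64-bit) is a stand-in: a real host
/// function would use a cryptographic hash, not FNV.
pub struct StreamingHasher {
    state: u64,
}

impl StreamingHasher {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;

    pub fn new() -> Self {
        Self { state: Self::OFFSET_BASIS }
    }

    /// Feed one chunk. Only the 8-byte state survives between calls, so the
    /// caller never needs to hold the whole input in memory.
    pub fn update(&mut self, chunk: &[u8]) {
        for byte in chunk {
            self.state ^= u64::from(*byte);
            self.state = self.state.wrapping_mul(Self::PRIME);
        }
    }

    pub fn finalize(self) -> u64 {
        self.state
    }
}

/// One-shot shape: the whole input must be available as a single buffer.
pub fn hash_all(data: &[u8]) -> u64 {
    let mut hasher = StreamingHasher::new();
    hasher.update(data);
    hasher.finalize()
}
```

Feeding `b"hello "` and then `b"world"` to `update` yields the same digest as `hash_all(b"hello world")`, which is the property a progressive hashing API relies on.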
|
|
||||||
| ## Implementation | ||||||
|
|
||||||
| Transient storage acts like the current state storage, but without a persistent backend. | ||||||
|
|
||||||
| This implies that the storage must support committing and reverting transactions with `ext_storage_commit_transaction` and `ext_storage_rollback_transaction`. | ||||||
| This transactional support applies both to transient storage content and to transient storage definitions (a deleted transient storage is restored on rollback). | ||||||
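
The transactional requirement can be sketched as follows. This is a deliberately naive model, not the linked implementation: `TransientStorages` and its methods are hypothetical names, and each transaction simply snapshots the whole set of storages, which is correct but far more expensive than a real overlay.

```rust
use std::collections::BTreeMap;

type Name = Vec<u8>;
type Btree = BTreeMap<Vec<u8>, Vec<u8>>;

/// Hypothetical model of all transient btree storages of a block.
#[derive(Default)]
pub struct TransientStorages {
    current: BTreeMap<Name, Btree>,
    /// One full snapshot per open transaction (naive, for illustration only).
    snapshots: Vec<BTreeMap<Name, Btree>>,
}

impl TransientStorages {
    pub fn start_transaction(&mut self) {
        self.snapshots.push(self.current.clone());
    }

    /// Keeps the current state and drops the innermost snapshot.
    /// Returns false if no transaction was open.
    pub fn commit_transaction(&mut self) -> bool {
        self.snapshots.pop().is_some()
    }

    /// Restores both content and definitions from the innermost snapshot.
    /// Returns false if no transaction was open.
    pub fn rollback_transaction(&mut self) -> bool {
        match self.snapshots.pop() {
            Some(snapshot) => {
                self.current = snapshot;
                true
            }
            None => false,
        }
    }

    pub fn new_storage(&mut self, name: &[u8]) {
        self.current.insert(name.to_vec(), Btree::new());
    }

    pub fn delete_storage(&mut self, name: &[u8]) -> bool {
        self.current.remove(name).is_some()
    }

    pub fn exists(&self, name: &[u8]) -> bool {
        self.current.contains_key(name)
    }
}
```

Deleting a storage inside a transaction and then rolling back makes `exists` return true again, matching the "restored on rollback" rule above.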
|
|
||||||
| Btree and blob storages use a specific `Mode`, either `drop` or `archive`, passed as the byte 0 or 1 respectively. With `drop`, the data is not sent from the runtime executor to the calling client. With `archive`, the committed state of the transient storage is passed as a change set to the client calling the runtime executor. | ||||||
|
||||||
|
|
||||||
| In archive mode, it is the client that chooses its strategy for storing the final states of the block's transient storages. | ||||||
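
A minimal sketch of the `Mode` byte and of the filtering it implies is given below. The names `Mode::from_byte` and `archive_change_set` are hypothetical, and returning `None` for an unknown byte is an assumption, since the text does not say how other values are handled.

```rust
/// The two modes described above, with their byte encoding.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Mode {
    Drop = 0,
    Archive = 1,
}

impl Mode {
    /// Decodes the byte passed by the runtime.
    /// Assumption: any byte other than 0 or 1 is rejected.
    pub fn from_byte(byte: u8) -> Option<Mode> {
        match byte {
            0 => Some(Mode::Drop),
            1 => Some(Mode::Archive),
            _ => None,
        }
    }
}

/// Builds the change set handed to the client: only `archive` storages are
/// included, `drop` storages never leave the executor.
/// Each input entry is (name, mode, committed content).
pub fn archive_change_set(
    storages: &[(Vec<u8>, Mode, Vec<u8>)],
) -> Vec<(Vec<u8>, Vec<u8>)> {
    storages
        .iter()
        .filter(|(_, mode, _)| *mode == Mode::Archive)
        .map(|(name, _, content)| (name.clone(), content.clone()))
        .collect()
}
```

What the client does with the returned change set (index it, store it whole, discard it) is left open, as stated above.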
|
|
||||||
|
|
||||||
| ### Implementation of Btree storage | ||||||
|
|
||||||
| - `ext_btree_storage_new` with parameters: | ||||||
|
||||||
| - name : a pointer size to the name of the new transient storage. | ||||||
| - mode : `Mode` as a u8 (either 0 `drop` or 1 `archive`). | ||||||
| No result. | ||||||
| Allows using a transient storage for a given `name` and `mode`. | ||||||
| If a transient storage already exists with the same `name`, it is overwritten. | ||||||
|
|
||||||
| - `ext_btree_storage_exists` with parameters: | ||||||
| - name : a pointer size to the name of a transient storage. | ||||||
| Result is a boolean indicating if transient storage was instantiated. | ||||||
|
|
||||||
| - `ext_btree_storage_delete` with parameters: | ||||||
| - name : a pointer size to the name of a transient storage. | ||||||
| Result is true if a transient storage existed and was removed, and false if no | ||||||
| transient storage existed. | ||||||
|
|
||||||
|
|
||||||
| - `ext_btree_storage_clone` with parameters: | ||||||
| - name : a pointer size to the name of a transient storage to clone. | ||||||
| - target_name : a pointer size to the new transient storage to use. | ||||||
| Result is true if the operation succeeds and false if there was no storage at `name`. | ||||||
| Clone keeps the same `Mode` and copies all content from one storage to another. | ||||||
| If a transient storage is present at `target_name`, it is overwritten. | ||||||
|
|
||||||
| The cost of this operation is high: the implementation does not try to avoid the copy. | ||||||
|
|
||||||
| - `ext_btree_storage_rename` with parameters: | ||||||
|
||||||
| - name : a pointer size to the name of a transient storage to rename. | ||||||
| - target_name : a pointer size to the new name to use. | ||||||
| Result is true if the operation succeeds and false if there was no storage at `name`. | ||||||
|
|
||||||
| Renaming internally renames the storage. As `name` is the main way to address a storage, | ||||||
| it is very likely to be implemented as a move in an indexing structure. | ||||||
| Like everything else, this needs to be transactional and revertible. | ||||||
|
|
||||||
| If a transient storage is present at `target_name` it is overwritten. | ||||||
|
|
||||||
| The cost of this operation is small: there should be no copy of the storage content. | ||||||
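
The lifecycle functions described so far (new, exists, delete, clone, rename) can be modelled as a sketch over a name-indexed map. `Registry`, `BtreeStorage` and the method names are hypothetical; the model ignores transactions and the host/runtime memory boundary, and only mirrors the return values and overwrite rules stated above.

```rust
use std::collections::BTreeMap;

#[derive(Clone, Debug, PartialEq)]
pub struct BtreeStorage {
    pub mode: u8, // 0 = drop, 1 = archive
    pub items: BTreeMap<Vec<u8>, Vec<u8>>,
}

/// Hypothetical index of transient btree storages, addressed by name.
#[derive(Default)]
pub struct Registry {
    storages: BTreeMap<Vec<u8>, BtreeStorage>,
}

impl Registry {
    /// Creates an empty storage; an existing one with that name is overwritten.
    pub fn new_storage(&mut self, name: &[u8], mode: u8) {
        let empty = BtreeStorage { mode, items: BTreeMap::new() };
        self.storages.insert(name.to_vec(), empty);
    }

    pub fn exists(&self, name: &[u8]) -> bool {
        self.storages.contains_key(name)
    }

    pub fn delete(&mut self, name: &[u8]) -> bool {
        self.storages.remove(name).is_some()
    }

    /// Full copy (expensive); keeps the mode and overwrites `target_name`.
    pub fn clone_storage(&mut self, name: &[u8], target_name: &[u8]) -> bool {
        match self.storages.get(name).cloned() {
            Some(copy) => {
                self.storages.insert(target_name.to_vec(), copy);
                true
            }
            None => false,
        }
    }

    /// Moves the index entry only (cheap); overwrites `target_name`.
    pub fn rename(&mut self, name: &[u8], target_name: &[u8]) -> bool {
        match self.storages.remove(name) {
            Some(storage) => {
                self.storages.insert(target_name.to_vec(), storage);
                true
            }
            None => false,
        }
    }

    pub fn get(&self, name: &[u8]) -> Option<&BtreeStorage> {
        self.storages.get(name)
    }

    pub fn get_mut(&mut self, name: &[u8]) -> Option<&mut BtreeStorage> {
        self.storages.get_mut(name)
    }
}
```

The cost difference shows up directly: `clone_storage` duplicates every item, while `rename` removes and reinserts a single index entry without touching the items.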
|
|
||||||
|
|
||||||
| - `ext_btree_storage_insert_item` with parameters: | ||||||
| - name : a pointer size to the name of the transient storage to insert into. | ||||||
| - key : a pointer size to the key of the content to insert. | ||||||
| - value : a pointer size to the value of the content to insert. | ||||||
| Returns false if there is no btree storage defined for this `name`, true otherwise (success). | ||||||
|
|
||||||
| This inserts a new key-value entry. | ||||||
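
A sketch of the insert semantics, under stated assumptions: `insert_item` here is a plain Rust function standing in for `ext_btree_storage_insert_item`, the storages are a hypothetical name-indexed map, and replacing the value of an already present key is assumed from `BTreeMap` behaviour rather than stated in the text.

```rust
use std::collections::BTreeMap;

pub type Btree = BTreeMap<Vec<u8>, Vec<u8>>;

/// Inserts `key => value` into the btree storage called `name`.
/// Returns false, and changes nothing, if no storage has that name.
/// Assumption: an existing key has its value replaced.
pub fn insert_item(
    storages: &mut BTreeMap<Vec<u8>, Btree>,
    name: &[u8],
    key: &[u8],
    value: &[u8],
) -> bool {
    match storages.get_mut(name) {
        Some(btree) => {
            btree.insert(key.to_vec(), value.to_vec());
            true
        }
        None => false,
    }
}
```

Note that the function never creates a storage implicitly: the caller has to call the `new` function first.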
|
||||||
| This insert a new key value content. | |
| This insert a new key value content. Does nothing if the btree storage of this `name` doesn't exist. |
This needs to be made explicit.
What are Root32Structure and SubstrateDefault?
I shouldn't have to read the Substrate code in order to understand the spec.
The idea behind "Root32Structure" is to allow multiple ways to calculate a Merkle root from key-value content.
"SubstrateDefault" just uses the Merkle trie V1 that we currently have in state.
This is not a great Merkle structure; a binary trie would clearly be much better (smaller proofs), but it is a straightforward implementation to start with.
| Returns a scale encoded optional sized array of keys (rust `Option<Vec<Vec<u8>>>`). | |
| Returns a scale encoded optional sized array of keys (rust `Option<Vec<Vec<u8>>>`). Returns a SCALE-encoded `None` either if there is no transient btree with that name or if the `key` is the last key of that transient btree. |
The intention was rather:
Returns a SCALE-encoded `Some(vec![])` if the key is the last key of that transient storage.
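
The return convention under discussion can be sketched as below. This is only an illustration of the author's stated intention: `next_keys` is a hypothetical function name, the `count` limit is an assumption about what "sized array" means, and SCALE encoding of the result is left out.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

pub type Btree = BTreeMap<Vec<u8>, Vec<u8>>;

/// Returns up to `count` keys strictly greater than `key`.
/// - `None`: no transient btree storage is called `name`.
/// - `Some(vec![])`: the storage exists but no key follows `key`.
pub fn next_keys(
    storages: &BTreeMap<Vec<u8>, Btree>,
    name: &[u8],
    key: &[u8],
    count: usize,
) -> Option<Vec<Vec<u8>>> {
    let btree = storages.get(name)?;
    let after = (Bound::Excluded(key), Bound::Unbounded);
    Some(
        btree
            .range::<[u8], _>(after)
            .take(count)
            .map(|(k, _)| k.clone())
            .collect(),
    )
}
```

With this convention a caller can tell "storage missing" apart from "iteration finished", which a single `None` for both cases would not allow.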
From this sentence, it seems that one alternative could be to provide new host functions that allow hashing data progressively, which seems tremendously more simple than all this machinery.
If this alternative is not viable, this section really should explain why.
This is what I propose in the paragraph "Implementation of Blob storage hashing". This part is not really strictly needed for transient storage, but it did not make sense to me to have blob (potentially big) hashing with the current api.
Maybe I should extract it in a separate PPP (progressive hashing host function).
My point is that the "Motivation" and/or "Alternatives" section should explain what problem is being solved, but also why this specific design was chosen and not a different one. The objective is to make sure that this isn't an XY problem.
If the problem that is being solved is just that "hashing through host function isn't memory efficient", then the solution in this RFC is way overkill.
It is rather a separate issue that I tackled while implementing and then force-pushed into this PPP; having everything in it is a bit confusing.
Similarly, the ordered map solves one precise issue, and the blob a slightly different one.
It would clearly make sense to me to have the transient ordered map first and the blob second: these could be two different PPPs (even if keeping them together helps factoring).
I think I will extract the hashing part as a first step; I am not sure about splitting the ordered map and the blob.
But again, my point is that it should be written in the PPP which issue is being solved.
It's not possible to have an opinion on whether a proposal is good if you don't know which use-cases it has and problems it solves.