Node key prefixes in the database #12
Conversation
Looks interesting. Out of curiosity, why is this needed?
Updated the description.
@cheme Is anything preventing this from being merged? I really want paritytech/substrate#1733 to get fixed.
@xlc, the code itself seems functionally fine and could be merged, but it is currently unclear whether this will be the actual fix for paritytech/substrate#1733. It would also require trie_root changes for full compatibility (probably modifying the `TrieStream` trait).
@cheme Thanks for the detailed explanation. Good to know this is not getting stale.
cheme left a comment
This looks great. I have not run the tests yet (I could probably fuzz it a bit later, but I need to update my trie_root alternative algorithm from #11); still, here are a few first comments.
I am also starting to wonder: why not have a unique trie id and calculate full_key from the concatenation of unique_id + prefix only? This would allow direct access to a value without traversing the whole trie (the unique_id part may not even be needed if a column contains only one trie).
Ok, I spoke too soon: this does not keep history, so it is only possible when managing history explicitly or when using one unique trie id per block. Still, allowing a custom full_key scheme would be interesting (at least for parity-ethereum compatibility).
A last observation: this PR may reduce the possibilities of the trie_root crate (some form of Stream could be used to build a trie backed by a db; this would now require modifying the Stream trait, if I am correct).
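The direct-access idea above can be sketched roughly as follows. This is purely illustrative, not part of any crate here, and assumes the per-block unique trie id scheme discussed (the `direct_key` name is hypothetical):

```rust
// Illustrative only: if each block had its own unique trie id, a value could
// in principle be addressed directly as `trie_id ++ prefix`, with no hash
// lookup and no traversal through intermediate nodes.
fn direct_key(trie_id: &[u8], prefix: &[u8]) -> Vec<u8> {
    let mut k = Vec::with_capacity(trie_id.len() + prefix.len());
    k.extend_from_slice(trie_id);
    k.extend_from_slice(prefix);
    k
}
```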
memory-db/src/lib.rs
Outdated
/// Make database key from hash and prefix.
pub fn full_key<H: KeyHasher>(key: &H::Out, prefix: &[u8]) -> Key {
Here I would rather see something like:
pub fn full_key<H: KeyHasher>(key: &H::Out, prefix: &[u8], key_dest: &mut [u8])
and reuse a full-key buffer in MemoryDB. This would also require a function returning the key size.
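A minimal sketch of the buffer-reusing variant suggested here, with hypothetical names (`full_key_len`, `full_key_into`) and plain byte slices standing in for the hasher output:

```rust
/// Number of bytes the full key for a `key_len`-byte hash plus `prefix`
/// occupies (the "function returning size" mentioned above).
fn full_key_len(key_len: usize, prefix: &[u8]) -> usize {
    prefix.len() + key_len
}

/// Write `prefix ++ key` into a caller-provided buffer, avoiding a fresh
/// allocation per lookup.
fn full_key_into(key: &[u8], prefix: &[u8], key_dest: &mut [u8]) {
    debug_assert_eq!(key_dest.len(), full_key_len(key.len(), prefix));
    key_dest[..prefix.len()].copy_from_slice(prefix);
    key_dest[prefix.len()..].copy_from_slice(key);
}
```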
Reusing the buffer would only make sense in read-only methods, and that would make MemoryDB not thread-safe for reading, or require additional synchronization.
memory-db/src/lib.rs
Outdated
pub fn full_key<H: KeyHasher>(key: &H::Out, prefix: &[u8]) -> Key {
	let mut full_key = Vec::with_capacity(key.as_ref().len() + prefix.len());
	full_key.extend_from_slice(prefix);
	full_key.extend_from_slice(key.as_ref());
This no longer mixes the key with the prefix, but it assumes a variable length for full_key: the parity-common kvdb crate would probably benefit from being able to choose the key type (currently ElasticArray32; in our case ElasticArray64 or an intermediate size would fit better).
Opening an issue so we do not forget might be a good idea.
Profiling shows that the allocations here and the partial-key calculation are insignificant compared to IO and other code, so I left it for later.
  /// Look up a given hash into the bytes that hash to it, returning None if the
  /// hash is not known.
- fn get(&self, key: &H::Out) -> Option<T>;
+ fn get(&self, key: &H::Out, prefix: &[u8]) -> Option<T>;
PlainDB could probably use that too (from what I understand, PlainDB is for variable-length keys); @sorpaas would know better than I do.
  // this loop iterates through non-inline nodes.
  for depth in 0.. {
-     let node_data = match self.db.get(&hash) {
+     let node_data = match self.db.get(&hash, &key.encoded_leftmost(key_nibbles, false)) {
A small slowdown for parity-ethereum; I do not really see a way of avoiding it.
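For intuition, the prefix passed here is the encoded leftmost portion of the key path. A rough standalone sketch of that idea, under the assumption that the helper simply packs the first `n` nibbles of the key into bytes (the real `encoded_leftmost` in the crate may additionally carry an encoding header; this simplified version is illustrative only):

```rust
// Assumed, simplified semantics of an `encoded_leftmost(n, ..)`-style helper:
// take the first `n` nibbles (half-bytes) of a key path and pack them into
// bytes; the result serves as the node's database prefix.
fn encoded_leftmost(key: &[u8], nibbles: usize) -> Vec<u8> {
    let full_bytes = nibbles / 2;
    let mut out = key[..full_bytes].to_vec();
    if nibbles % 2 == 1 {
        // Keep only the high nibble of the trailing half byte.
        out.push(key[full_bytes] & 0xF0);
    }
    out
}
```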
I started updating another trie PR on Friday using this PR's changes: things seem to work fine :) (after solving a few indexing issues of my own, I could fuzz my trie builder against this prefixed implementation). Manipulating the new API makes me wonder about two points:
fn key(&self, hash: &H::Out, prefix: &[u8]) -> Self::Key; I could do keyspace indexing in the KeyFunction implementation. But honestly, this is still doable by overloading the db, if we consider this not to be the role of the KeyFunction trait.
Regarding paritytech/substrate#2035, this could be done purely in the
Not sure I understand this. Trie iteration is surely still possible. Node backend iteration or seek depends on
About paritytech/substrate#2035, the HashDB way of doing things is fine (still, I found it more elegant with KeyFunction; I am currently realizing that I need to handle the empty-node value in the HashDB case, whereas with KeyFunction it is native). But it is really a matter of design and does not have to make it into this PR.
For some applications, such as Substrate, it is desirable for each node in the trie to be unique, so that the same node cannot be inserted into two separate branches of the same trie. This simplifies the implementation of node storage quite a lot, since it removes the need for reference counting.
In Ethereum, uniqueness is already guaranteed by the fact that the keys are hashes and each value ends up in a leaf node with a long, random partial key. In Substrate this is not the case, as the keys are plain.
This PR introduces an additional parameter for `HashDB` functions that takes the encoded partial node key. This allows colliding nodes to be separated at the database level, and enables more efficient database implementations: e.g. nodes that are close to each other in the trie may be grouped on disk. Trie root calculation is not affected.
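The uniqueness point can be illustrated with a toy in-memory backend (not the crate's actual storage; `ToyNodeDb` and its methods are hypothetical names). With the database key formed as `prefix ++ hash`, a node whose data, and therefore hash, is identical under two branches still gets two distinct entries, so removing it from one branch cannot clobber the other and no reference counting is needed:

```rust
use std::collections::HashMap;

// Toy model of prefixed node storage: the map key is `prefix ++ hash`.
#[derive(Default)]
struct ToyNodeDb {
    map: HashMap<Vec<u8>, Vec<u8>>,
}

impl ToyNodeDb {
    // Build the database key from the node hash and its encoded partial key.
    fn full_key(hash: &[u8], prefix: &[u8]) -> Vec<u8> {
        let mut k = Vec::with_capacity(prefix.len() + hash.len());
        k.extend_from_slice(prefix);
        k.extend_from_slice(hash);
        k
    }

    fn insert(&mut self, hash: &[u8], prefix: &[u8], value: &[u8]) {
        self.map.insert(Self::full_key(hash, prefix), value.to_vec());
    }

    fn remove(&mut self, hash: &[u8], prefix: &[u8]) {
        self.map.remove(&Self::full_key(hash, prefix));
    }

    fn get(&self, hash: &[u8], prefix: &[u8]) -> Option<&[u8]> {
        self.map.get(&Self::full_key(hash, prefix)).map(|v| v.as_slice())
    }
}
```

Without the prefix parameter, both branches would map to the same entry keyed by the hash alone, and the backend would have to reference-count it to know when deletion is safe.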