
Conversation

@cheme (Contributor) commented Feb 19, 2019

This PR implements the new codec spec from polkadot-re-spec.

The main change is the removal of the trie extension node. This is toggled by switching an associated constant in TrieLayout.
Other changes are:

  • nibbleslice changes
    I reworked nibbleslice a bit, replacing encode with left and right.
    Basically, encode was copying and aligning bytes; the new split mostly follows
    the spec description.
  • The left of a nibble offset is called Prefix and is encoded with padding at the end when
    the number of nibbles does not fill a whole byte.
    This is used as the key prefix in the db. It is a breaking change, but it has the nice property of allowing iteration over keys in the db (for contract child tries, for instance, where we only have the contract's key, or in case kvdb evolves to allow collection specifics).
    Also note that the prefix now also includes the number of padding nibbles (needed to iterate correctly).
  • The right of a nibble offset is called Partial; this is similar to the existing variable naming and
    is what is encoded into trie nodes: the padding is at the start.
    This allows a leaf to avoid shifting bytes for its partial. That is not the case for a branch.
    I believe a leaf is more likely to use a long partial (especially if the key results from a hash).
  • To avoid using a vec, the types use a slice and an optional last byte (when there is padding).
    Actually, the optional last byte is a (u8, u8) to allow other layouts.
    (A sketch of this left/right split follows this list.)
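
For illustration, here is a minimal sketch of the left/right split described above. All names and the exact byte layout are assumptions made for this example, not the actual nibbleslice API of this PR:

const NIBBLE_PER_BYTE: usize = 2;

/// Hypothetical split of a key at a nibble offset.
/// Left ("Prefix"): whole bytes plus an optional last byte padded at the end,
/// carried as (padded_byte, number_of_padding_nibbles) so db key iteration stays correct.
/// Right ("Partial"): optional first byte padded at the start, then an unshifted
/// byte slice, so a leaf can encode its partial without copying.
fn split_at_nibble(key: &[u8], nibble_offset: usize)
	-> ((&[u8], Option<(u8, u8)>), (Option<u8>, &[u8]))
{
	let byte = nibble_offset / NIBBLE_PER_BYTE;
	if nibble_offset % NIBBLE_PER_BYTE == 0 {
		// Aligned offset: no shared byte, so no padding on either side.
		((&key[..byte], None), (None, &key[byte..]))
	} else {
		// Odd offset: the byte at the boundary is shared between both sides.
		let prefix_last = key[byte] & 0xf0; // low nibble zeroed: padding at the end
		let partial_first = key[byte] & 0x0f; // high nibble zeroed: padding at the start
		((&key[..byte], Some((prefix_last, 1))), (Some(partial_first), &key[byte + 1..]))
	}
}

For example, splitting [0x12, 0x34] at nibble offset 3 would give a prefix of [0x12] plus the padded byte 0x30 with one padding nibble, and a partial whose padded first byte is 0x04 with no remaining bytes.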

Note that some of the code could revert to slower usage of the at function to make it easier to review.

It also solves the existing TODO about removing the header from the encoding (see the code in the
ethereum branch, where a code impact is visible; in substrate it is all fine).

  • TrieLayout

An envelope trait over the trie layout that keeps all implementation details in one place.
It is also used by an experimental (lightly tested) alternate trie layout (a different radix, only for aligned keys and in a single byte,
which limits it to 4 variants).
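
As a rough illustration of the idea (the names here are assumed for the example; the real trait also carries things like the hasher and node codec types):

/// Sketch of an envelope trait keeping the layout choices in one place.
pub trait TrieLayout {
	/// `false` selects the new layout without extension nodes.
	const USE_EXTENSION: bool;
}

/// Example layouts toggling the associated constant.
pub struct ExtensionLayout;
impl TrieLayout for ExtensionLayout {
	const USE_EXTENSION: bool = true;
}

pub struct NoExtensionLayout;
impl TrieLayout for NoExtensionLayout {
	const USE_EXTENSION: bool = false;
}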

@cheme added the question (Further information is requested) label on Feb 19, 2019
@cheme (Contributor, Author) commented Feb 26, 2019

Note about the NodeHeader scheme: compacting could look like:

// branch = value - 1 in extension size -> val 85, branch 84
const EMPTY_TRIE: u8 = 0;
const LEAF_NODE_OFFSET: u8 = 1; // (1 to 86) = max nibble length 85
const BRANCH_NODE_NODE_OFFSET: u8 = 87; // (87 to 171) max 84
const BRANCH_NODE_WITH_VALUE_OFFSET: u8 = 172; // (172 to 255) max 84

Problem: some schemes in substrate (double storage map) require a length > 32 bytes (from the current code there is an 8-bit prefix + 128-bit twox hash + 256-bit blake, so a nibble length of 96, more than the 85 allowed above).
So the one-byte length encoding does not fit; it feels like we should definitely allow encoding the length on two bytes for big values.
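
A minimal sketch of one possible two-byte escape, reusing the offsets above (an assumption about how it could work, not necessarily the encoding this PR ends up with):

const LEAF_NODE_OFFSET: u8 = 1; // leaf range 1 to 86, i.e. partial length 0 to 85

/// Encode a leaf header, spilling large partial lengths into a second byte.
fn encode_leaf_header(nibble_len: usize, out: &mut Vec<u8>) {
	if nibble_len < 85 {
		// Small lengths fit in the single header byte.
		out.push(LEAF_NODE_OFFSET + nibble_len as u8);
	} else {
		// Use the top of the leaf range as an escape marker and put the
		// remainder in a following byte (caps the length at 85 + 255 nibbles).
		out.push(LEAF_NODE_OFFSET + 85);
		out.push((nibble_len - 85) as u8);
	}
}

With such an escape, the 96-nibble double storage map key mentioned above would encode its header as the two bytes [86, 11].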

cheme added 24 commits March 21, 2019 21:42
are currently broken; expect a lot of non-working things too.
And an additional malleability safeguard.
Tests involving `Lookup` are currently broken.
specific constant that will be used for merging the NoExt code with the
previous one.
Furthermore, it could be used in a multi-trie implementation (no constant used,
but a fn).
Currently only done for Mut (non-mut will wait to see if merging goes
well).
@gavofyork (Member) commented:

@cheme when you're certain that both the reported issues and the typical issues (bad naming, missing docs, low-quality docs) are fixed, please ping @arkpar and @svyatonik to get a final review.

@cheme (Contributor, Author) commented Jul 29, 2019

@arkpar, @svyatonik I went over the bad naming and tried to go over the docs again this morning.

There are still things I can do to reduce the code review size:

  • remove fuzzing and benches.
  • remove iter_build (extensively used by fuzzing but not much elsewhere; I may even put it in its own crate (some codec things may need to go public)).
  • remove the other radix variant; this can simplify the nibble logic a bit (the main idea was to make things clearer by generalizing, but reading back the code I am not sure it is).
  • remove one of the two encoding variants from test-support.

@gavofyork (Member) commented:

@cheme please remove as much as you can to keep the review size down

@gavofyork (Member) commented:

@arkpar @svyatonik would be great if we could get this reviewed again

And replacing ReferenceError with CodecError.
@arkpar (Member) commented Jul 31, 2019

Fuzzing and benching may be left as is.
Generic nibble size/tree arity sounds nice, but in practice it also introduces additional overhead, e.g. the need to store an extra byte per key prefix. I'd remove it for now.

Using 'GenericNoExtensionLayout' for 'NoExtensionLayout'.
@gavofyork (Member) commented:

@cheme couple of questions there...

@arkpar @svyatonik are you guys ok with the PR otherwise?

pub fn biggest_depth(v1: &[u8], v2: &[u8]) -> usize {
	// sorted assertion preventing out of bound
	for a in 0..v1.len() {
		if v1[a] == v2[a] {
Contributor commented:

Is it always correct that v1.len() <= v2.len() (otherwise it'll panic)? I have found the only usage of biggest_depth() to be from trie_visit(). I assume that the input there is ordered in some manner? If it is sorted using a BTreeMap, then given that e.g. vec![5, 5] < vec![20], this could panic, no?

Contributor (Author) commented:

In fact there is no verification of the ordering; the iter_build code does not support unordered input.
I can prevent this from failing, but I will need to propagate the error; that seems pretty quick to do.

Contributor commented:

Sorry - I mean that even if the input is ordered (lexicographically, I suppose), this could still panic, because lexicographically a vec of larger len can come before a vec of smaller len. No need to propagate an error imo.

Contributor (Author) commented:

yes (I just realized; I just need to min the loop bound).

Contributor commented:

(I think you also need to update this line: return v1.len() * NIBBLE_PER_BYTE; below to use cmp::min() instead of v1.len())
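
Putting the two remarks together, a panic-free version could look like the sketch below (the behaviour past the loop-bound fix, returning the common nibble-prefix length, is an assumption and may differ from the actual iter_build code):

const NIBBLE_PER_BYTE: usize = 2;

/// Common nibble-prefix length of two keys.
/// Both the loop bound and the final return use the shorter length,
/// so differing key lengths can no longer index out of bounds.
pub fn biggest_depth(v1: &[u8], v2: &[u8]) -> usize {
	let upper_bound = std::cmp::min(v1.len(), v2.len());
	for a in 0..upper_bound {
		if v1[a] != v2[a] {
			// Bytes differ: the high nibble may still match.
			return if v1[a] >> 4 == v2[a] >> 4 {
				a * NIBBLE_PER_BYTE + 1
			} else {
				a * NIBBLE_PER_BYTE
			};
		}
	}
	upper_bound * NIBBLE_PER_BYTE
}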

@arkpar (Member) commented Aug 1, 2019

I've verified that the implementation matches the spec and there's sufficient test coverage. So "looksgood" from me.
