This repository was archived by the owner on Nov 15, 2023. It is now read-only.

@cheme cheme commented Jul 15, 2019

This PR is an attempt to fix #2622.

It makes the changes trie handle child trie content.

It makes a few choices, so it is only a proposal and things can be changed:

  • a third changes trie input node type: a child trie node containing the root of the child changes trie.
    An alternative would be to put the child changes content directly into the top changes trie with another new node type (an index built over the encoding of the block, the encoded child trie key and the child trie content key).
  • these new child nodes are not used in digests (that would be very straightforward to implement); digest nodes are still built only from the previously existing node types.
  • a change of a child trie root no longer has its extrinsics registered as before (that information is the sum of the extrinsics of the child changes trie). This can be restored from the former code (a hash set tracking changes globally in the overlayed changes), or built at the time the child trie root is calculated.
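As a rough illustration of the first bullet, the three input key kinds could be modelled as below. This is only a sketch: the variant names `DigestIndex` and `ChildIndex` and the fields `block` and `storage_key` appear in the diff excerpts of this PR, while `ExtrinsicIndex`, the `u64` block number, the 32-byte root and the helper `child_index_pair` are assumptions made for the example.

```rust
/// Simplified stand-in for the changes trie input keys (assumed layout).
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
enum InputKey {
    /// Assumed: (block, storage key) -> extrinsics that changed the key.
    ExtrinsicIndex { block: u64, key: Vec<u8> },
    /// Assumed: (digest block, storage key) -> blocks that changed the key.
    DigestIndex { block: u64, key: Vec<u8> },
    /// Proposed here: (block, child storage key) -> root of the child
    /// changes trie built for that block.
    ChildIndex { block: u64, storage_key: Vec<u8> },
}

/// An input pair is a key plus its encoded value (hypothetical alias).
type InputPair = (InputKey, Vec<u8>);

/// Builds the top-level entry that points at a child changes trie root.
fn child_index_pair(block: u64, storage_key: &[u8], child_root: [u8; 32]) -> InputPair {
    let key = InputKey::ChildIndex { block, storage_key: storage_key.to_vec() };
    (key, child_root.to_vec())
}
```

With this shape the top changes trie stores only one value per changed child trie and block, namely the child root, and the per-key details live in the child changes trie.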

It will also need some changes after #2209. The main question is whether we refer to the child trie keyspace or to the parent address; in this case the parent address seems to be the right addressing.

I also wonder whether removing the block number from changes trie keys could be a good idea.
cc/ @svyatonik
It would require prefixing the memorydb keys with the encoded block number, and pruning could then be done by directly removing all keys starting with that block number prefix (no need for trie parsing).
Similarly, child changes trie content could be prefixed by a unique id, such as its storage path, so that its key values can be isolated without trie parsing (but I have no direct use case, except being able to export a child changes trie without trie parsing).
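The prefix-based pruning idea can be sketched as follows. This is a minimal model under assumptions: a `BTreeMap` stands in for the node database, the block number is a `u64` encoded big-endian, and `prefixed_key` and `prune_block` are hypothetical helpers, not functions from the codebase.

```rust
use std::collections::BTreeMap;

/// Builds a database key of the form `block number (big-endian) ++ node key`.
fn prefixed_key(block: u64, node_key: &[u8]) -> Vec<u8> {
    let mut out = block.to_be_bytes().to_vec();
    out.extend_from_slice(node_key);
    out
}

/// Removes every entry stored under `block` without walking any trie.
/// Returns the number of removed entries.
fn prune_block(db: &mut BTreeMap<Vec<u8>, Vec<u8>>, block: u64) -> usize {
    let prefix = block.to_be_bytes();
    let before = db.len();
    // A real key-value store would delete the prefix range directly;
    // `retain` scans all entries and is used here only for brevity.
    db.retain(|key, _| !key.starts_with(&prefix));
    before - db.len()
}
```

The same scheme would apply to a per-child prefix: any unique, fixed-position prefix lets the entries be selected by key alone.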

@cheme cheme added the A3-in_progress Pull request is in progress. No review needed at this stage. label Jul 15, 2019
@svyatonik svyatonik self-requested a review July 23, 2019 07:43
@svyatonik svyatonik left a comment

Re choosing the implementation:

  1. I'm still not sure - why separate tries are better than one trie for both top + all children tries. Do you have an opinion on this?
  2. if there are no strong requirements for using separate tries, I'd prefer to do some benches before merging this PR. Like - given the same set of changes, how much faster/slower is maintaining multiple tries vs big single trie.

Re implementation - everything looks OK, except for some small issues I've found. Will do a final review once the PR is ready. Thanks for doing this!

Re removing block number from trie keys - they were added exactly for that (I mean simplified pruning). If there's a faster way for doing the same, it would be great! Could you, please, file an issue?


trie_storage.for_key_values_with_prefix(&child_prefix, |key, value| {
    if let Some(InputKey::ChildIndex::<Number>(trie_key)) = Decode::decode(&mut &key[..]) {
        if let Some(value) = <Vec<u8>>::decode(&mut &value[..]) {
Contributor

Is it possible to use H::Out instead of Vec<u8>?

Contributor

(to avoid a panic if the lengths differ)
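A small sketch of the concern, using only the standard library: copying a byte vector of the wrong length into a fixed-size hash with `copy_from_slice` panics, whereas a checked conversion reports the mismatch. The 32-byte `Hash` alias is an assumption standing in for `H::Out`, and `to_hash` is a hypothetical helper.

```rust
use std::convert::TryFrom;

/// Assumed 32-byte hash output, standing in for `H::Out`.
type Hash = [u8; 32];

/// Converts raw bytes into a fixed-size hash.
/// Returns `None` on a length mismatch instead of panicking, unlike
/// `hash.copy_from_slice(bytes)`, which panics when the lengths differ.
fn to_hash(bytes: &[u8]) -> Option<Hash> {
    <Hash as TryFrom<&[u8]>>::try_from(bytes).ok()
}
```

Decoding straight into the hash type moves the length check into the decoding step, so a malformed value is rejected before it is used as a root.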

    block: block.clone(),
    storage_key,
};
child_map.insert(child_index, map.into_iter().map(|(_, (k, v))| InputPair::DigestIndex(k, v)));
Contributor

This will replace the existing value in child_map with the new one, right? Like if a digest is being built for blocks [1; 8] and the child storage with key child1 has been updated in block#2 && block#6, then:

  • when processing block#2 we'll insert (child1, vec![2]) into child_map;
  • when processing block#6 we'll replace this value (with this .insert() call) with (child1, vec![6]), thus losing changes for block#2.
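The two bullets above can be reproduced with a plain map. The sketch below contrasts an overwriting insert with an accumulating one; the map type and both function names are hypothetical and only mirror the `(child1, vec![2])` notation of the comment.

```rust
use std::collections::BTreeMap;

type ChildMap = BTreeMap<Vec<u8>, Vec<u64>>;

/// Mirrors the reported behaviour: a later block replaces the earlier entry.
fn record_overwriting(child_map: &mut ChildMap, child_key: &[u8], block: u64) {
    child_map.insert(child_key.to_vec(), vec![block]);
}

/// Accumulates instead: earlier blocks are kept and the new one is appended.
fn record_accumulating(child_map: &mut ChildMap, child_key: &[u8], block: u64) {
    child_map.entry(child_key.to_vec()).or_default().push(block);
}
```

After processing block#2 and block#6 for `child1`, the first variant holds `[6]` and the second holds `[2, 6]`.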

Contributor Author

This will need an intermediate map (the test cases were wrong too).

for (key, _) in committed_map.iter() {
    map_entry.1.insert(key.clone(), None);
    if !map_entry.contains_key(key) {
        map_entry.insert(key.clone(), OverlayedValue {
Contributor

If the value has been changed in previous extrinsics, then this .insert() call will replace the existing entry, thus forgetting the previous extrinsics indices.
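One way to keep the earlier indices is to update the entry in place rather than inserting a fresh one. The sketch below uses a simplified `OverlayedValue` with assumed fields (`value`, `extrinsics`) and a hypothetical `mark_deleted` helper; the real structure may differ.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Simplified stand-in for the overlay entry (assumed fields).
#[derive(Debug, Default, Clone, PartialEq)]
struct OverlayedValue {
    /// Current value; `None` means the key is deleted.
    value: Option<Vec<u8>>,
    /// Indices of the extrinsics that touched this key.
    extrinsics: BTreeSet<u32>,
}

/// Marks `key` as deleted by `extrinsic` while keeping earlier indices.
fn mark_deleted(map: &mut BTreeMap<Vec<u8>, OverlayedValue>, key: &[u8], extrinsic: u32) {
    // Reuse the existing entry if there is one, so its index set survives.
    let entry = map.entry(key.to_vec()).or_default();
    entry.value = None;
    entry.extrinsics.insert(extrinsic);
}
```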

cheme commented Jul 24, 2019

  1. I'm still not sure - why separate tries are better than one trie for both top + all children tries. Do you have an opinion on this?

Flattening the child tries into the parent trie seems doable (by putting an optional child storage key in the input keys). This leads to a trie that contains the encoded child storage key as a child trie prefix; instead of a child trie root in a value node pointing to another trie, we get the top of the child trie directly.
So indeed it skips one hash per child trie and will certainly be more efficient to construct (plus the code will be simpler, I believe).
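A rough sketch of what a flattened key could look like, so that all entries of one child trie share a common prefix inside the single trie. The encoding here (a one-byte tag, then a big-endian length, then the child storage key) is purely an assumption for illustration; `flattened_key` is a hypothetical helper and not the encoding used in ch-ch-trie2.

```rust
/// Builds a key for a single flattened trie: an optional, length-prefixed
/// child storage key followed by the storage key itself.
fn flattened_key(child_storage_key: Option<&[u8]>, key: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    match child_storage_key {
        // Top-level entries get tag 0 and no child prefix.
        None => out.push(0),
        // Child entries get tag 1, the child key length, then the child key.
        Some(child) => {
            out.push(1);
            out.extend_from_slice(&(child.len() as u32).to_be_bytes());
            out.extend_from_slice(child);
        }
    }
    out.extend_from_slice(key);
    out
}
```

Because every key of a given child starts with the same bytes, the entries of that child sit under one branch of the trie, which is what replaces the separate child root.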

For what it is worth, separate tries may allow somewhat quicker access to the content of multiple child tries (the parent trie query/proof can be shared). But this can also be achieved with flattened child tries, by basing the triedb on a middle branch node corresponding to the common encoded key bytes of all child trie nodes (plus a possible partial key).

In my opinion, the only thing that could justify the use of child tries is whether we want to split the changes trie or not.

If we want to be able to revert a child trie to a previous state, we could use the former child trie too, but I do not think this would really make sense (pushing some special info into the changes trie seems more relevant).
So splitting the changes trie could only make sense if we want to move child trie state between chains: the accompanying child changes trie part can then be moved as well. But here again it does not make much sense, unless we want to rebuild some child trie history (we need to keep the header and a child trie root inclusion proof) with the related changes trie.

This only works cleanly if child tries get some prefix (similar to #2209) to fetch their keys in the collection (currently no prefix is used on child tries).

  2. if there are no strong requirements for using separate tries, I'd prefer to do some benches before merging this PR. Like - given the same set of changes, how much faster/slower is maintaining multiple tries vs big single trie.

Hypothetically, the usage of child tries in digests may allow skipping some trie accesses (but the common child trie path would be cached in the case of a big single trie). I have no idea whether it can make up for the added indirection level on creation.

@svyatonik

OK - you must be talking mostly about #2832, right (I mean reverting the changes trie to a previous state)? So the only advantage is a potential boost of revert-to-block performance. But imo it doesn't make sense. Like if you are going to revert to block#500 when you're at block#1000, then the changes trie for block#500 isn't the changes trie you want to duplicate for block#1000. A changes trie contains the state difference between the current and the previous block. Example:

  • at block#499 there's only one key in storage: [key1 => Some(value1)];
  • at block#500 another key is inserted (key2 => Some(value2)) => changes trie for block#500 will contain these keys: [key2];
  • at block#600 another key is inserted (key3 => Some(value3));
  • when at block#1000 you want to revert to the state of block#500, then the changes trie for block#1000 must have keys: [key3], not [key2] as in the changes trie for block#500. That's because only key3 has been changed in 500...1000, and key1 + key2 were staying the same.

Or have I misunderstood something? So imo, in the case of a revert-to-block operation, we must build the changes trie from scratch as we normally do.
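The example can be checked with a small state-diff sketch. It models storage as a plain map and computes the set of keys whose value differs between two states; `State` and `changed_keys` are hypothetical names, and the state just before block#1000 is assumed to still contain key1, key2 and key3.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Storage modelled as key -> value (illustrative only).
type State = BTreeMap<&'static str, &'static str>;

/// Returns the keys whose value differs between the two states
/// (inserted, removed or modified keys).
fn changed_keys(before: &State, after: &State) -> BTreeSet<&'static str> {
    before
        .keys()
        .chain(after.keys())
        .filter(|k| before.get(*k) != after.get(*k))
        .copied()
        .collect()
}
```

Diffing the state at block#499 against block#500 yields `{key2}`, while diffing the state before block#1000 against the reverted state yields `{key3}`, matching the bullets above.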

I'd also summon @gavofyork here - maybe he has a strong opinion on whether we need one-changes-trie-per-child-trie, or not.

cheme commented Jul 25, 2019

About reverting, I came to the same conclusion as you (it does not make sense for the changes trie).
When considering extracting a child trie globally from a chain, I did see one possibly interesting thing (keeping the changes trie history), but for similar reasons it may not be that useful.

cheme commented Jul 25, 2019

@svyatonik, if you wish to bench a bit, I did a quick implementation of the flattened child trie here:
master...cheme:ch-ch-trie2
(tested only over substrate-state-machine and with low code quality).

@svyatonik

Ah, I've also started that :) Nvm - will use your version, thanks! :)

@svyatonik

Okay, I've got some bench results. And actually this implementation works faster than the implementation from ch-ch-trie2. Here's the test that I've used and the results. The measured time is the execution time of the build_changes_trie() call, though I've also measured the total time of the trie.insert()-s and again it is better for this PR. Since performance has been my only concern, I think it's better to stick with separate changes tries, as you've suggested. Thanks for your help!

cheme commented Jul 25, 2019

That is not really what I expected; it probably comes from the trie reading being split. I would say trie.insert can be improved a bit by using an algorithm such as 'iter_build' in paritytech/trie#11, but for build_changes_trie it would change nothing.

@cheme cheme requested a review from tomusdrw as a code owner August 29, 2019 10:43
@svyatonik

@cheme Is this still "A3-inprogress"? :)

@cheme cheme added A0-please_review Pull request needs code review. and removed A3-in_progress Pull request is in progress. No review needed at this stage. labels Aug 29, 2019

cheme commented Aug 29, 2019

Oh, I forgot to switch the tag, thanks.

@svyatonik svyatonik left a comment

Looks good (aside from the comment removal). I'll take a final look tomorrow - mostly worried about the CT build, since it is a consensus-critical part.

// You should have received a copy of the GNU General Public License
// along with Substrate. If not, see <http://www.gnu.org/licenses/>.

//! Changes trie related structures and functions.
Contributor

Seems like these docs should stay?

@svyatonik svyatonik left a comment

One last issue, otherwise looks good.

    changes.storage(k)
};
if !existing.map(|v| v.is_some()).unwrap_or_default() {
    if !backend.exists_storage(k).map_err(|e| format!("{}", e))? {
Contributor

If storage_key.is_some(), there should be a call to backend.exists_child_storage()
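The requested dispatch could look roughly like this. The `Backend` trait below is a reduced stand-in with assumed signatures (only the method names `exists_storage` and `exists_child_storage` come from the review), and `MemBackend` and `exists_in_backend` are hypothetical helpers added to make the sketch self-contained.

```rust
use std::collections::HashSet;

/// Reduced stand-in for the state backend (signatures are assumptions).
trait Backend {
    fn exists_storage(&self, key: &[u8]) -> Result<bool, String>;
    fn exists_child_storage(&self, storage_key: &[u8], key: &[u8]) -> Result<bool, String>;
}

/// Queries the child trie when a child storage key is given,
/// and the top trie otherwise.
fn exists_in_backend<B: Backend>(
    backend: &B,
    storage_key: Option<&[u8]>,
    key: &[u8],
) -> Result<bool, String> {
    match storage_key {
        Some(child) => backend.exists_child_storage(child, key),
        None => backend.exists_storage(key),
    }
}

/// In-memory backend used only to exercise the dispatch.
#[derive(Default)]
struct MemBackend {
    top: HashSet<Vec<u8>>,
    children: HashSet<(Vec<u8>, Vec<u8>)>,
}

impl Backend for MemBackend {
    fn exists_storage(&self, key: &[u8]) -> Result<bool, String> {
        Ok(self.top.contains(key))
    }
    fn exists_child_storage(&self, storage_key: &[u8], key: &[u8]) -> Result<bool, String> {
        Ok(self.children.contains(&(storage_key.to_vec(), key.to_vec())))
    }
}
```

Without the dispatch, a key that exists only in the top trie would be reported as existing for a child trie as well, and the reverse case would be missed.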

Contributor Author

Pretty bad one. I remember thinking, when looking at this part of the code, that this query must be expensive. I guess something could be done to have the info in the transaction without querying the backend (it would probably require an additional backend query during execution in some cases).

Contributor

We have had this discussion in #2865 - it was an (unsuccessful) attempt to extend this check to non-temporary values. IMO that's what the state cache actually handles - if a value has been read from the trie during execution, then it'll be read from the in-memory cache the second time. And if it wasn't, then there's no other way than to perform this read.
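The caching argument can be illustrated with a tiny read-through cache. This is a generic sketch and not the actual state cache: `CachedBackend`, its fields and the read counter are all hypothetical, and a `HashMap` stands in for the trie.

```rust
use std::collections::HashMap;

/// Read-through cache in front of a slow store (illustrative only).
struct CachedBackend {
    /// Stand-in for the trie-backed storage.
    backend: HashMap<Vec<u8>, Vec<u8>>,
    /// Values (or known absences) that were already read once.
    cache: HashMap<Vec<u8>, Option<Vec<u8>>>,
    /// Number of reads that had to go to the backend.
    backend_reads: usize,
}

impl CachedBackend {
    fn new(backend: HashMap<Vec<u8>, Vec<u8>>) -> Self {
        CachedBackend { backend, cache: HashMap::new(), backend_reads: 0 }
    }

    /// Returns the value for `key`, hitting the backend only on a cache miss.
    fn get(&mut self, key: &[u8]) -> Option<Vec<u8>> {
        if let Some(hit) = self.cache.get(key) {
            return hit.clone();
        }
        self.backend_reads += 1;
        let value = self.backend.get(key).cloned();
        self.cache.insert(key.to_vec(), value.clone());
        value
    }
}
```

If execution already read the key, the later existence check costs only a map lookup; otherwise exactly one backend read is needed, which is the point made in the comment.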

Contributor Author

I also guess that running this backend check during block execution is an unnecessary operation in the case of an invalid block.
Maybe if the overlay were aware of accessed data, it could conditionally store this in the transaction, but that would be similar to putting an access cache in the overlayed changes (which may be interesting to avoid some map accesses, as we already need to check the changes for get_storage, but would be a substantial design change).

@svyatonik svyatonik added A8-looksgood and removed A0-please_review Pull request needs code review. labels Aug 30, 2019
@svyatonik svyatonik merged commit 7276eea into paritytech:master Sep 2, 2019

Development

Successfully merging this pull request may close these issues.

Handle child trie key in change trie

3 participants