Implement change trie for child trie. #3122

cheme · 2019-07-15T16:24:45Z

This PR is an attempt to fix #2622.

It makes the changes trie handle child trie content.

It makes a few choices, so it is only a proposal and things can be changed:

A third change trie input node type: a child trie node containing the root of the child change trie.
An alternative would be to put child change content directly into the top change trie under another new node type (index built over the encoding of the block, the encoded child trie key, and the child trie content key).
These new change trie child nodes are not used in digests (it would be very straightforward to implement); digest nodes are still built only from the previously existing node types.
A change of a child trie root no longer has its extrinsics registered as before (the information being the union of the child change trie extrinsics): this can be restored from the former code (a hashset keeping track of changes globally in the overlayed changes), or built at the time the child trie root is calculated.
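The second option above (building the information when the child trie root is calculated) amounts to taking the union of the extrinsic indices recorded for every changed key of the child trie. A minimal sketch, assuming a simplified, hypothetical `ChildChanges` map from key to extrinsic indices (not the actual overlay types):

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical: changes of one child trie, key -> indices of the extrinsics that touched it.
type ChildChanges = BTreeMap<Vec<u8>, BTreeSet<u32>>;

/// Extrinsics to register for the child root change in the top change trie:
/// the union of the extrinsic indices of all changed keys in the child trie.
fn child_root_extrinsics(child_changes: &ChildChanges) -> BTreeSet<u32> {
    child_changes
        .values()
        .flat_map(|set| set.iter().copied())
        .collect()
}
```

Collecting into a `BTreeSet` deduplicates indices, so an extrinsic touching several keys of the same child trie is registered once.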

It will also need some changes after #2209, the main question being whether we refer to the child trie by its keyspace or by its parent address; here, the parent address seems to be the right addressing.

I also wonder whether removing the block number from change trie keys could be a good idea (cc @svyatonik).
It would require prefixing the MemoryDB keys with the encoded block number; pruning could then be done by directly removing all keys starting with this block number prefix (no need for trie parsing).
Similarly, child change trie content could be prefixed by a unique id, such as its storage path, to isolate its key-values without trie parsing (but I have no direct use case for this, except being able to export a child change trie without trie parsing).
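The prefix-based pruning idea can be sketched with an ordered map standing in for the database. This is a hypothetical illustration: `prefixed_key` and `prune_block` are made-up helpers, and a `BTreeMap` replaces the real MemoryDB, which is hash-keyed and would need the prefix applied when keys are stored.

```rust
use std::collections::BTreeMap;

/// Hypothetical key layout: the block number (big-endian, so byte order matches
/// numeric order) followed by the original node key.
fn prefixed_key(block: u32, node_key: &[u8]) -> Vec<u8> {
    let mut key = block.to_be_bytes().to_vec();
    key.extend_from_slice(node_key);
    key
}

/// Removes every entry stored under `block`'s prefix without walking any trie.
/// Returns the number of removed entries.
fn prune_block(db: &mut BTreeMap<Vec<u8>, Vec<u8>>, block: u32) -> usize {
    let prefix = block.to_be_bytes();
    // All keys sharing the fixed-size prefix are contiguous in an ordered map.
    let doomed: Vec<Vec<u8>> = db
        .range(prefix.to_vec()..)
        .take_while(|(k, _)| k.starts_with(&prefix))
        .map(|(k, _)| k.clone())
        .collect();
    for k in &doomed {
        db.remove(k);
    }
    doomed.len()
}
```

The big-endian choice is not needed for pruning a single block, but it would also keep consecutive blocks adjacent if a whole range had to be pruned at once.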


svyatonik

Re choosing the implementation:

I'm still not sure why separate tries are better than one trie for both the top trie and all child tries. Do you have an opinion on this?
If there are no strong requirements for using separate tries, I'd prefer to run some benchmarks before merging this PR: given the same set of changes, how much faster or slower is maintaining multiple tries versus a single big trie?

Re implementation: everything looks OK, except for some small issues I've found. I'll do a final review once the PR is ready. Thanks for doing this!

Re removing block numbers from trie keys: they were added exactly for that (I mean simplified pruning). If there's a faster way of doing the same, that would be great! Could you please file an issue?

core/network/src/protocol.rs

svyatonik · 2019-07-24T09:46:37Z

core/state-machine/src/changes_trie/build.rs

+
+					trie_storage.for_key_values_with_prefix(&child_prefix, |key, value| {
+						if let Some(InputKey::ChildIndex::<Number>(trie_key)) = Decode::decode(&mut &key[..]) {
+							if let Some(value) = <Vec<u8>>::decode(&mut &value[..]) {


Is it possible to use H::Out instead of Vec<u8>?

(to avoid a panic in case the lengths are different)
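A sketch of the difference, assuming that `H::Out` is a 32-byte hash and that the decoded `Vec<u8>` would otherwise be copied into it with `copy_from_slice` (both are assumptions, not taken from the PR code). `Hash32`, `to_hash` and `to_hash_panicky` are hypothetical names:

```rust
use std::convert::TryFrom;

/// Stand-in for `H::Out`, assuming a 32-byte hasher output.
type Hash32 = [u8; 32];

/// Length-checked conversion: returns `None` instead of panicking when the
/// decoded bytes are not exactly 32 bytes long.
fn to_hash(bytes: &[u8]) -> Option<Hash32> {
    <[u8; 32]>::try_from(bytes).ok()
}

/// The pattern to avoid: `copy_from_slice` panics if the lengths differ.
fn to_hash_panicky(bytes: &[u8]) -> Hash32 {
    let mut out = [0u8; 32];
    out.copy_from_slice(bytes); // panics when bytes.len() != 32
    out
}
```

Decoding directly into the fixed-size output type moves the length check into decoding, so a malformed value becomes a decode failure rather than a panic.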

svyatonik · 2019-07-24T10:22:26Z

core/state-machine/src/changes_trie/build.rs

+						block: block.clone(),
+						storage_key,
+					};
+					child_map.insert(child_index, map.into_iter().map(|(_, (k, v))| InputPair::DigestIndex(k, v)));


This will replace the existing value in child_map with the new one, right? For example, if a digest is being built for blocks [1; 8] and the child storage with key child1 has been updated in block#2 and block#6, then:

when processing block#2 we'll insert (child1, vec![2]) into child_map;

when processing block#6 we'll replace this value (with this .insert() call) with (child1, vec![6]), thus losing changes for block#2.

Will need some intermediate map (the test cases were wrong too).
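The problem and the intermediate-map fix can be reproduced with plain maps. This is a simplified sketch: block numbers stand in for the real digest input pairs, and both function names are hypothetical.

```rust
use std::collections::BTreeMap;

type ChildKey = Vec<u8>;
type Block = u32;

/// Buggy pattern: `insert` replaces whatever an earlier block recorded.
fn collect_overwriting(changes: &[(Block, ChildKey)]) -> BTreeMap<ChildKey, Vec<Block>> {
    let mut map = BTreeMap::new();
    for (block, child) in changes {
        map.insert(child.clone(), vec![*block]);
    }
    map
}

/// Fixed pattern: accumulate per child key in an intermediate map via the entry API.
fn collect_accumulating(changes: &[(Block, ChildKey)]) -> BTreeMap<ChildKey, Vec<Block>> {
    let mut map: BTreeMap<ChildKey, Vec<Block>> = BTreeMap::new();
    for (block, child) in changes {
        map.entry(child.clone()).or_default().push(*block);
    }
    map
}
```

With child1 changed in block#2 and block#6, the first version keeps only `[6]`, while the second keeps `[2, 6]`.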

core/state-machine/src/changes_trie/build.rs

svyatonik · 2019-07-24T10:45:48Z

core/state-machine/src/overlayed_changes.rs

 			for (key, _) in committed_map.iter() {
-				map_entry.1.insert(key.clone(), None);
+				if !map_entry.contains_key(key) {
+					map_entry.insert(key.clone(), OverlayedValue {


If the value has been changed in previous extrinsics, then this .insert() call will replace the existing entry, thus forgetting the previous extrinsic indices.
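One way to avoid losing the indices is to update the existing entry in place rather than replacing it. A minimal sketch with a simplified, hypothetical `OverlayedValue` (the real struct's fields and types may differ):

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Simplified, hypothetical stand-in for an overlayed value.
#[derive(Debug, Default)]
struct OverlayedValue {
    value: Option<Vec<u8>>,
    extrinsics: BTreeSet<u32>,
}

/// Marks `key` as deleted by extrinsic `ext_index`, keeping the extrinsic
/// indices already recorded for this key instead of replacing the entry.
fn mark_deleted(map: &mut BTreeMap<Vec<u8>, OverlayedValue>, key: &[u8], ext_index: u32) {
    let entry = map.entry(key.to_vec()).or_default();
    entry.value = None;
    entry.extrinsics.insert(ext_index);
}
```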

cheme · 2019-07-24T18:08:17Z

I'm still not sure - why separate tries are better than one trie for both top + all children tries. Do you have an opinion on this?

Flattening the child trie into the parent trie seems doable (by putting an optional child storage key in the input keys). This would lead to a trie containing the encoded child storage key as a child trie prefix, and instead of having a child trie root in a value node pointing to another child trie, we would directly get the child trie's top node.
So indeed it would skip one hash per child trie and should certainly be more efficient to construct (plus the code would be simpler, I believe).
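To illustrate the flattened layout, here is a hypothetical input-key encoding with an optional child storage key. It is not the encoding used in the PR or in the ch-ch-trie2 branch; it only shows why every key of one child trie would share a common byte prefix and therefore sit under a single branch of the one big trie.

```rust
/// Hypothetical flattened input key: block number, optional child storage key, storage key.
struct FlatKey<'a> {
    block: u32,
    child_storage_key: Option<&'a [u8]>,
    key: &'a [u8],
}

impl<'a> FlatKey<'a> {
    /// Encodes as: block (4 bytes BE) | 0x00 for top-level keys, or
    /// 0x01 + child key length (4 bytes BE) + child key | storage key.
    /// The length prefix keeps the encoding unambiguous, so all keys of one
    /// child trie share exactly the same byte prefix.
    fn encode(&self) -> Vec<u8> {
        let mut out = self.block.to_be_bytes().to_vec();
        match self.child_storage_key {
            None => out.push(0),
            Some(child) => {
                out.push(1);
                out.extend_from_slice(&(child.len() as u32).to_be_bytes());
                out.extend_from_slice(child);
            }
        }
        out.extend_from_slice(self.key);
        out
    }
}
```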

For what it's worth, separate tries may allow quicker access to the content of multiple child tries (the parent trie query/proof can be shared). But this can also be achieved with a flattened child trie by basing the TrieDB on an intermediate branch node corresponding to the common encoded key bytes of all child trie nodes (plus a possible partial key).

In my opinion, the only thing that could justify using child tries is whether we want to split the change trie or not.

If we want to be able to revert some child trie to a previous state, we could use the former child trie too, but I do not think this would really make sense (pushing some special info into the change trie seems more relevant).
So splitting the change trie could only make sense if we want to move child trie state between chains: then the accompanying child change trie part could be moved too. But here again, it does not make much sense unless we want to rebuild some child trie history (we would need to keep the header and a child trie root inclusion proof) along with the related change trie.

This only works cleanly if child tries get some prefix (similar to #2209) so their keys can be fetched from the collection (currently no prefix is used for child tries).

if there are no strong requirements for using separate tries, I'd prefer to do some benches before merging this PR. Like - given the same set of changes how faster/slower is maintaining multiple tries vs big single

Hypothetically, using child tries in digests may allow skipping some trie accesses (though the common child trie path would be cached in the case of a single big trie). I have no idea whether this can make up for the added level of indirection at creation time.

svyatonik · 2019-07-25T06:13:09Z

OK, you must be talking mostly about #2832, right (I mean reverting the changes trie to a previous state)? So the only advantage is a potential boost in revert-to-block performance. But IMO it doesn't make sense. If you are going to revert to block#500 when you're at block#1000, then the changes trie for block#500 isn't the changes trie you want to duplicate for block#1000. A changes trie contains the state difference between the current and previous blocks. Example:

at block#499 there's only one key in storage: [key1 => Some(value1)];
at block#500 another key is inserted (key2 => Some(value2)) => changes trie for block#500 will contain these keys: [key2];
at block#600 another key is inserted (key3 => Some(value3));
when at block#1000 you want to revert to the state of block#500, the changes trie for block#1000 must have keys [key3], not [key2] as in the changes trie for block#500. That's because only key3 has been changed in 500..1000, while key1 and key2 stayed the same.

Or have I misunderstood something? So IMO, in the case of a revert-to-block operation, we must build the changes trie from scratch, as we normally do.
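The example above can be checked with a small state-diff sketch. `changed_keys` is a hypothetical helper over plain maps: it lists the keys whose value differs between two states, which is what a changes trie has to contain, whether the two states are consecutive blocks or the current state and a revert target.

```rust
use std::collections::{BTreeMap, BTreeSet};

type State = BTreeMap<Vec<u8>, Vec<u8>>;

/// Keys whose value differs between `from` and `to` (inserted, removed or modified).
fn changed_keys(from: &State, to: &State) -> BTreeSet<Vec<u8>> {
    from.keys()
        .chain(to.keys())
        .filter(|k| from.get(*k) != to.get(*k))
        .cloned()
        .collect()
}
```

For the scenario above, the diff between block#499 and block#500 is `[key2]`, while the diff between block#1000 and the revert target block#500 is `[key3]`, so the old changes trie cannot simply be reused.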

I'd also summon @gavofyork here - maybe he has a strong opinion on whether we need one-changes-trie-per-child-trie, or not.

cheme · 2019-07-25T06:50:58Z

About reverting, I ended up with the same conclusion as you (it does not make sense for the change trie).
When considering extracting a child trie as a whole from a chain, I did see one possibly interesting use (keeping the change trie history), but for similar reasons it may not be that useful.

cheme · 2019-07-25T10:42:51Z

@svyatonik, if you wish to benchmark a bit, I did a quick implementation of the flattened child trie here:
master...cheme:ch-ch-trie2
(tested only against substrate-state-machine, and with low code quality).

svyatonik · 2019-07-25T10:50:19Z

Ah, I've also started that :) Nvm - will use your version, thanks! :)

svyatonik · 2019-07-25T12:46:30Z

Okay, I've got some bench results. And actually this implementation works faster than the implementation from ch-ch-trie2. Here's the test that I've used, and the results. The measured time is the execution time of the build_changes_trie() call, though I've also measured the total time of the trie.insert() calls, and again it is better for this PR. Since performance has been my only concern, I think it's better to stick with separate changes tries, as you've suggested. Thanks for your help!

cheme · 2019-07-25T12:52:49Z

That is not really what I expected; it probably comes from the trie reads being split. I would say trie.insert could be improved a bit by using an algorithm such as 'iter_build' from paritytech/trie#11, but for build_changes_trie it would change nothing.

svyatonik · 2019-08-29T13:06:44Z

@cheme Is this still "A3-inprogress"? :)

cheme · 2019-08-29T13:09:25Z

Oh, I forgot to switch the tag, thanks.

svyatonik

Looks good (aside from the comment removal). I'll take a final look tomorrow; I'm mostly worried about the CT build, since it is a consensus-critical part.

svyatonik · 2019-08-29T13:15:53Z

core/state-machine/src/changes_trie/mod.rs

 // You should have received a copy of the GNU General Public License
 // along with Substrate.  If not, see <http://www.gnu.org/licenses/>.

-//! Changes trie related structures and functions.


Seems like these docs should stay?

svyatonik

One last issue, otherwise looks good.

svyatonik · 2019-08-30T05:40:01Z

core/state-machine/src/changes_trie/build.rs

+						changes.storage(k)
+					};
+					if !existing.map(|v| v.is_some()).unwrap_or_default() {
 						if !backend.exists_storage(k).map_err(|e| format!("{}", e))? {


If storage_key.is_some(), there should be a call to backend.exists_child_storage().
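The requested fix is a dispatch on `storage_key`. A sketch against a minimal, hypothetical `Backend` trait: only the two method names come from the discussion; the signatures, the `String` error type and `MockBackend` are simplifications for illustration.

```rust
/// Minimal, hypothetical backend interface; the real trait has more methods
/// and its own error type.
trait Backend {
    fn exists_storage(&self, key: &[u8]) -> Result<bool, String>;
    fn exists_child_storage(&self, storage_key: &[u8], key: &[u8]) -> Result<bool, String>;
}

/// Routes the existence check to the child trie when `storage_key` is set,
/// and to the top-level trie otherwise.
fn exists_in_backend<B: Backend>(
    backend: &B,
    storage_key: Option<&[u8]>,
    key: &[u8],
) -> Result<bool, String> {
    match storage_key {
        Some(child) => backend.exists_child_storage(child, key),
        None => backend.exists_storage(key),
    }
}

/// Toy backend: one top-level key `top` and one key `k` in child trie `child1`.
struct MockBackend;

impl Backend for MockBackend {
    fn exists_storage(&self, key: &[u8]) -> Result<bool, String> {
        Ok(key == b"top")
    }
    fn exists_child_storage(&self, storage_key: &[u8], key: &[u8]) -> Result<bool, String> {
        Ok(storage_key == b"child1" && key == b"k")
    }
}
```

Checking the top-level trie for a child key would wrongly report `k` as absent, so an extrinsic that temporarily set and then removed a pre-existing child value could be misclassified.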

Pretty bad one. I remember thinking, when looking at this part of the code, that this query must be expensive; I guess something could be done to have this info in the transaction without querying the backend (which would probably require an additional backend query during execution in some cases).

We had this discussion in #2865; it was an (unsuccessful) attempt to extend this check to non-temporary values. IMO that's what the state cache actually handles: if a value has been read from the trie during execution, then it'll be read from the in-memory cache the second time. And if it wasn't, then there's no other way than to perform this read.
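The read-through behaviour described above can be sketched as follows. This is a deliberately simplified model: `CountingTrie` and `CachingState` are hypothetical types, and a per-instance `HashMap` stands in for the real state cache, which is shared and bounded.

```rust
use std::cell::Cell;
use std::collections::HashMap;

/// Hypothetical trie backend that counts how often it is actually read.
struct CountingTrie {
    data: HashMap<Vec<u8>, Vec<u8>>,
    reads: Cell<usize>,
}

impl CountingTrie {
    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.reads.set(self.reads.get() + 1);
        self.data.get(key).cloned()
    }
}

/// Read-through cache in front of the trie: the first read of a key goes to the
/// trie; later reads (e.g. the existence check at changes trie build time) are
/// served from memory. Misses are cached too.
struct CachingState<'a> {
    trie: &'a CountingTrie,
    cache: HashMap<Vec<u8>, Option<Vec<u8>>>,
}

impl<'a> CachingState<'a> {
    fn storage(&mut self, key: &[u8]) -> Option<Vec<u8>> {
        if let Some(cached) = self.cache.get(key) {
            return cached.clone();
        }
        let value = self.trie.get(key);
        self.cache.insert(key.to_vec(), value.clone());
        value
    }

    fn exists_storage(&mut self, key: &[u8]) -> bool {
        self.storage(key).is_some()
    }
}
```

A key read during execution costs one trie access; the later existence check on the same key costs none. A key never read during execution still needs that one trie read, which matches the point above.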

I also guess that running this backend check during block execution is an unnecessary operation in the case of an invalid block.
Maybe if the overlay were aware of accessed data, it could conditionally store this in the transaction, but that would be similar to putting some access cache in the overlayed changes (which may be interesting to avoid some map accesses, since we already need to check changes for get_storage, but would be a substantial design change).

cheme added 7 commits July 12, 2019 21:34

Initial implementation, some redundancy is awkward and there is some useless computation (but there is a pending pr for that). Next are tests.

2c06d30

Minimal tests and fix extend child.

c93875d

implement iterator for change child trie.

a5483cb

prune child trie.

b9b532f

Fix pruning test.

4bd0fbc

Merge branch 'master' into ch-ch-trie

c27dda2

bump spec version.

ed8cc0d

cheme added the A3-in_progress label Jul 15, 2019

Merge branch 'master' into ch-ch-trie

ccc76a4

svyatonik self-requested a review July 23, 2019 07:43

Avoid empty child trie (could also be checked before)

cd0abc6

svyatonik reviewed Jul 24, 2019


cheme added 2 commits July 24, 2019 17:53

tabs.

58eeee3

Fix child digest overriding each others.

c00ca20

cheme added 2 commits August 8, 2019 11:00

Merge branch 'master' into ch-ch-trie

baccaed

Merge branch 'master' into ch-ch-trie

c24f418

cheme requested a review from tomusdrw as a code owner August 29, 2019 10:43

Merge branch 'master' into ch-ch-trie

24d3b85

cheme added A0-please_review and removed A3-in_progress labels Aug 29, 2019

svyatonik reviewed Aug 29, 2019


Restore doc deleted on merge.

bebab64

svyatonik reviewed Aug 30, 2019


Check correct child value on extrinsics build.

90e1b8f

svyatonik approved these changes Aug 30, 2019


svyatonik added A8-looksgood and removed A0-please_review labels Aug 30, 2019

gavofyork added A7-looksgoodcantmerge and removed A8-looksgood labels Sep 1, 2019

cheme added 2 commits September 2, 2019 09:29

Merge branch 'master' into ch-ch-trie

54383a9

Revert runtime version update.

92240c9

cheme added A8-looksgood and removed A7-looksgoodcantmerge labels Sep 2, 2019

svyatonik merged commit 7276eea into paritytech:master Sep 2, 2019


3 participants
