Contributor

@alexggh alexggh commented Feb 24, 2025

Part of #6131 (comment). The TrieCache needs to work under the assumption that more or less the entire current state fits into memory. This can be split into a few parts:

  • Remove the LocalTrieCache hard-coded size and make it configurable from outside.
  • We decided to have two flavours of building the LocalTrieCache. For block import and block authoring we use a trusted local cache that grows to hold everything in the block and then propagates everything into the shared trie cache.
  • Everything from LocalTrieCache needs to be propagated back to the SharedTrieCache in a fast and safe manner, so that it doesn't become a bottleneck. Currently, that is done on drop, with the function holding the SharedTrieCache write lock while promoting all keys. The plan was to move this to a separate worker thread, so that the hot path does not have to wait for the drop function to propagate everything from the LocalTrieCache to the SharedTrieCache, with the worker thread flushed after block production and import. Update: this is not needed, because authoring and import are bounded, so the numbers are not large enough to make the drop too heavy; see the numbers in Make SharedTrieCache/LocalTrieCache work with entire state in memory #7682 (comment).
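
To make the two flavours and the promote-on-drop step concrete, here is a minimal sketch using only the standard library. It is not the code from this PR: `SharedCache`, `LocalCache`, `trusted`, `untrusted` and `shared_len` are hypothetical names, keys are plain `u64` values instead of trie node hashes, and the bounded flavour simply refuses new entries once full, whereas a real cache would more likely evict.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for the shared cache: a map behind a read-write lock.
#[derive(Default)]
struct SharedCache {
    nodes: RwLock<HashMap<u64, Vec<u8>>>,
}

/// Hypothetical local cache. `limit` is `None` for the trusted flavour
/// (import and authoring) and `Some(n)` for the bounded flavour.
struct LocalCache {
    shared: Arc<SharedCache>,
    nodes: HashMap<u64, Vec<u8>>,
    limit: Option<usize>,
}

impl LocalCache {
    /// Trusted flavour: grows to hold everything touched by the block.
    fn trusted(shared: Arc<SharedCache>) -> Self {
        Self { shared, nodes: HashMap::new(), limit: None }
    }

    /// Bounded flavour: holds at most `limit` entries.
    fn untrusted(shared: Arc<SharedCache>, limit: usize) -> Self {
        Self { shared, nodes: HashMap::new(), limit: Some(limit) }
    }

    /// Records a node. The bounded flavour ignores new keys once it is full
    /// (a simplification made for this sketch).
    fn insert(&mut self, key: u64, value: Vec<u8>) {
        let full = self.limit.map_or(false, |l| self.nodes.len() >= l);
        if !full || self.nodes.contains_key(&key) {
            self.nodes.insert(key, value);
        }
    }
}

impl Drop for LocalCache {
    /// Promotes every local entry into the shared cache while holding
    /// the shared write lock once.
    fn drop(&mut self) {
        let mut shared = self.shared.nodes.write().unwrap();
        shared.extend(self.nodes.drain());
    }
}

/// Number of entries currently visible in the shared cache.
fn shared_len(shared: &SharedCache) -> usize {
    shared.nodes.read().unwrap().len()
}

fn main() {
    let shared = Arc::new(SharedCache::default());
    {
        let mut local = LocalCache::trusted(shared.clone());
        for key in 0..1000u64 {
            local.insert(key, vec![0u8; 4]);
        }
        // Nothing is visible in the shared cache until `local` is dropped.
    }
    println!("shared entries after drop: {}", shared_len(&shared));
}
```

Because the write lock is taken once per drop, the cost of the drop scales with the number of entries touched by a single block, which is the quantity the update above argues is bounded.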

Todo

  • Stats to prove most of reads/writes hit the shared or the local trie cache.
  • Unit testing the new changes.
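
One way the stats item could be approached is with a few atomic counters that record where each lookup was served from. This is only a sketch under assumptions: `CacheStats`, `Lookup`, `record` and `hit_ratio` are hypothetical names and do not describe the metrics this PR actually exports.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Where a single lookup was served from (hypothetical classification).
enum Lookup {
    LocalHit,
    SharedHit,
    Miss,
}

/// Hypothetical counters; atomics let several threads record concurrently.
#[derive(Default)]
struct CacheStats {
    local_hits: AtomicU64,
    shared_hits: AtomicU64,
    misses: AtomicU64,
}

impl CacheStats {
    /// Increments the counter that matches the lookup outcome.
    fn record(&self, outcome: Lookup) {
        let counter = match outcome {
            Lookup::LocalHit => &self.local_hits,
            Lookup::SharedHit => &self.shared_hits,
            Lookup::Miss => &self.misses,
        };
        counter.fetch_add(1, Ordering::Relaxed);
    }

    /// Fraction of lookups served by either cache; 0.0 when nothing was recorded.
    fn hit_ratio(&self) -> f64 {
        let hits = self.local_hits.load(Ordering::Relaxed)
            + self.shared_hits.load(Ordering::Relaxed);
        let total = hits + self.misses.load(Ordering::Relaxed);
        if total == 0 {
            0.0
        } else {
            hits as f64 / total as f64
        }
    }
}

fn main() {
    let stats = CacheStats::default();
    for _ in 0..3 {
        stats.record(Lookup::LocalHit);
    }
    stats.record(Lookup::SharedHit);
    stats.record(Lookup::Miss);
    println!("hit ratio: {:.2}", stats.hit_ratio());
}
```

A ratio close to 1.0 over a run of imported blocks would support the claim that most reads and writes hit the local or shared trie cache.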

Fixes: #7534

alexggh and others added 10 commits February 24, 2025 10:30
Signed-off-by: Alexandru Gheorghe <[email protected]>
Signed-off-by: Alexandru Gheorghe <[email protected]>
Signed-off-by: Alexandru Gheorghe <[email protected]>
Signed-off-by: Alexandru Gheorghe <[email protected]>
Signed-off-by: Alexandru Gheorghe <[email protected]>
Signed-off-by: Alexandru Gheorghe <[email protected]>
Signed-off-by: Alexandru Gheorghe <[email protected]>
@alexggh alexggh marked this pull request as ready for review March 13, 2025 10:37
@alexggh alexggh changed the title [WIP] Make SharedTrieCache/LocalTrieCache work with entire state in memory Make SharedTrieCache/LocalTrieCache work with entire state in memory Mar 13, 2025
Contributor

@AndreiEres AndreiEres left a comment

How can we measure the changes?

Member

@bkchr bkchr left a comment

Overall looking good, I mainly left some comments on smaller changes. One of the main things I don't want to see is this being exposed via the CLI or via a config option. I don't see why we should do this.

alexggh and others added 6 commits April 17, 2025 12:57
Signed-off-by: Alexandru Gheorghe <[email protected]>
Signed-off-by: Alexandru Gheorghe <[email protected]>
Signed-off-by: Alexandru Gheorghe <[email protected]>
Signed-off-by: Alexandru Gheorghe <[email protected]>
Signed-off-by: Alexandru Gheorghe <[email protected]>
@paritytech-workflow-stopper

All GitHub workflows were cancelled due to the failure of one of the required jobs.
Failed workflow url: https://github.com/paritytech/polkadot-sdk/actions/runs/14513792430
Failed job name: fmt

alexggh added 3 commits April 17, 2025 14:34
Signed-off-by: Alexandru Gheorghe <[email protected]>
Signed-off-by: Alexandru Gheorghe <[email protected]>
Signed-off-by: Alexandru Gheorghe <[email protected]>
@alexggh alexggh added the T0-node This PR/Issue is related to the topic “node”. label Apr 17, 2025
Contributor

@AndreiEres AndreiEres left a comment

Good job!

@alexggh
Contributor Author

alexggh commented Apr 30, 2025

Changes have been running on kusama-parachain-asset-hub-collator-node-0 and kusama-parachain-asset-hub-collator-node-1 since 2025-04-28. Dashboards for the metrics are here: https://grafana.teleport.parity.io/goto/PLTrT7bNg?orgId=1. I will let it run for a few more days and, if no problems arise, will merge it.

@alexggh
Contributor Author

alexggh commented May 14, 2025

There weren't any problems while running on the asset-hub collators, merging this now.

@alexggh alexggh added this pull request to the merge queue May 14, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks May 14, 2025
@alexggh alexggh added this pull request to the merge queue May 14, 2025
Merged via the queue into master with commit 9991b1a May 14, 2025
248 of 249 checks passed
@alexggh alexggh deleted the alexggh/local_cache_unlimited branch May 14, 2025 15:37
alvicsam pushed a commit that referenced this pull request Oct 17, 2025
…7682)

Part of
#6131 (comment).
The TrieCache needs to work under the
assumption that more or less the entire current state fits into memory.
This can be split into a few parts:

- [x] Remove the LocalTrieCache hard-coded size and make it configurable
from outside.
- [x] Decided to have two flavours of building the LocalTrieCache. For
block import and block authoring we use a trusted local cache that
grows to hold everything in the block and then propagates everything into
the shared trie cache.
- ~[x] Everything from LocalTrieCache needs to be propagated back to the
SharedTrieCache in a fast and safe manner, so that it doesn't become a
bottleneck. Currently, that is done on drop, with the function holding
the SharedTrieCache write lock while promoting all keys. Decided to move
this to a separate worker thread, so that the hot path does not have to
wait for the drop function to propagate everything from the LocalTrieCache
to the SharedTrieCache; the worker thread is flushed after
block production and import.~ Update: this is not needed
because authoring and import are bounded, so the numbers are not
large enough to make the drop too heavy; see the numbers here:
#7682 (comment).

# Todo
- [x] Stats to prove most of reads/writes hit the shared or the local
trie cache.
- [x] Unit testing the new changes.

Fixes: #7534

---------

Signed-off-by: Alexandru Gheorghe <[email protected]>
Co-authored-by: Bastian Köcher <[email protected]>
Labels

T0-node This PR/Issue is related to the topic “node”.

Development

Successfully merging this pull request may close these issues.

Make SharedTrieCache/LocalTrieCache work with entire state in memory

5 participants