Optimize staked_nodes() for 3-4x performance improvement #8516
Conversation
Force-pushed from f8e5f71 to 89499d9
Codecov Report ✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff            @@
##           master    #8516    +/- ##
=======================================
  Coverage    83.1%    83.2%
=======================================
  Files         846      846
  Lines      368573   368576     +3
=======================================
+ Hits       306652   306709    +57
+ Misses      61921    61867    -54
vadorovsky
left a comment
Great stuff! I think there is a chance to use the pubkey hasher.
vote/src/vote_account.rs (Outdated)

    )
    // Pre-allocate HashMap with estimated capacity to reduce reallocations
    let mut staked_nodes =
        HashMap::with_capacity(self.vote_accounts.len().saturating_div(2));
Given that this is a hash map of validators, I think we could use PubkeyHasherBuilder instead of the default hasher:
Suggested change:
-    HashMap::with_capacity(self.vote_accounts.len().saturating_div(2));
+    HashMap::with_capacity_and_hasher(self.vote_accounts.len().saturating_div(2), PubkeyHasherBuilder::default());
That should speed it up even more. 🙂 You'll need to add the PubkeyHasherBuilder generic to the return type in the function signature.
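A minimal sketch of what the suggested change could look like, using simplified stand-in types rather than the real VoteAccounts internals; the import path for PubkeyHasherBuilder (from solana_pubkey, behind its "rand" feature, as discussed below) and the Stake alias are assumptions:

```rust
use std::collections::HashMap;

// Assumed export path; PubkeyHasherBuilder sits behind the crate's "rand" feature.
use solana_pubkey::{Pubkey, PubkeyHasherBuilder};

type Stake = u64; // stand-in alias for illustration

// The hasher shows up as the third generic parameter of the return type,
// which is the signature change mentioned above.
fn staked_nodes(
    // vote pubkey -> (stake, node pubkey); simplified stand-in for the real map
    vote_accounts: &HashMap<Pubkey, (Stake, Pubkey)>,
) -> HashMap<Pubkey, Stake, PubkeyHasherBuilder> {
    let mut staked_nodes = HashMap::with_capacity_and_hasher(
        vote_accounts.len().saturating_div(2),
        PubkeyHasherBuilder::default(),
    );
    for (stake, node_pubkey) in vote_accounts.values() {
        if *stake != 0 {
            staked_nodes
                .entry(*node_pubkey)
                .and_modify(|s| *s += *stake)
                .or_insert(*stake);
        }
    }
    staked_nodes
}
```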
PubkeyHasherBuilder is behind the "rand" feature, and the vote package doesn't enable that feature.
What's your experience with enabling the rand feature? Would it be a broader change that requires more testing to ensure no regression?
How about doing it as a separate optimization in a follow-up PR?
Sounds good, we can do it separately.
Regarding adding the rand feature: in PR #7307 I added the rand feature to solana-pubkey in solana-accounts-db. The only randomness that comes with it is the randomization of the 8-byte subslice of the pubkey used as the hash map key:
https://github.com/anza-xyz/solana-sdk/blob/a7e12b1d4af8fba2a43d447af31aebdc3dbe8a1d/address/src/lib.rs#L13-L14
https://github.com/anza-xyz/solana-sdk/blob/a7e12b1d4af8fba2a43d447af31aebdc3dbe8a1d/address/src/hasher.rs#L57-L81
No other module is pulled in by this feature, so I don't think there should be any unexpected impact on the validator.
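For illustration only, here is a rough sketch of the idea described above, not the actual hasher linked from solana-sdk: hash a 32-byte pubkey by reading an 8-byte subslice at an offset chosen once per map, relying on pubkeys already being uniformly distributed. The type names and the offset-selection detail are assumptions.

```rust
use std::hash::{BuildHasher, Hasher};

// Builder holding an offset chosen randomly once (0..=24, so offset + 8 <= 32).
// In the real crate that randomness is what the "rand" feature provides.
struct SubsliceHasherBuilder {
    offset: usize,
}

struct SubsliceHasher {
    offset: usize,
    hash: u64,
}

impl BuildHasher for SubsliceHasherBuilder {
    type Hasher = SubsliceHasher;
    fn build_hasher(&self) -> SubsliceHasher {
        SubsliceHasher { offset: self.offset, hash: 0 }
    }
}

impl Hasher for SubsliceHasher {
    fn write(&mut self, bytes: &[u8]) {
        // Expects the 32 bytes of a pubkey; 8 of them become the hash value directly.
        let slice = &bytes[self.offset..self.offset + 8];
        self.hash = u64::from_ne_bytes(slice.try_into().unwrap());
    }
    fn finish(&self) -> u64 {
        self.hash
    }
}
```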
Replace the itertools grouping_map().aggregate() pattern with direct HashMap construction using entry().and_modify().or_insert() for better performance and reduced allocations.

Performance improvement:
- Before: 27,206 ns/iter
- After: 12,586 ns/iter
- Speedup: 2.16x (53.7% faster)

The benchmark simulates a realistic scenario with 100 validator nodes, each having 3-5 vote accounts, measuring stake aggregation by node pubkey.

Changes:
- Remove unused itertools::Itertools import
- Pre-allocate HashMap with estimated capacity
- Use direct entry API for stake aggregation
- Add benchmark for staked_nodes computation
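A rough before/after sketch of the change this commit describes, with simplified stand-in types (Pubkey, Stake) rather than the actual VoteAccounts internals; the first function approximates the old itertools pattern, the second the entry-API replacement:

```rust
use std::collections::HashMap;

use itertools::Itertools;

type Pubkey = [u8; 32]; // stand-in for solana_pubkey::Pubkey
type Stake = u64;

// Before: group with itertools, which builds intermediate grouping state.
fn staked_nodes_itertools(
    vote_accounts: &HashMap<Pubkey, (Stake, Pubkey)>,
) -> HashMap<Pubkey, Stake> {
    vote_accounts
        .values()
        .filter(|(stake, _)| *stake != 0)
        .map(|(stake, node_pubkey)| (*node_pubkey, *stake))
        .into_grouping_map()
        .aggregate(|sum, _node_pubkey, stake| Some(sum.unwrap_or_default() + stake))
}

// After: one pre-sized HashMap filled directly through the entry API.
fn staked_nodes_entry(
    vote_accounts: &HashMap<Pubkey, (Stake, Pubkey)>,
) -> HashMap<Pubkey, Stake> {
    let mut staked_nodes = HashMap::with_capacity(vote_accounts.len().saturating_div(2));
    for (stake, node_pubkey) in vote_accounts.values() {
        if *stake != 0 {
            staked_nodes
                .entry(*node_pubkey)
                .and_modify(|s| *s += *stake)
                .or_insert(*stake);
        }
    }
    staked_nodes
}
```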
Force-pushed from 89499d9 to 71e573f
Replace saturating_div(2) with the exact count of non-zero stake accounts for optimal memory allocation. Based on mainnet data showing ~14% of vote accounts have stake:
- Old: capacity = vote_accounts.len() / 2 (~3350 for 6700 accounts)
- New: capacity = non_zero_count (~970 for 6700 accounts)

Trade-offs:
- Pro: Exact capacity, zero reallocation, saves ~2380 pre-allocated entries
- Pro: Better memory efficiency (86% of vote accounts have zero stake)
- Con: Adds ~800ns overhead from the counting iteration (~6% slower)

Benchmark results:
- Before: 12,516 ns/iter
- After: 13,353 ns/iter
- Trade-off: Slightly slower but uses exact memory
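Under the same simplified stand-in types as the sketch above, the exact-capacity variant from this commit could look like this: count the staked vote accounts first, then size the map so it never reallocates (the count is an upper bound on the number of distinct node pubkeys).

```rust
use std::collections::HashMap;

type Pubkey = [u8; 32]; // stand-in for solana_pubkey::Pubkey
type Stake = u64;

fn staked_nodes_exact(
    vote_accounts: &HashMap<Pubkey, (Stake, Pubkey)>,
) -> HashMap<Pubkey, Stake> {
    // Extra counting pass (the ~800ns overhead mentioned above), traded for
    // an allocation sized to the number of staked accounts.
    let non_zero_count = vote_accounts
        .values()
        .filter(|(stake, _)| *stake != 0)
        .count();
    let mut staked_nodes = HashMap::with_capacity(non_zero_count);
    for (stake, node_pubkey) in vote_accounts.values() {
        if *stake != 0 {
            staked_nodes
                .entry(*node_pubkey)
                .and_modify(|s| *s += *stake)
                .or_insert(*stake);
        }
    }
    staked_nodes
}
```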
brooksprumo
left a comment
I updated the benchmark with realistic mainnet data (6,700 vote accounts, 970 with stakes). Benchmark result:
The pre-scan is 10% slower than the div(2) estimate, but it prevents us from wasting more memory as the number of zero-stake vote accounts grows.
Force-pushed from 6e9e91b to 8c566a0
@brooksprumo I updated the benchmark, and this dismissed your approval. Can you re-approve it? There is no actual prod code change.
I'm not sure the new benchmarks should be added. They compare different implementations, which is valuable for this PR and for choosing one, but beyond this PR I don't see the value. I would think we only want benches in the repo for our current code. We could include the benchmarks for comparing impls as text/source in this PR, which would make them useful if we want to revisit in the future.
Force-pushed from 8c566a0 to 29ecdc7
OK, reverted the bench commit. The benchmark results are already in the PR comments; if we want to revisit, we can look at the PR.
brooksprumo
left a comment
If we need to make future changes, I think we should rename the non_zero_ stuff to staked_.
* Optimize staked_nodes() by replacing itertools with manual HashMap
* pr feedback
* Simplify stake aggregation logic per PR feedback
* Use exact non-zero count for HashMap capacity
Problem

The staked_nodes() method in VoteAccounts was using itertools' into_grouping_map().aggregate() pattern, which creates intermediate allocations and adds unnecessary overhead for a hot path in validator operations.

Summary of Changes

Replace the itertools grouping_map implementation with direct HashMap construction for better performance:
- Remove the unused itertools::Itertools import
- Aggregate stakes directly with entry().and_modify().or_insert()

Performance Impact

Benchmark with realistic scenario (400 validator nodes):

Running on mainnet shows a 3-4x speedup.