
Conversation

@jeffwashington
Contributor

@jeffwashington jeffwashington commented Jun 2, 2021

Problem

It is effectively constant-time work to sort ~400k storages by slot, as opposed to O(n·log n) work (n = number of accounts) to sort the per-account data by slot later.
Also, dragging around Slot and write_version per account can get expensive.

Summary of Changes

Take advantage of the fact that storages are sorted by slot to reduce the data we keep, and eventually speed up the hash and lamport calculation overall. Note that this isn't faster overall yet, so this PR likely won't be submitted until the next phases and supporting pieces are ready. But it is a logical unit of work, it isn't a serious regression, and it uses less memory, so there are tradeoffs. With different bin tuning, this algorithm can be as fast as, or faster than, the current code.
We eliminate a write version and a slot for each account entry we find (keep in mind there are duplicates).
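
As an illustration of the idea, here is a minimal Rust sketch with hypothetical names (not the PR's actual types or code): because storages are scanned in slot order and each append vec is read in increasing write_version order, the newest version of an account is simply the last entry seen, so neither slot nor write_version has to be carried per account entry.

```rust
use std::collections::HashMap;

// Hypothetical, simplified struct -- not the PR's actual type. The point is
// what it omits: no slot and no write_version per account entry.
#[derive(Clone)]
struct AccountHashEntry {
    pubkey: [u8; 32],
    hash: [u8; 32],
    lamports: u64,
}

// Because storages are scanned in slot order and each append vec is read in
// increasing write_version order, the newest version of an account is simply
// the last entry seen; iteration order replaces explicit (slot, write_version)
// sort keys.
fn latest_per_pubkey(entries_in_scan_order: &[AccountHashEntry]) -> Vec<AccountHashEntry> {
    let mut latest: HashMap<[u8; 32], AccountHashEntry> = HashMap::new();
    for entry in entries_in_scan_order {
        latest.insert(entry.pubkey, entry.clone()); // later entries win
    }
    latest.into_values().collect()
}
```

The real calculation works over pubkey bins (the "bins" mentioned in the comments below) rather than one big map, but the duplicate-resolution idea is the same.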

Fixes #

@jeffwashington
Contributor Author

jeffwashington commented Jun 2, 2021

Total time went from 6.0s (master) to 6.7s (this PR).
Things like changing the bins will help us catch up and scale further.
I also intend to rework how we do flatten and sort.

this change:
calculate_accounts_hash_without_index accounts_scan=1984282i eliminate_zeros=400528i hash=231545i sort=2685521i hash_total=61837974i flatten=1174477i storage_sort_us=20538i unreduced_entries=62270664i collect_snapshots_us=214111i num_snapshot_storage=372887i total=6711002i

master:
calculate_accounts_hash_without_index accounts_scan=2209913i eliminate_zeros=443076i hash=233774i sort=1473023i hash_total=61837974i flatten=1490250i storage_sort_us=20576i unreduced_entries=62270664i collect_snapshots_us=228110i num_snapshot_storage=372887i total=6098722i

@jeffwashington jeffwashington force-pushed the test_flatten75 branch 5 times, most recently from 956bb42 to c3e3743 Compare June 3, 2021 00:09
@codecov

codecov bot commented Jun 3, 2021

Codecov Report

Merging #17685 (c0da989) into master (708bbcb) will increase coverage by 0.0%.
The diff coverage is 81.6%.

@@           Coverage Diff            @@
##           master   #17685    +/-   ##
========================================
  Coverage    82.7%    82.8%            
========================================
  Files         431      433     +2     
  Lines      120590   120773   +183     
========================================
+ Hits        99839   100031   +192     
+ Misses      20751    20742     -9     

@jeffwashington jeffwashington marked this pull request as ready for review June 3, 2021 16:45
@jeffwashington jeffwashington requested a review from sakridge June 3, 2021 16:45
let mut now = &slice[i];
// first entry for this key that starts in our slice
result.push(now.hash);
sum += now.lamports as u128;
Contributor Author


The main change here is that we need to find the last entry for each pubkey. The previous code wanted the first item. Since we read directly from appendvecs and write_versions increase as we read, this is the natural sort order.
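
A minimal sketch of that change, using hypothetical names rather than the PR's actual code: walk a slice that is grouped by pubkey and, within each pubkey, ordered by the natural append-vec read order, and keep the last entry of each run.

```rust
// Same hypothetical struct as in the sketch in the PR description.
struct AccountHashEntry {
    pubkey: [u8; 32],
    hash: [u8; 32],
    lamports: u64,
}

// Collect the hash of the last (newest) entry per pubkey and sum lamports.
fn dedup_last_per_key(slice: &[AccountHashEntry]) -> (Vec<[u8; 32]>, u128) {
    let mut result = Vec::new();
    let mut sum: u128 = 0;
    let mut i = 0;
    while i < slice.len() {
        let key = slice[i].pubkey;
        // skip ahead to the last entry of this pubkey's run; because entries
        // arrive in slot order and then write_version order, the last one is
        // the newest version of the account
        while i + 1 < slice.len() && slice[i + 1].pubkey == key {
            i += 1;
        }
        let now = &slice[i]; // last entry for this key wins
        result.push(now.hash);
        sum += now.lamports as u128;
        i += 1;
    }
    (result, sum)
}
```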

});
}
} else {
// we have to call the scan_func in order of write_version within a slot if there are multiple storages per slot
Contributor Author


This messy code is required if we have > 1 append vec per slot. We need write_version to be in increasing order in the final accumulated results.
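
A hedged sketch of what that path has to do (hypothetical names, not the PR's real scan API): with a single append vec per slot, entries already arrive in increasing write_version order; with more than one, the entries from all of that slot's storages have to be merged and re-sorted by write_version before the scan function sees them, so the last write still wins.

```rust
// Hypothetical per-entry data as read from an append vec.
struct ScannedAccount {
    write_version: u64,
    pubkey: [u8; 32],
    hash: [u8; 32],
    lamports: u64,
}

// Feed one slot's entries to scan_func in increasing write_version order.
fn scan_slot_in_write_version_order<F>(storages: &[Vec<ScannedAccount>], mut scan_func: F)
where
    F: FnMut(&ScannedAccount),
{
    if storages.len() == 1 {
        // single append vec: entries are already in increasing write_version order
        storages[0].iter().for_each(&mut scan_func);
        return;
    }
    // multiple append vecs in one slot: merge and re-sort by write_version
    let mut merged: Vec<&ScannedAccount> = storages.iter().flatten().collect();
    merged.sort_by_key(|a| a.write_version);
    merged.into_iter().for_each(scan_func);
}
```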

@jeffwashington
Contributor Author

Building off this PR, the next step gets us from 6s (master) to 3.6s by reworking flatten and sort.

sakridge
sakridge previously approved these changes Jun 4, 2021
Contributor

@sakridge sakridge left a comment


looks good

@mergify mergify bot dismissed sakridge’s stale review June 4, 2021 19:55

Pull request has been modified.

@jeffwashington
Contributor Author

It turns out there is no discernible penalty for using the multiple-storages-per-slot code path all the time, so that makes things simpler.
Also, there is enough noise in the measurements that this version of the hash calculation is approximately the same speed as keeping slot and write version. Perhaps that is because the number of bins has increased in master.

@jeffwashington jeffwashington merged commit b5bb91b into solana-labs:master Jun 7, 2021
mergify bot pushed a commit that referenced this pull request Jun 7, 2021
* rework hash calculation to not keep slot and write version

* refactor functions and add tests

* always use multiple slot code path

(cherry picked from commit b5bb91b)
mergify bot added a commit that referenced this pull request Jun 7, 2021
…17794)

* rework hash calculation to not keep slot and write version

* refactor functions and add tests

* always use multiple slot code path

(cherry picked from commit b5bb91b)

Co-authored-by: Jeff Washington (jwash) <[email protected]>
@brooksprumo brooksprumo mentioned this pull request Aug 23, 2021