-
Notifications
You must be signed in to change notification settings - Fork 883
Remove pubkey from DecisionMaker #7077
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove pubkey from DecisionMaker #7077
Conversation
0b103d3 to
3e77501
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #7077 +/- ##
=======================================
Coverage 83.2% 83.2%
=======================================
Files 854 854
Lines 374686 374613 -73
=======================================
- Hits 312083 312025 -58
+ Misses 62603 62588 -15 🚀 New features to boost your workflow:
|
bw-solana
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but I'm too stupid to understand why we ever needed the old code.. so want to make sure I'm not missing something
| } else if would_be_leader_fn() { | ||
| // Node will be leader within ~20 slots, hold the transactions in | ||
| // case it is the only node which produces an accepted slot. | ||
| BufferedPacketsDecision::ForwardAndHold |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not understanding the case where we would even hit the Hold case for the old logic.
- If we didn't know the leader (because we don't have the leader schedule), but in reality we generate the schedule so far in advance relative to how far ahead we look that this seems impossible outside of weird, contrived test cases.
- If we somehow don't think we will be leader within 20 slots based on ticks but also think we will be leader in 2 slots based on the leader schedule. Is there even a corner case where this could happen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Want to be clear, that we hit Hold very often. But there are 3 different cases in which the old logic could have resulted in Hold:
would_be_leader_shortly_fnreturns true:- This is the one we hit a lot.
- We are within 2 slots of being leader
- This also includes when we are currently leader, but don't yet a Bank to operate on.
leader_pubkey_fnreturnsSome(x)wherexIS the nodes ID pubkey (at start up)- We search 2 slots into the future in cached leader schedule. If it IS us, then we return
Hold. BUT this can only happen if our POH has an inconsistent view of who is leader in slot N+2 from the leader-schedule cache. - In practice the
ifdoesn't even work. Many operators have failover setup so that when they are starting up they are in a non-voting mode with a dummy key...since we grab it at start-up, we're comparing against the wrong key in this case anyway.
- We search 2 slots into the future in cached leader schedule. If it IS us, then we return
leader_pubkey_fnreturnsNone- this shouldn't ever happen because we generate leader schedules ahead of time and only look a small offset into the future (2 slots)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, sorry, 1 makes perfect sense. I was referring to the cases we're removing (2 and 3)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, and to summarize my comment - 2 and 3 should NOT happen unless there are other bugs, to the best of my knowledge.
tao-stones
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good, thank for the explanations.
|
This is non-urgent, gonna wait for @jstarry's review since he is good at finding edge-cases in my understanding of poh. |
jstarry
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Problem
DecisionMakercurrently takes in aPubkeywhich should be the ID of the nodeDecisionMakerwould likely failDecisionMakerlooks very shortly into the future, 2 slots. We should not be in a situation where there is not anEpochSchedulein the cache for that slotSummary of Changes
DecisionMakerFixes #