-
Notifications
You must be signed in to change notification settings - Fork 890
DecisionMaker - lock-free #7372
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #7372 +/- ##
=======================================
Coverage 83.2% 83.2%
=======================================
Files 800 800
Lines 362793 362735 -58
=======================================
- Hits 302162 302134 -28
+ Misses 60631 60601 -30 🚀 New features to boost your workflow:
|
e069858 to
6d1d104
Compare
| BufferedPacketsDecision::ForwardAndHold | ||
| } else if let Some(first_leader_tick_height) = self.shared_leader_first_tick_height.load() { | ||
| let current_tick_height = self.shared_tick_height.load(); | ||
| let ticks_until_leader = first_leader_tick_height.saturating_sub(current_tick_height); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we used to explicitly check that current_tick_height<=first_leader_tick_height. Are we intentionally removing this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nope i was just being dumb when writing it, and apparently it was an untested case. I'll add the case to tests and fix. It SHOULD only take this path for a future tick_height, not one that we're past.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In writing this out, I think I made a mistake in #7077.
This comment: https://github.com/anza-xyz/agave/pull/7077/files#r2222813775
explicitly stated that would_be_leader_shortly_fn returns true if we are within 2 slots of leader OR are leader but don't have a bank yet.
So I think I probably broke that second part with #7077; now it'd actually potentially be dropping packets between banks (very bad!).
The change in this PR would fix the dropping between banks.
I need to look again now at how the internal first leader tick is updated - is it possible for it remain Some past tick_height for a long period?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be recalculated each time we clear or reset poh bank. so this PR fixes the bug i introduced in #7077 wrt intra-bank dropping of packets.
i.e. current tick past first leader tick is now treated same as within 2 slots until leader.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I think it works.
- Consume the packets if we have a bank.
- Hold the packets if we're within N ticks of being leader. This could include being the actual leader.
- And once we finish leader span, we should immediately set
leader_first_tick_heightto our next leader slot (and stop holding packets if it's far in the future)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain in more detail what bug was introduced in #7077 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bw-solana did you mean to say the "last" leader tick height? I see a check for
self.tick_height <= self.leader_last_tick_height
but not for
current_tick_height<=first_leader_tick_height
AFAICT there was no bug introduced in #7077 because before these lock-free decision maker changes, we weren't accessing the tick height before the leader first tick height being updated. So either the first tick height is none or set to the next leader slot's tick. So would_be_leader should always be correct, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should learn to read one of these days. You're correct. it checks against the last leader tick directly, so no bug.
bw-solana
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
| BufferedPacketsDecision::ForwardAndHold | ||
| } else if let Some(first_leader_tick_height) = self.shared_leader_first_tick_height.load() { | ||
| let current_tick_height = self.shared_tick_height.load(); | ||
| let ticks_until_leader = first_leader_tick_height.saturating_sub(current_tick_height); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I think it works.
- Consume the packets if we have a bank.
- Hold the packets if we're within N ticks of being leader. This could include being the actual leader.
- And once we finish leader span, we should immediately set
leader_first_tick_heightto our next leader slot (and stop holding packets if it's far in the future)
Problem
Summary of Changes
DecisionMakerlock-free (andPohRecorderfree!)Fixes #