Handle errors on replaying ledger properly #7741
// validators don't replay this slot and see DuplicateSignature errors
// later in ReplayStage
verify_and_process_slot_entries(&bank, &entries, *last_entry_hash, opts).map_err(|err| {
    warn!("slot {} failed to verify: {}", slot, err);
@jstarry This is where having the replay and blocktree processor verification be the same would be useful.
If we don't mark this slot as dead here on failure, then ReplayStage will try to replay this slot again and will run into DuplicateSignature errors.
If they were the same implementation, then we would know that a failure here also means a failure in ReplayStage, so it would be safe to mark the slot as dead here. Otherwise, if the implementations or criteria differ, one validator might mark a slot as dead while another does not, depending on whether it replays the slot here on boot or in ReplayStage.
c89a324 to
7b603ba
Compare
7b603ba to
d08bad9
Compare
Codecov Report
@@ Coverage Diff @@
## master #7741 +/- ##
========================================
+ Coverage 81.7% 81.7% +<.1%
========================================
Files 241 241
Lines 50750 50814 +64
========================================
+ Hits 41489 41564 +75
+ Misses 9261 9250 -11
sakridge
left a comment
lgtm
(cherry picked from commit 27d2c0a)
Umm,
Problem
Problem 1:
When a slot is marked dead (https://github.com/solana-labs/solana/blob/v0.21.5/ledger/src/blocktree_processor.rs#L509), or when an error occurs on replay (https://github.com/solana-labs/solana/blob/v0.21.5/ledger/src/blocktree_processor.rs#L520), the loop continues.
This means https://github.com/solana-labs/solana/blob/v0.21.5/ledger/src/blocktree_processor.rs#L542-L549 (where the tip of the fork is added to `fork_info`) never runs for the end of that fork, so this logic doesn't run: https://github.com/solana-labs/solana/blob/v0.21.5/ledger/src/blocktree_processor.rs#L467, and then the parents of that fork don't get added on BankForks construction here: https://github.com/solana-labs/solana/blob/v0.21.5/ledger/src/bank_forks.rs#L146
This causes this slot to be replayed again in ReplayStage, leading to DuplicateSignature errors.
Problem 2:
In this else block: https://github.com/solana-labs/solana/blob/v0.21.5/ledger/src/blocktree_processor.rs#L463-L467, if any child of a slot `P` is incomplete, `P` is added to the `fork_info` structure even if it isn't the tip of a fork.
For instance:
Bank `A` is added to `fork_info` even though the tip of the fork is `B`.
Summary of Changes
Fix problems 1) and 2) above with better error handling and by restructuring the processing pipeline: always add a bank to `fork_info`, and have any successfully replayed children delete the immediate parent from `fork_info` (thanks @sakridge for the suggestion!)
Add tests for above
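The "always add, then let a replayed child remove its parent" idea can be illustrated with a minimal, self-contained sketch. This is not the code from this PR: `collect_fork_tips`, the `(slot, parent)` input list, and the `dead` set are hypothetical stand-ins for the real bank and blocktree types, and `fork_info` is reduced to a plain set of slots.

```rust
use std::collections::HashSet;

type Slot = u64;

/// Hypothetical sketch: walk slots in replay order (parents before children)
/// and return the slots that end up as fork tips.
/// `slots` holds (slot, parent) pairs; `dead` holds slots whose replay fails.
fn collect_fork_tips(slots: &[(Slot, Option<Slot>)], dead: &HashSet<Slot>) -> Vec<Slot> {
    // Stand-in for `fork_info`: every successfully replayed slot is added here.
    let mut fork_info: HashSet<Slot> = HashSet::new();
    // Slots that failed replay, or that descend from a failed slot.
    let mut failed: HashSet<Slot> = HashSet::new();

    for &(slot, parent) in slots {
        let parent_failed = parent.map_or(false, |p| failed.contains(&p));
        if parent_failed || dead.contains(&slot) {
            // Skipping here is safe: the parent was already recorded when it
            // replayed, so it remains in `fork_info` as the tip of this fork.
            failed.insert(slot);
            continue;
        }
        // Always add the slot once it has replayed successfully...
        fork_info.insert(slot);
        // ...and remove its immediate parent, which is no longer a tip.
        if let Some(p) = parent {
            fork_info.remove(&p);
        }
    }

    let mut tips: Vec<Slot> = fork_info.into_iter().collect();
    tips.sort_unstable();
    tips
}
```

With a chain 0 -> 1 -> 2 where slot 2 is dead, slot 1 stays in the set as the tip, so it is not lost when the loop skips slot 2 (problem 1). If slot 1 also has a healthy child 3, slot 1 is removed as soon as 3 replays, so a non-tip parent is never left behind (problem 2).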
Fixes #