Skip to content
This repository was archived by the owner on Jan 22, 2025. It is now read-only.

Conversation

@carllin
Copy link
Contributor

@carllin carllin commented Jan 10, 2020

Problem

Problem 1:

When a slot is marked dead, https://github.com/solana-labs/solana/blob/v0.21.5/ledger/src/blocktree_processor.rs#L509, or when an error occurs on replay: https://github.com/solana-labs/solana/blob/v0.21.5/ledger/src/blocktree_processor.rs#L520
continues.

This means https://github.com/solana-labs/solana/blob/v0.21.5/ledger/src/blocktree_processor.rs#L542-L549 (where the tip of the fork is added to fork_info, never runs for the end of that fork, so this logic doesn't run: https://github.com/solana-labs/solana/blob/v0.21.5/ledger/src/blocktree_processor.rs#L467, and then the parents of that fork don't get added on BankForks construction here: https://github.com/solana-labs/solana/blob/v0.21.5/ledger/src/bank_forks.rs#L146

This causes this slot to be replayed again in ReplayStage, leading to DuplicateSignature errors.

Problem 2:

In this else block: https://github.com/solana-labs/solana/blob/v0.21.5/ledger/src/blocktree_processor.rs#L463-L467
, if any child of a slot P is incomplete, P is added to the fork_info structure even if it isn't the tip of a fork.

For instance:

      A
   /      \
 B       C (incomplete)

Bank A is added to fork_info even though the tip of the fork is B.

Summary of Changes

Fix problems 1) and 2) above with better error handling and restructuring the processing pipeline by always adding a bank to fork_info, and having any successfully replayed children delete the immediate parent from fork_info (thanks @sakridge for the suggestion!)

Add tests for above

Fixes #

@carllin carllin requested review from jstarry and sakridge January 10, 2020 02:19
// validators don't replay this slot and see DuplicateSignature errors
// later in ReplayStage
verify_and_process_slot_entries(&bank, &entries, *last_entry_hash, opts).map_err(|err| {
warn!("slot {} failed to verify: {}", slot, err);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jstarry This is where having the replay and blocktree processor verification be the same would be useful.

If we don't mark this slot as dead here on failure, then ReplayStage will try to replay this slot again and will run into DuplicateSignature errors.

If they were the same implementation, then we know that if this fails here, it will also fail in ReplayStage, so it would be safe to mark the slot as dead here. Otherwise, if the implementations/criteria differ, one validator might mark one as dead, another won't depending on if they replay here on boot or in ReplayStage.

@mvines mvines added the v0.22 label Jan 10, 2020
@carllin carllin force-pushed the FixBlocktreeProcessor branch 3 times, most recently from c89a324 to 7b603ba Compare January 10, 2020 09:08
@carllin carllin force-pushed the FixBlocktreeProcessor branch from 7b603ba to d08bad9 Compare January 10, 2020 09:09
@codecov
Copy link

codecov bot commented Jan 10, 2020

Codecov Report

Merging #7741 into master will increase coverage by <.1%.
The diff coverage is 98%.

@@           Coverage Diff            @@
##           master   #7741     +/-   ##
========================================
+ Coverage    81.7%   81.7%   +<.1%     
========================================
  Files         241     241             
  Lines       50750   50814     +64     
========================================
+ Hits        41489   41564     +75     
+ Misses       9261    9250     -11

Copy link
Contributor

@sakridge sakridge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@carllin carllin merged commit 27d2c0a into solana-labs:master Jan 10, 2020
mergify bot pushed a commit that referenced this pull request Jan 10, 2020
@mvines
Copy link
Contributor

mvines commented Jan 10, 2020

Umm,


failures:
--
  | blocktree_processor::tests::test_process_blocktree_with_dead_child
  | blocktree_processor::tests::test_process_blocktree_with_dead_slot
  | blocktree_processor::tests::test_root_with_all_dead_children

solana-grimes pushed a commit that referenced this pull request Jan 10, 2020
sakridge pushed a commit to sakridge/solana that referenced this pull request Jan 14, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants