feat(alpenglow): process block components in blockstore processor#11054
feat(alpenglow): process block components in blockstore processor#11054ksn6 wants to merge 4 commits intoanza-xyz:masterfrom
Conversation
core/src/replay_stage.rs
Outdated
|
|
||
| // Verify and process block components (e.g., header, footer) before freezing. | ||
| // Only verify blocks that were replayed from blockstore (not leader blocks). | ||
| if migration_status.should_allow_block_markers(bank_slot) && !is_leader_block { |
There was a problem hiding this comment.
I'm introducing migration_status.should_allow_block_markers(...) here as an additional check to avoid block component processor behavior pre-Alpenglow.
See here for previous logic:
ledger/src/blockstore_processor.rs
Outdated
| // In reality, we don't need to run migration checks for BlockMarkers here, despite BlockMarkers | ||
| // only being active post-Alpenglow. Here's why: | ||
| // | ||
| // Post-Alpenglow migration - validators that have Alpenglow enabled can parse BlockComponents. | ||
| // Things just work. | ||
| // | ||
| // Pre-Alpenglow migration, suppose a validator receives a BlockMarker: | ||
| // | ||
| // (1) validators *incapable* of processing BlockMarkers will mark the slot as dead on shred | ||
| // ingest in blockstore. | ||
| // | ||
| // (2) validators *capable* of processing BlockMarkers will store the BlockMarkers in shred | ||
| // ingest, run through this verifying code here, and then error out when processing a | ||
| // BlockMarker, resulting in the slot being marked as dead. | ||
| // | ||
| // However, we're running migration checks here anyways for consistency with the rest of the | ||
| // codebase. | ||
| if migration_status.is_alpenglow_enabled() { | ||
| return confirm_alpenglow_slot( | ||
| blockstore, | ||
| bank, | ||
| replay_tx_thread_pool, | ||
| timing, | ||
| progress, | ||
| skip_verification, | ||
| transaction_status_sender, | ||
| entry_notification_sender, | ||
| replay_vote_sender, | ||
| allow_dead_slots, | ||
| log_messages_bytes_limit, | ||
| prioritization_fee_cache, | ||
| migration_status, | ||
| ); | ||
| } |
There was a problem hiding this comment.
As done in alpenglow, replacing confirm_slot with confirm_alpenglow_slot would actually work pre-migration due to the reasons mentioned in the comment here. See https://github.com/anza-xyz/alpenglow/pull/575/changes#diff-281d947e593a76f3c3380804162dc8cf426fcc33e0ae1192589c8193221cd1e2R1517-R1575 for more detail.
For exposition and consistency, we're avoiding inlining here and instead opt to hide the logic behind a feature flag.
70176a1 to
24dd9c9
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #11054 +/- ##
=======================================
Coverage 83.1% 83.1%
=======================================
Files 837 837
Lines 316652 316678 +26
=======================================
+ Hits 263170 263214 +44
+ Misses 53482 53464 -18 🚀 New features to boost your workflow:
|
|
Note that we're modifying test logic in broadcast. Rationale:
Before this PR, these broadcast stages were shredding via
Without this change, the test gets stuck in an infinite loop where slot 30 repeatedly gets marked dead with |
alexpyattaev
left a comment
There was a problem hiding this comment.
I'm not 100% clear on what the plan is here. Why are we patching all of the test broadcast stages and not the "production" one? Should they not be essentially identical for the tests to be meaningful?
Another question, did you confirm that none of the changes break invalidator?
AshwinSekar
left a comment
There was a problem hiding this comment.
I tend to agree with alex here, the order of landing this is kind of weird.
I think it would be better to also modify the production broadcast/bcl to insert the header and footer so that we don't have to comment out code right now.
We can do it in parts, just populating an empty header & footer and upstream the full header footer next if easier.
adc624f to
856918b
Compare
856918b to
b6c4524
Compare
alexpyattaev
left a comment
There was a problem hiding this comment.
LGTM, please wait for @AshwinSekar to take a look too.
Problem and Summary of Changes
confirm_slotinblockstore_processoronly processes raw entries. Post-Alpenglow, slots contain block components (entry batches and block markers like headers/footers) that need to be parsed, validated, and run during replay.This PR finishes upstreaming anza-xyz/alpenglow#575.
Thanks to @alexpyattaev for contributing to fixing the test in
broadcast_duplicates_run.rshere: alexpyattaev@3abdf5e