
feat(alpenglow): process block components in blockstore processor#11054

Open
ksn6 wants to merge 4 commits into anza-xyz:master from ksn6:update-blockstore-processor

@ksn6 ksn6 commented Mar 6, 2026

Problem and Summary of Changes

confirm_slot in blockstore_processor only processes raw entries. Post-Alpenglow, slots contain block components (entry batches and block markers like headers/footers) that need to be parsed, validated, and run during replay.

This PR finishes upstreaming anza-xyz/alpenglow#575.

Thanks to @alexpyattaev for helping fix the test in broadcast_duplicates_run.rs here: alexpyattaev@3abdf5e

@ksn6 ksn6 changed the title from "Update blockstore processor" to "feat(alpenglow): process block components in blockstore processor" Mar 6, 2026

// Verify and process block components (e.g., header, footer) before freezing.
// Only verify blocks that were replayed from blockstore (not leader blocks).
if migration_status.should_allow_block_markers(bank_slot) && !is_leader_block {
Author

I'm introducing migration_status.should_allow_block_markers(...) here as an additional check so that the block component processor does not run pre-Alpenglow.

See here for previous logic:

https://github.com/anza-xyz/alpenglow/pull/575/changes#diff-6d8458bb2e53158ac472a9ad4709e6a0a52b75d930c019013298e8acda133828R3695
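
The gating condition discussed above can be sketched in isolation. This is only an illustration, not the PR's code: the `MigrationStatus` struct below is a hypothetical stand-in, and the `first_marker_slot` field and the slot-threshold behavior of `should_allow_block_markers` are assumptions made for the example. Only the shape of the final predicate (markers allowed for the slot, and the block was not produced locally as leader) is taken from the diff.

```rust
/// Hypothetical stand-in for the migration status type used in the PR.
/// The real type and its fields are not shown in this conversation.
pub struct MigrationStatus {
    /// Assumption for illustration: the first slot at which block markers
    /// are allowed, or `None` if they are never allowed.
    pub first_marker_slot: Option<u64>,
}

impl MigrationStatus {
    /// Assumed semantics: markers are allowed from `first_marker_slot` onward.
    pub fn should_allow_block_markers(&self, slot: u64) -> bool {
        matches!(self.first_marker_slot, Some(first) if slot >= first)
    }
}

/// Mirrors the condition in the diff: verify block components only when
/// markers are allowed for this slot and the block was replayed from
/// blockstore rather than produced locally as leader.
pub fn should_verify_block_components(
    status: &MigrationStatus,
    bank_slot: u64,
    is_leader_block: bool,
) -> bool {
    status.should_allow_block_markers(bank_slot) && !is_leader_block
}
```

Under these assumptions, a replayed block at or after the threshold slot is verified, while leader blocks and pre-threshold slots are skipped.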

Comment on lines +1618 to +1651
// In reality, we don't need to run migration checks for BlockMarkers here, despite BlockMarkers
// only being active post-Alpenglow. Here's why:
//
// Post-Alpenglow migration - validators that have Alpenglow enabled can parse BlockComponents.
// Things just work.
//
// Pre-Alpenglow migration, suppose a validator receives a BlockMarker:
//
// (1) validators *incapable* of processing BlockMarkers will mark the slot as dead on shred
// ingest in blockstore.
//
// (2) validators *capable* of processing BlockMarkers will store the BlockMarkers in shred
// ingest, run through this verifying code here, and then error out when processing a
// BlockMarker, resulting in the slot being marked as dead.
//
// However, we're running migration checks here anyways for consistency with the rest of the
// codebase.
if migration_status.is_alpenglow_enabled() {
    return confirm_alpenglow_slot(
        blockstore,
        bank,
        replay_tx_thread_pool,
        timing,
        progress,
        skip_verification,
        transaction_status_sender,
        entry_notification_sender,
        replay_vote_sender,
        allow_dead_slots,
        log_messages_bytes_limit,
        prioritization_fee_cache,
        migration_status,
    );
}
Author

As done in alpenglow, replacing confirm_slot with confirm_alpenglow_slot would actually work pre-migration, for the reasons given in the code comment here. See https://github.com/anza-xyz/alpenglow/pull/575/changes#diff-281d947e593a76f3c3380804162dc8cf426fcc33e0ae1192589c8193221cd1e2R1517-R1575 for more detail.

For exposition and consistency, we avoid inlining here and instead gate the logic behind a feature flag.

@ksn6 ksn6 requested a review from AshwinSekar March 6, 2026 08:19
@ksn6 ksn6 force-pushed the update-blockstore-processor branch 4 times, most recently from 70176a1 to 24dd9c9 March 8, 2026 10:33
@ksn6 ksn6 requested a review from a team as a code owner March 8, 2026 10:33
@ksn6 ksn6 marked this pull request as draft March 8, 2026 10:33
@codecov-commenter
codecov-commenter commented Mar 8, 2026

Codecov Report

❌ Patch coverage is 57.50000% with 34 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.1%. Comparing base (f41a763) to head (4adaf27).
⚠️ Report is 2 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #11054   +/-   ##
=======================================
  Coverage    83.1%    83.1%           
=======================================
  Files         837      837           
  Lines      316652   316678   +26     
=======================================
+ Hits       263170   263214   +44     
+ Misses      53482    53464   -18     
Author

ksn6 commented Mar 8, 2026

Note that we're modifying test logic in broadcast. Rationale:

  • confirm_slot now deserializes shred data as BlockComponent rather than Vec<Entry>.
  • BlockComponent uses custom serde, where the first 8 bytes are the entry count: non-zero means EntryBatch, zero means BlockMarker.
  • When corrupted shreds produce payloads starting with zero bytes, the deserializer tries to parse a BlockMarker and fails. The old Vec<Entry> path just read zero entries and moved on.
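
The prefix rule in the list above can be illustrated with a minimal sketch. Everything here is hypothetical: `ComponentKind`, `ClassifyError`, and `classify_component` are names invented for the example, and the little-endian byte order of the 8-byte count is an assumption. The sketch only classifies a payload by its prefix; it does not parse entries or markers, and it is not the PR's deserializer.

```rust
/// Hypothetical classification of a payload by its leading 8-byte count.
#[derive(Debug, PartialEq, Eq)]
pub enum ComponentKind {
    /// Non-zero leading count: the payload is a batch of that many entries.
    EntryBatch { entry_count: u64 },
    /// Zero leading count: the rest of the payload is expected to be a marker.
    BlockMarker,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum ClassifyError {
    /// Fewer than 8 bytes were available for the count prefix.
    Truncated,
}

/// Reads the first 8 bytes as a u64 (little-endian is an assumption here)
/// and applies the rule: non-zero = EntryBatch, zero = BlockMarker.
pub fn classify_component(payload: &[u8]) -> Result<ComponentKind, ClassifyError> {
    if payload.len() < 8 {
        return Err(ClassifyError::Truncated);
    }
    let mut prefix = [0u8; 8];
    prefix.copy_from_slice(&payload[..8]);
    match u64::from_le_bytes(prefix) {
        0 => Ok(ComponentKind::BlockMarker),
        n => Ok(ComponentKind::EntryBatch { entry_count: n }),
    }
}
```

This shows why a corrupted payload that happens to begin with eight zero bytes is routed to the marker path: the prefix alone decides the branch, so the remaining bytes must then parse as a valid BlockMarker or the slot fails.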

Before this PR, these broadcast stages were shredding via entries_to_merkle_shreds_for_tests (raw Vec<Entry> format). I'm switching to component_to_merkle_shreds_for_tests (which wraps entries in BlockComponent::EntryBatch) so that shred payloads are in the BlockComponent wire format that confirm_slot expects.

test_duplicate_shreds_broadcast_leader is the test that made this change necessary.

Without this change, the test gets stuck in an infinite loop where slot 30 repeatedly gets marked dead with FailedToLoadEntries(InvalidShredData(Custom("could not reconstruct block component: Io(ReadSizeLimit(2))"))), then repaired, then marked dead again. The cluster never makes progress past root 1.

@ksn6 ksn6 marked this pull request as ready for review March 8, 2026 12:10
@ksn6 ksn6 requested a review from AshwinSekar March 8, 2026 12:10
@alexpyattaev alexpyattaev left a comment

I'm not 100% clear on what the plan is here. Why are we patching all of the test broadcast stages and not the "production" one? Should they not be essentially identical for the tests to be meaningful?

Another question, did you confirm that none of the changes break invalidator?

@AshwinSekar AshwinSekar left a comment

I tend to agree with alex here; the order of landing this is kind of weird.

I think it would be better to also modify the production broadcast/bcl to insert the header and footer, so that we don't have to comment out code right now.

We can do it in parts: first populate an empty header and footer, then upstream the full header and footer next, if that is easier.

@ksn6 ksn6 force-pushed the update-blockstore-processor branch from adc624f to 856918b March 10, 2026 21:04
@ksn6 ksn6 force-pushed the update-blockstore-processor branch from 856918b to b6c4524 March 11, 2026 00:19
@alexpyattaev alexpyattaev left a comment

LGTM, please wait for @AshwinSekar to take a look too.
