
@liamaharon (Collaborator) commented Jul 22, 2025

Phase 1 of #1887.

I added the full NPoS dependencies in this PR primarily to reduce future merge conflicts, which have recently been time-consuming to resolve.

Hybrid Node

The Hybrid Node handles producing, syncing, and validating both Aura- and Babe-based blocks. It contains logic to switch dynamically between Aura- and Babe-based processing, depending on the specific block it needs to process.
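One plausible way to decide, per block, which path to take is to inspect the engine ID in the block's pre-runtime digest: Substrate's Aura uses `*b"aura"` and Babe uses `*b"BABE"`. The sketch below assumes that approach; it is not necessarily how this PR detects the mechanism, and `PreRuntimeDigest`, `Consensus`, and `consensus_for_block` are hypothetical names built on plain std types.

```rust
/// Engine IDs carried in Substrate pre-runtime digests.
const AURA_ENGINE_ID: [u8; 4] = *b"aura";
const BABE_ENGINE_ID: [u8; 4] = *b"BABE";

/// Which consensus path should handle a given block.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Consensus {
    Aura,
    Babe,
}

/// Hypothetical, simplified stand-in for a header's pre-runtime digest item.
/// Real headers use `sp_runtime` digest types; only the engine ID is kept here.
pub struct PreRuntimeDigest {
    pub engine_id: [u8; 4],
}

/// Pick the consensus path for a block from its pre-runtime digests.
/// Returns `None` when no known engine produced the block.
pub fn consensus_for_block(digests: &[PreRuntimeDigest]) -> Option<Consensus> {
    digests.iter().find_map(|d| match d.engine_id {
        AURA_ENGINE_ID => Some(Consensus::Aura),
        BABE_ENGINE_ID => Some(Consensus::Babe),
        _ => None,
    })
}
```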

Implementation

Node Service

The core changes are in the node service logic. Consensus-specific service logic was moved behind a new ConsensusMechanism trait, for which I have written both Aura and Babe implementations.
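A heavily simplified sketch of what a consensus-abstraction trait of this kind could look like. The method names (`build_import_queue`, `start_authoring`), the `new_service` helper, and the string return values are hypothetical placeholders for the real Substrate import-queue and authoring wiring; they are not the PR's actual ConsensusMechanism API.

```rust
/// Hypothetical consensus abstraction: each mechanism supplies its own
/// import pipeline and authoring task, and the service stays generic.
pub trait ConsensusMechanism {
    /// Short name used in logs.
    fn name(&self) -> &'static str;
    /// Build the block import pipeline (stands in for a real import queue).
    fn build_import_queue(&self) -> String;
    /// Start block authoring, but only if this node is an authority.
    fn start_authoring(&self, is_authority: bool) -> Option<String>;
}

pub struct AuraConsensus;
pub struct BabeConsensus;

impl ConsensusMechanism for AuraConsensus {
    fn name(&self) -> &'static str {
        "aura"
    }
    fn build_import_queue(&self) -> String {
        "aura-import-queue".to_string()
    }
    fn start_authoring(&self, is_authority: bool) -> Option<String> {
        is_authority.then(|| "aura-authoring".to_string())
    }
}

impl ConsensusMechanism for BabeConsensus {
    fn name(&self) -> &'static str {
        "babe"
    }
    fn build_import_queue(&self) -> String {
        // In real Babe, block import also maintains epoch-change data.
        "babe-import-queue".to_string()
    }
    fn start_authoring(&self, is_authority: bool) -> Option<String> {
        is_authority.then(|| "babe-authoring".to_string())
    }
}

/// Consensus-agnostic service setup: the caller chooses the mechanism.
pub fn new_service(mech: &dyn ConsensusMechanism, is_authority: bool) -> Vec<String> {
    let mut tasks = vec![mech.build_import_queue()];
    if let Some(task) = mech.start_authoring(is_authority) {
        tasks.push(task);
    }
    tasks
}
```

Keeping the service generic over a trait object means switching mechanisms at runtime only requires passing a different implementation to the same setup code.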

Node RPC

Node RPC logic was made generic across consensus mechanisms by letting the caller pass in the Frontier consensus data provider to use and the custom RPC modules to instantiate.
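To illustrate the shape of this change, here is a sketch in which the RPC builder is generic over a data-provider type and merges in caller-supplied extra modules. `PendingDigestProvider`, `RpcModule`, `FullRpc`, and `create_full_rpc` are hypothetical; the provider trait is a simplified stand-in and does not mirror Frontier's real consensus data provider.

```rust
/// Hypothetical stand-in for the consensus data provider handed to the
/// Ethereum RPC layer. It does NOT mirror Frontier's actual trait.
pub trait PendingDigestProvider {
    /// Engine ID of the pre-runtime digest a pending block would carry.
    fn engine_id(&self) -> [u8; 4];
}

pub struct AuraDigestProvider;
pub struct BabeDigestProvider;

impl PendingDigestProvider for AuraDigestProvider {
    fn engine_id(&self) -> [u8; 4] {
        *b"aura"
    }
}

impl PendingDigestProvider for BabeDigestProvider {
    fn engine_id(&self) -> [u8; 4] {
        *b"BABE"
    }
}

/// A bag of RPC method names (illustrative; real modules hold handlers).
pub struct RpcModule {
    pub methods: Vec<&'static str>,
}

/// RPC wiring that is generic over the consensus data provider.
pub struct FullRpc<P: PendingDigestProvider> {
    pub provider: P,
    pub module: RpcModule,
}

impl<P: PendingDigestProvider> FullRpc<P> {
    pub fn pending_digest_engine(&self) -> [u8; 4] {
        self.provider.engine_id()
    }
}

pub fn create_full_rpc<P: PendingDigestProvider>(
    provider: P,
    custom_modules: Vec<RpcModule>,
) -> FullRpc<P> {
    // Consensus-agnostic methods every node exposes (illustrative subset).
    let mut module = RpcModule { methods: vec!["eth_blockNumber", "eth_call"] };
    // Merge in whatever consensus-specific modules the caller supplied.
    for extra in custom_modules {
        module.methods.extend(extra.methods);
    }
    FullRpc { provider, module }
}
```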

CLI / Command

A new CLI parameter, --initial-consensus, determines which ConsensusMechanism the node starts with. The node then switches between ConsensusMechanisms dynamically at runtime as needed to continue block production, validation, and import.
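A minimal sketch of parsing --initial-consensus into a typed value using only `std::str::FromStr`. The accepted values `aura` and `babe`, the case-insensitive matching, and the default of Aura when the flag is absent are assumptions; `InitialConsensus` and `initial_consensus_from_args` are hypothetical names, and a real Substrate node would most likely declare the flag with `clap` rather than scanning raw arguments.

```rust
use std::str::FromStr;

/// Hypothetical typed value for `--initial-consensus`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InitialConsensus {
    Aura,
    Babe,
}

impl FromStr for InitialConsensus {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "aura" => Ok(Self::Aura),
            "babe" => Ok(Self::Babe),
            other => Err(format!("unknown consensus `{other}`; expected `aura` or `babe`")),
        }
    }
}

/// Read `--initial-consensus <value>` from raw args (assumed default: Aura).
pub fn initial_consensus_from_args(args: &[&str]) -> Result<InitialConsensus, String> {
    match args.iter().position(|a| *a == "--initial-consensus") {
        Some(i) => args
            .get(i + 1)
            .ok_or_else(|| "missing value for --initial-consensus".to_string())?
            .parse(),
        None => Ok(InitialConsensus::Aura),
    }
}
```

With a typed enum, an invalid value is rejected at startup instead of surfacing later during block production.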

TODO

  • Fix compilation with the runtime-benchmarks feature flag
  • Check that ConditionalEVMBlockImport is up to date

@liamaharon marked this pull request as ready for review July 22, 2025 23:17
@shamil-gadelshin self-requested a review July 23, 2025 15:03
@shamil-gadelshin (Collaborator) left a comment


An interesting approach: you've created the consensus engine abstraction on top of the existing crates! I have several questions about it:

  • Do you have the full roadmap of the consensus change? It would be easier to know which parts of the code are missing and what to expect in the next parts.
  • How do you test the consensus change? It seems it's not possible until the next part is merged.
  • Am I correct that if I propagate a custom Babe block, I can switch consensus from an arbitrary node?
  • What do you think about a specific block number as an explicit consensus engine boundary? This way we'd get rid of the pending verification, the string error used as a marker, etc.

loop {
    // Check if the runtime is Babe once per block.
    if let Ok(c) = sc_consensus_babe::configuration(&*client) {
        if !c.authorities.is_empty() {
Collaborator

Can we hardcode the exact block number at which to switch?

Collaborator Author

Where possible, I prefer to avoid "magic numbers" that could be misconfigured. However, if there is a benefit to switching on a block number instead, I'm open to it.
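The two switching strategies discussed in this thread can be compared side by side. This sketch uses a hypothetical `ChainView` in place of a real client: `AtBlock` is the hardcoded-height option the reviewer suggests, and `BabeAuthoritiesPresent` mirrors the `!c.authorities.is_empty()` check in the quoted loop. `SwitchTrigger` and `should_switch` are hypothetical names.

```rust
/// Two ways to decide when the node should hand over from Aura to Babe.
#[derive(Debug, Clone, Copy)]
pub enum SwitchTrigger {
    /// Switch once the best block reaches a hardcoded height.
    AtBlock(u64),
    /// Switch once the runtime reports a non-empty Babe authority set.
    BabeAuthoritiesPresent,
}

/// Minimal, hypothetical view of the chain state either trigger needs.
pub struct ChainView {
    pub best_number: u64,
    /// `None` models the Babe configuration call failing (e.g. Aura runtime).
    pub babe_authorities: Option<Vec<[u8; 32]>>,
}

impl SwitchTrigger {
    pub fn should_switch(&self, chain: &ChainView) -> bool {
        match self {
            SwitchTrigger::AtBlock(n) => chain.best_number >= *n,
            SwitchTrigger::BabeAuthoritiesPresent => chain
                .babe_authorities
                .as_ref()
                .map_or(false, |a| !a.is_empty()),
        }
    }
}
```

`AtBlock` is deterministic and easy to reason about, but its constant can be misconfigured, which is the "magic number" concern; `BabeAuthoritiesPresent` needs no constant but is only as trustworthy as the state it reads.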

@liamaharon (Collaborator Author) commented Jul 25, 2025

Thank you @shamil-gadelshin for your detailed review!

Do you have the whole roadmap of the consensus change? It would be easier to know what part of the code is missing and what we expect in the next parts.

Yes, you can view the full set of changes (including the runtime upgrade and migration) in #1708.

TL;DR: the next changes after merging this are:

  1. enacting a runtime upgrade that migrates the existing PoA authorities to NPoS authorities, without allowing changes to that set
  2. gradually adding new whitelisted parties to that set
  3. opening the set up so that anyone can validate
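The three rollout phases can be read as a validator-admission policy. The sketch below models them with a hypothetical `ValidatorSetPhase` enum and `may_join` helper, using plain string account IDs; it illustrates the policy only and is not the runtime's actual staking or session logic.

```rust
/// Hypothetical model of the three rollout phases.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ValidatorSetPhase {
    /// Phase 1: PoA authorities migrated to NPoS; the set is frozen.
    Frozen,
    /// Phase 2: only whitelisted parties may join.
    Whitelisted,
    /// Phase 3: anyone may validate.
    Open,
}

/// Whether `candidate` may join the validator set in the given phase.
pub fn may_join(phase: ValidatorSetPhase, candidate: &str, whitelist: &[&str]) -> bool {
    match phase {
        ValidatorSetPhase::Frozen => false,
        ValidatorSetPhase::Whitelisted => whitelist.iter().any(|w| *w == candidate),
        ValidatorSetPhase::Open => true,
    }
}
```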

How do you test the consensus change? It seems it's not possible until the next part is merged.

You can test the changes on the node-decentralization branch by following the same steps I used here: https://discord.com/channels/1120750674595024897/1387662637843742760/1391994494714515588

If you're interested, let me know and I'll send you the exact commands I used to set it up :)

Am I correct that if I propagate the custom babe block I can switch the consensus from a random node?

This is an interesting edge case I hadn't considered! I think you're correct: I would need to perform additional checks on the Babe block rather than switching straight to Babe consensus...
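One possible set of extra checks, sketched under stated assumptions: switch only if the parent block's already-imported state reports a non-empty Babe authority set, the parent is finalized, and the block's claimed authority index falls inside that set. `IncomingBabeBlock` and `may_switch_to_babe` are hypothetical, and the finality requirement is one conservative policy choice, not something the PR specifies. A real implementation would also need Babe's own signature and VRF verification, which this sketch omits.

```rust
/// What an incoming Babe-style block claims (hypothetical, simplified).
pub struct IncomingBabeBlock {
    pub parent_number: u64,
    /// Index of the claimed author in the Babe authority set.
    pub authority_index: u32,
}

/// Decide whether a Babe block may trigger the switch.
/// `parent_babe_authorities` is read from the parent's verified state.
pub fn may_switch_to_babe(
    block: &IncomingBabeBlock,
    parent_babe_authorities: Option<&[[u8; 32]]>,
    last_finalized: u64,
) -> Result<(), &'static str> {
    // 1. The parent state itself must already say Babe is active.
    let authorities = match parent_babe_authorities {
        Some(a) if !a.is_empty() => a,
        _ => return Err("parent state has no Babe authorities"),
    };
    // 2. Only trust parents the network has already finalized (assumed policy).
    if block.parent_number > last_finalized {
        return Err("parent not finalized yet");
    }
    // 3. The claimed author must exist in that authority set.
    if block.authority_index as usize >= authorities.len() {
        return Err("authority index out of range");
    }
    Ok(())
}
```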

What do you think about the specific block number as an explicit consensus engine boundary? This way we'll get rid of the pending verification, string error as a marker, etc.

Yeah, this is something I considered. Initially I tried to avoid hardcoding any block numbers, keeping this as close as possible to a "normal" runtime upgrade. However, you raise good points about verification. I'll have a think about it.

@liamaharon mentioned this pull request Jul 27, 2025
@liamaharon changed the title from "Hybrid Node" to "Hybrid Consensus Node" Jul 27, 2025
@shamil-gadelshin (Collaborator) left a comment


Thanks for the changes and the roadmap. It's easier to understand the general direction now.

My current concern is the moment of the consensus change. We don't need flexibility for this event because it appears to be a one-time, unidirectional change. If a hardcoded magic constant makes the code simpler, let's use it. If it introduces more issues than it solves, let's use your current method (presence of Babe authorities) or a similar external marker.
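Because the switch is described as a one-time, unidirectional change, it can be modelled as a latch that flips from Aura to Babe once and never back. A minimal sketch with a hypothetical `ConsensusLatch` type; an `AtomicBool` is used on the assumption that the switch worker and other tasks may observe it from different threads.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical one-way Aura -> Babe switch: once flipped it never reverts.
pub struct ConsensusLatch {
    switched_to_babe: AtomicBool,
}

impl ConsensusLatch {
    pub const fn new() -> Self {
        Self { switched_to_babe: AtomicBool::new(false) }
    }

    /// Record that the switch condition fired. Returns `true` only for the
    /// call that actually flipped the latch from Aura to Babe.
    pub fn trigger(&self) -> bool {
        self.switched_to_babe
            .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
    }

    /// Whether the node has already moved to Babe.
    pub fn is_babe(&self) -> bool {
        self.switched_to_babe.load(Ordering::Acquire)
    }
}
```

Since `trigger` reports success only once, any one-off restart logic runs exactly once even if the switch condition is observed on every subsequent block, whichever trigger (block number or Babe authorities) feeds it.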

@gztensor previously approved these changes Jul 28, 2025
@liamaharon added the no-spec-version-bump label (PR does not contain changes that require bumping the spec version) Jul 31, 2025
@shamil-gadelshin (Collaborator)

2025-07-28 10:51:15 panicked at ./subtensor/runtime/src/lib.rs:2421:13:
not implemented    
2025-07-28 10:51:15 💤 Idle (2 peers), best: #10 (0x85b3…3ec4), finalized #8 (0x7206…08d9), ⬇ 1.3kiB/s ⬆ 1.7kiB/s    
2025-07-28 10:51:16 panicked at ./subtensor/runtime/src/lib.rs:2421:13:
not implemented    
2025-07-28 10:51:16 💤 Idle (2 peers), best: #10 (0x85b3…3ec4), finalized #8 (0x7206…08d9), ⬇ 1.6kiB/s ⬆ 1.4kiB/s    
2025-07-28 10:51:16 panicked at ./subtensor/runtime/src/lib.rs:2421:13:
not implemented  

@gztensor saw this error after the runtime upgrade. It seems that the "babe_switch" worker calls BabeApi and gets "not implemented".

Do we have a fix for this one?

@liamaharon (Collaborator Author) commented Aug 1, 2025

@gztensor saw this error after the runtime upgrade. It seems that "babe_switch" worker asks BabeApi and gets "not implemented"

Do we have a fix for this one?

@shamil-gadelshin thanks for giving it another try! It will panic if you try to use the Babe node with an Aura runtime. This branch does not include the change to the new Babe NPoS runtime, which I suspect is why you see that panic.

Please try following my demo here, https://discord.com/channels/1120750674595024897/1387662637843742760/1399902628309106822, where I test it on the node-decentralization branch. That branch includes both the hybrid node changes from this branch and the new Babe NPoS runtime.

Let me know how you go!

@liamaharon removed the no-spec-version-bump label (PR does not contain changes that require bumping the spec version) Aug 4, 2025
@shamil-gadelshin (Collaborator) left a comment


Runtime upgrade works without bugs. Warp sync works as expected.

@liamaharon merged commit 8febe38 into devnet-ready Aug 7, 2025
88 of 90 checks passed
@sam0x17 (Contributor) commented Aug 7, 2025

Reverted until we can test on a mainnet clone and check a few other things.
