Explicit sync API for downloading important, possibly orphaned, forks #3633
core/network/src/protocol/sync.rs
Outdated
| Vec<(PeerId, BlockRequest<B>)> | ||
| { | ||
| debug!(target: "sync", "Explicit sync request for block {:?} with {:?}", hash, peers); | ||
| if self.is_known(&hash) || self.is_already_downloading(&hash) { |
I'd say we need something a bit more explicit. The fork-sync requests should be kept around forever until removed or the block is actually known, and the is_already_downloading check should be done periodically. We can imagine a case where is_already_downloading is true at the moment, but then for some reason the node stops downloading that block. Then our requested block still wouldn't be synced.
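A minimal sketch of the persistence suggested in this comment, not the PR's implementation. `ForkRequests`, `insert`, `remove`, and `poll` are hypothetical names, and `u64`/`u32` stand in for `B::Hash` and `PeerId`. Requests stay registered until the block is known or they are removed, and the "already downloading" check is re-evaluated on every poll instead of once at submission time.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Stand-in for `B::Hash` (assumption for this sketch only).
type Hash = u64;
/// Stand-in for `PeerId` (assumption for this sketch only).
type Peer = u32;

/// Hypothetical store of explicit fork-sync requests.
#[derive(Default)]
pub struct ForkRequests {
    pending: BTreeMap<Hash, BTreeSet<Peer>>,
}

impl ForkRequests {
    /// Record a request; it persists until the block is known or it is removed.
    pub fn insert(&mut self, hash: Hash, peers: impl IntoIterator<Item = Peer>) {
        self.pending.entry(hash).or_default().extend(peers);
    }

    /// Explicitly drop a request.
    pub fn remove(&mut self, hash: &Hash) {
        self.pending.remove(hash);
    }

    /// Meant to be called periodically. Drops requests whose block is now
    /// known and returns the hashes that are not being downloaded right now.
    pub fn poll(
        &mut self,
        is_known: impl Fn(&Hash) -> bool,
        is_downloading: impl Fn(&Hash) -> bool,
    ) -> Vec<Hash> {
        self.pending.retain(|hash, _| !is_known(hash));
        self.pending
            .keys()
            .filter(|hash| !is_downloading(*hash))
            .copied()
            .collect()
    }

    /// Number of requests still registered.
    pub fn len(&self) -> usize {
        self.pending.len()
    }
}
```

With this shape, a block whose download silently stops shows up again in the result of the next `poll`, which is the failure case described above.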
|
|
||
| let block_status = self.client.block_status(&BlockId::Number(number - One::one())) | ||
| .unwrap_or(BlockStatus::Unknown); | ||
| if block_status == BlockStatus::InChainPruned { |
As a follow-up, there should be a special mode where we just download the headers for this chain if requested, but we don't have the state to process them.
#3639 for a more detailed writeup of ^
|
It's also missing the APIs for higher-level code to inform the sync service about which peers it knows have the block. It should have add/remove capability.
But |
|
The set may change over time, is what I mean - are multiple calls guaranteed to alter the existing "download session" so that it will sync the fork from the union of the given peer-sets? |
|
No, but I'm going to implement that along with request persistence. |
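The union semantics asked about here could look roughly like the sketch below. `ForkPeers`, `add_peers`, `remove_peers`, and `peers` are hypothetical names, and `u64`/`u32` stand in for `B::Hash` and `PeerId`. Keeping a session registered after its last peer is removed is an assumption of this sketch, not something the thread settles.

```rust
use std::collections::{HashMap, HashSet};

/// Stand-in for `B::Hash` (assumption for this sketch only).
type Hash = u64;
/// Stand-in for `PeerId` (assumption for this sketch only).
type Peer = u32;

/// Hypothetical record of which peers are believed to have which fork block.
#[derive(Default)]
pub struct ForkPeers {
    by_block: HashMap<Hash, HashSet<Peer>>,
}

impl ForkPeers {
    /// Merge `peers` into the existing download session for `hash`,
    /// creating the session if there is none yet.
    pub fn add_peers(&mut self, hash: Hash, peers: &[Peer]) {
        self.by_block
            .entry(hash)
            .or_default()
            .extend(peers.iter().copied());
    }

    /// Forget `peers` for `hash`. The session itself stays registered.
    pub fn remove_peers(&mut self, hash: Hash, peers: &[Peer]) {
        if let Some(set) = self.by_block.get_mut(&hash) {
            for peer in peers {
                set.remove(peer);
            }
        }
    }

    /// Peers currently associated with `hash`, sorted for stable output.
    pub fn peers(&self, hash: Hash) -> Vec<Peer> {
        let mut out: Vec<Peer> = self
            .by_block
            .get(&hash)
            .into_iter()
            .flatten()
            .copied()
            .collect();
        out.sort_unstable();
        out
    }
}
```

Two calls to `add_peers` for the same hash leave the session with the union of both peer sets, so callers do not need to resend the full set each time.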
In general, I suggest sync being a little bit smarter about figuring out what "all forks" are based on what GRANDPA is interested in. Even if the fork is no longer active it should constantly be trying to sync it, even if attempts fail. |
core/network/src/protocol/sync.rs
Outdated
|
|
||
| if number > peer.best_number { | ||
| peer.best_number = number; | ||
| peer.best_hash = hash.clone(); |
why set that to the peer's best block? i don't see this working when there is 1 peer with 2 forks we want to download, if that's a pre-requisite to being able to download those blocks.
best_number is used to track global sync target. best_hash is not used for anything really, other than reporting generic peer info.
|
Fork download requests are now persistent. Sync will continue trying even if peer reconnects. |
core/network/src/protocol.rs
Outdated
| message::Direction::Ascending => id = BlockId::Number(number + One::one()), | ||
| message::Direction::Descending => { | ||
| if number.is_zero() { | ||
| if number.is_one() { |
if it is zero, will that cause an error further down the line when parent_hash gives None for the header/block?
parent_hash is not an Option. for the genesis block it is a zero hash.
That change is still wrong though. We should allow downloading the genesis block in principle.
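To illustrate the point about genesis, here is a small sketch of a descending walk that still serves block 0. `descending_numbers` is a hypothetical helper and `u64` stands in for `NumberFor<B>`; the actual handler in `protocol.rs` works on `BlockId` values and is not reproduced here.

```rust
/// Block numbers served for a descending request starting at `from`,
/// returning at most `max` of them. `u64` replaces `NumberFor<B>` in this sketch.
pub fn descending_numbers(from: u64, max: usize) -> Vec<u64> {
    let mut out = Vec::new();
    let mut next = Some(from);
    while let Some(number) = next {
        if out.len() == max {
            break;
        }
        out.push(number);
        // `checked_sub` returns `None` once we step below block 0, so the
        // genesis block is included and the walk ends without underflow.
        next = number.checked_sub(1);
    }
    out
}
```

Stopping when the number is one would leave block 0 out of the response; stopping only after zero has been emitted keeps genesis downloadable.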
| if block_status == BlockStatus::InChainPruned { | ||
| trace!(target: "sync", "Refusing to sync ancient block {:?}", hash); | ||
| return Vec::default(); | ||
| return; |
this is also not the only kind of ancient fork we can encounter - what happens / is supposed to happen when the common ancestor is more than one block back?
Sync attempts to download the chain up to the last finalized block, not just one block. There's currently a limit on how long the forks that are downloaded up the chain can be. We can't allow downloading infinite chains of headers/blocks without validating them.
"Sync attempts to download the chain up to the last finalized block, not just one block"
I don't understand what you mean. Yes, there can be a practical cutoff somewhere, and I think refusing to download forks beyond finalized_height + constant_allowance would be reasonable (with constant_allowance around 1000).
This allowance needs to be really permissive. Our nodes can afford a few extra MB of RAM if it means that you reduce the likelihood of consensus stalling by several orders of magnitude. And if consensus does stall, I don't really see any issue with downloading "infinitely" long chains as a light client, as long as it's in incremental parts, since there's nothing better the node could be doing and that would at least be trying to unstick consensus. The chains in practice aren't going to be infinitely long, as long as the block production mechanism is sane. If it's insane, then it's not the job of the sync code to work around that.
But the most recent finalized block of the fork can at most be the common ancestor (otherwise it wouldn't be a fork). So we always have to download beyond that.
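One literal reading of the cutoff proposed above, as a sketch: accept a fork target only if its number is at most `finalized_height + constant_allowance`. `within_fork_allowance` and `FORK_ALLOWANCE` are hypothetical names, `u64` stands in for `NumberFor<B>`, and the value 1000 is only the ballpark mentioned in the comment.

```rust
/// Allowance mentioned in the discussion ("around 1000"); the exact value
/// is an assumption of this sketch.
pub const FORK_ALLOWANCE: u64 = 1000;

/// Returns `true` if a fork block at `fork_number` is within the allowance
/// above the last finalized block at `finalized_number`.
pub fn within_fork_allowance(fork_number: u64, finalized_number: u64) -> bool {
    // `saturating_add` avoids overflow near the numeric limit.
    fork_number <= finalized_number.saturating_add(FORK_ALLOWANCE)
}
```

Whether the bound should apply to the fork head's number or to the fork's length back to the common ancestor is left open by the thread; this sketch takes the former.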
| if let Some(request) = self.download_unknown_stale(&peer_id, &hash) { | ||
| requests.push((peer_id, request)); | ||
| if number > peer.best_number { | ||
| peer.best_number = number; |
What is the invariant that sync is meant to uphold?
Is this always supposed to be the peer's highest block, or is it the peer's best chain? We do have some non longest-chain fork choice rules (1. follow finality, 2. aurababous), so they are not always the same thing.
This is just the highest block currently. The sync tracks the longest fork a bit differently from the shorter forks. E.g. it supports parallel block fetch for the longest chain only. It still fetches all forks though so it should not matter which one is actually "best". There will be a refactoring to get rid of the "main" chain notion, but this is out of scope of this PR.
core/network/src/protocol/sync.rs
Outdated
| requests: &HashMap<B::Hash, SyncRequest<B>>, | ||
| best_num: NumberFor<B>, | ||
| finalized: &B::Hash, | ||
| attributs: &message::BlockAttributes, |
typo: attributes?
| if !r.peers.contains(id) { | ||
| continue | ||
| } | ||
| if r.number <= best_num { |
questionable for the same reason as before - the orphan chain may be longer than the peer's "best" chain because of the fork-choice rule.
The other case (r.number > best_num) is handled in a different code path in peer_block_request.
This is confusing. Why is the logic so scattered? I feel like I need a PhD to understand all the side effects and branches
What I mean is that there's a lot of implicit "but actually" in the interface of this function.
It's an explicit sync request...but actually only returns explicit requests for chains shorter than the best.
So the peer doesn't download that chain...but actually it does, it's covered in another function in another file (peer_block_request defers to needed_blocks)
I'd recommend that the function return a special type when there is implicit context that needs to be passed from the execution of this function into the execution of another, where the reader of the code can clearly see that the handling of some case has been deferred beyond the execution of the function.
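A sketch of what such a return type might look like. `ForkRequestOutcome` and `fork_request_outcome` are hypothetical names, and `u64` stands in for `NumberFor<B>`. The branches mirror the two checks in the diff above (`r.peers.contains(id)` and `r.number <= best_num`), with the deferred case made visible to the caller instead of being implied.

```rust
/// Hypothetical result type that makes the deferred case explicit.
#[derive(Debug, PartialEq)]
pub enum ForkRequestOutcome {
    /// Issue an explicit request for the block at `number` to this peer.
    Request { number: u64 },
    /// Not handled here: the block is above the peer's best number, so the
    /// regular sync path (`peer_block_request` in the PR) is expected to fetch it.
    DeferredToMainSync { number: u64 },
    /// The peer is not in the request's peer set.
    PeerNotEligible,
}

/// Decide what to do with one fork request for one peer.
pub fn fork_request_outcome(
    peer_in_request_set: bool,
    number: u64,
    peer_best_number: u64,
) -> ForkRequestOutcome {
    if !peer_in_request_set {
        ForkRequestOutcome::PeerNotEligible
    } else if number <= peer_best_number {
        ForkRequestOutcome::Request { number }
    } else {
        ForkRequestOutcome::DeferredToMainSync { number }
    }
}
```

A caller matching on this enum has to acknowledge the deferred variant, which documents the hand-off at the call site rather than in another file.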
This is fundamentally caused by the same issue described here: #3633 (comment)
To be refactored.
|
Looks good but needs accounting for fork-choice rules |
|
After a chat with @rphmeier we've agreed to address all remaining issues in the following PRs |
mxinden
left a comment
I am still getting familiar with the code, so please interpret some of these comments as questions instead of suggestions.
| /// Request syncing for the given block from given set of peers. | ||
| // This is similar to on_block_announce with unknown parent hash. | ||
| pub fn sync_fork(&mut self, peers: Vec<PeerId>, hash: &B::Hash, number: NumberFor<B>) { | ||
| if peers.is_empty() { |
Do I understand correctly that calling sync_fork with an empty peers removes it from the sync_requests set? This is not what I would have expected from the function name. How about renaming it to set_sync_request or moving the logic to a separate function cancel_sync_request?
Please ignore this in case this is a pattern within Substrate.
renamed to set_fork_sync_request
core/network/src/protocol/sync.rs
Outdated
| } | ||
|
|
||
| /// Request syncing for the given block from given set of peers. | ||
| // This is similar to on_block_announce with unknown parent hash. |
| // This is similar to on_block_announce with unknown parent hash. | |
| /// This is similar to on_block_announce with unknown parent hash. |
I feel like this could be a doc comment as well, right?
This comment is about the implementation detail, not the interface.
core/network/src/service.rs
Outdated
| Ok(()) | ||
| } | ||
|
|
||
| /// Adds a `PeerId` and its address as reserved. |
| /// Adds a `PeerId` and its address as reserved. |
This is a copy/paste error from above, right?
The sync service provides an API to explicitly download specified chains of blocks.