core/finality-grandpa: Request block sync from network after import timeout #3800
Conversation
let incoming = incoming.select(out_rx);
// TODO: Do I understand correctly that we combine the incoming Grandpa messages with the messages we are
// sending out to other nodes and in addition thereby also send to ourself?
yes! this means that the logic for importing our own votes is unified with the logic for importing others' votes in the GRANDPA crate - reducing code paths in the vote processing pipeline and making bugs less likely.
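For illustration, a minimal sketch of this pattern (written against the current futures crate rather than the futures 0.1 API the PR uses, with placeholder data): locally produced votes are merged into the stream of votes received from the network, so a single import path handles both.

```rust
use futures::{executor::block_on, stream::{self, StreamExt}};

fn main() {
    // Votes received from other peers via gossip (placeholder data).
    let incoming = stream::iter(vec!["vote from peer A", "vote from peer B"]);
    // Votes we produce ourselves and also send out (placeholder data).
    let out_rx = stream::iter(vec!["our own vote"]);

    // Equivalent in spirit to `incoming.select(out_rx)` in the snippet above:
    // one merged stream that the vote-import pipeline consumes.
    let merged = stream::select(incoming, out_rx);

    block_on(merged.for_each(|msg| async move {
        println!("importing: {}", msg);
    }));
}
```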
let next_log = *last_log + LOG_PENDING_INTERVAL;
// TODO: Should this not be >=? We want to log whenever we want to log every LOG_PENDING_INTERVAL,
// right?
if Instant::now() <= next_log {
@rphmeier this should be >=, right?
The UntilImported component should log whenever waiting for a specific block to be imported has surpassed a defined constant timeout. Without this patch the code would log whenever the current time was still below that timeout.
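A small sketch of the intended check, assuming an illustrative value for LOG_PENDING_INTERVAL: the handler should only fire once the interval has actually elapsed.

```rust
use std::time::{Duration, Instant};

// Illustrative value; the real constant lives in the crate under review.
const LOG_PENDING_INTERVAL: Duration = Duration::from_secs(15);

/// Returns true once waiting has lasted at least LOG_PENDING_INTERVAL
/// since the last log, i.e. the `>=` comparison discussed above.
fn should_log(last_log: Instant) -> bool {
    let next_log = last_log + LOG_PENDING_INTERVAL;
    Instant::now() >= next_log
}
```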
Needs some cleaning up, but I will post a more thorough review later. I have tested this PR on the chaos net and it seems to be working correctly: nodes always managed to get the block data they needed to import votes (and I checked that requests were indeed going through explicit fork sync sometimes).
The only problem with the current implementation is that we will only ever request to sync from the peer that originally sent us the vote, even as we see the same vote being broadcast to us from other peers. The reason is that the low-level gossip layer keeps track of known messages, so subsequent receipts of the same message won't reach the higher layers (i.e. until_imported won't ever see a duplicate message). I am not sure how to fix this easily while keeping this logic in until_imported.
@rphmeier Any thoughts on this?
/// Mapping block hashes to their block number, the point in time it was first encountered (Instant), nodes that
/// have the corresponding block (inferred by the fact that they send a message referencing it) and a list of
/// Grandpa messages referencing the block hash.
pending: HashMap<Block::Hash, (NumberFor<Block>, Instant, Vec<network::PeerId>, Vec<(M)>)>,
This should be a HashSet<PeerId>, we might see multiple messages from the same peer referencing the same hash.
Regarding the issue I mentioned above in my review: we could start by asking the peer that sent us the vote, and then fall back to asking any peer we're connected to that is on the same round or ahead of the round whose block data we're looking for. Another issue with the current implementation is that only validators will be actively trying to fetch block data. Full nodes just running the GRANDPA observer don't create any round streams and therefore don't run the code in question.
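A hedged sketch of the suggested shape of `pending`, with placeholder types standing in for `Block::Hash`, `NumberFor<Block>`, `network::PeerId` and the GRANDPA message type: using a HashSet deduplicates peers that send several messages referencing the same block hash.

```rust
use std::collections::{HashMap, HashSet};
use std::time::Instant;

// Placeholder types; the real code uses the generic Block/network types.
type BlockHash = [u8; 32];
type BlockNumber = u64;
type PeerId = String;
type Message = String;

struct Pending {
    pending: HashMap<BlockHash, (BlockNumber, Instant, HashSet<PeerId>, Vec<Message>)>,
}

impl Pending {
    fn note_message(&mut self, hash: BlockHash, number: BlockNumber, peer: PeerId, msg: Message) {
        let entry = self
            .pending
            .entry(hash)
            .or_insert_with(|| (number, Instant::now(), HashSet::new(), Vec::new()));
        // Re-inserting the same peer is a no-op, unlike pushing onto a Vec.
        entry.2.insert(peer);
        entry.3.push(msg);
    }
}
```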
core/network/src/protocol/sync.rs
	hash, peers,
);

peers = self.peers.iter().map(|(id, _)| id.clone()).collect();
Should filter out peers that have best_block < number. We don't want to request anything from peers that are definitely not yet synced.
When making an explicit fork sync request without specifying any peers, make sure to only request it from the locally known peers that are either ahead of, or on par with, the block number we are looking for.
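A sketch of the suggested fallback, with simplified placeholder types (the real sync.rs tracks richer per-peer state): when the caller supplies no peers, only consider known peers whose best block is at or above the requested number.

```rust
use std::collections::HashMap;

type PeerId = String;   // placeholder for the real PeerId type
type BlockNumber = u64;

struct PeerInfo {
    best_number: BlockNumber,
}

fn peers_for_fork_sync(
    known_peers: &HashMap<PeerId, PeerInfo>,
    requested: &[PeerId],
    number: BlockNumber,
) -> Vec<PeerId> {
    if !requested.is_empty() {
        return requested.to_vec();
    }
    // Only ask peers that are ahead of, or on par with, the target number;
    // peers below it cannot have the block yet.
    known_peers
        .iter()
        .filter(|(_, info)| info.best_number >= number)
        .map(|(id, _)| id.clone())
        .collect()
}
```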
6fe3171 to 202bced
/// The environment we run GRANDPA in.
pub(crate) struct Environment<B, E, Block: BlockT, N: Network<Block>, RA, SC, VR> {
// TODO: As far as I can tell this does not follow the pattern of having an outer struct protecting an inner struct
// via a lock in order to be Sync. How about renaming this to `client` like we do in many other places?
Indeed, I did the rename.
status_check: Status,
block_sync_requester: BlockSyncRequester,
status_check: BlockStatus,
// TODO: Why is this called inner? Why not being more descriptive and say finality_msg_stream?
This is used with both the global per-set streams (commits, catch-ups) and the round streams (votes). Maybe rename it to msg_stream?
I removed the TODO but we can address this in a follow up PR if this one gets merged soon.
for (&block_hash, &mut (block_number, ref mut last_log, ref v)) in &mut self.pending {
	if let Some(number) = self.status_check.block_number(block_hash)? {
		// TODO: Now that we pass the number all the way down here, should we check if it is the same as we
		// are expecting it to be?
This will be checked later on in wait_completed; we pass the canon number to it (the one we fetch here and store in known_keys). The wait_completed implementations can look at the vote itself and the target number to figure out what the original number was.
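A hedged sketch, with hypothetical types, of the check described here: once the block is imported, the canonical number for the hash is compared against the number the vote targeted, and the vote is dropped on a mismatch.

```rust
type BlockNumber = u64; // placeholder for NumberFor<Block>

struct PendingVote {
    target_number: BlockNumber,
    // ... the vote payload itself
}

impl PendingVote {
    /// Hypothetical shape of the completion check: only release the vote to
    /// the GRANDPA layer if the imported block's canonical number matches
    /// the number the vote claimed.
    fn wait_completed(self, canon_number: BlockNumber) -> Option<Self> {
        if self.target_number == canon_number {
            Some(self)
        } else {
            None
        }
    }
}
```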
andresilva left a comment
lgtm
mxinden left a comment
Additional commits from @andresilva look good to me. @arkpar do you have further thoughts on this?
Goal
In order to participate in the Grandpa gossip process a Grandpa voter needs
to have access to the blocks referenced in the Grandpa messages. Blocks are
synchronized autonomously via the network sync service. In case this does
not work as expected, e.g. due to delayed messages where the corresponding blocks
are already purged, we can use the new set_sync_fork_request API added in #3633
to tell the network (see the sketch below):

- Which blocks are relevant for the current round.
- Where to retrieve these relevant blocks from.

This is further described in #3629.
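As a rough illustration of the shape of that API (the exact trait and signature in the PR may differ; the types here are placeholders), the caller hands the sync service a block hash, its number, and the peers believed to have it:

```rust
// Placeholder types standing in for Block::Hash, NumberFor<Block> and network::PeerId.
type Hash = [u8; 32];
type Number = u64;
type PeerId = String;

/// Hypothetical trait mirroring the set_sync_fork_request API added in #3633.
trait BlockSyncRequester {
    /// Ask the sync service to fetch the block `hash` at `number`,
    /// preferring `peers` as download targets.
    fn set_sync_fork_request(&self, peers: Vec<PeerId>, hash: Hash, number: Number);
}
```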
An initial solution piggy-backing on the Grandpa neighbor messages was
suggested in #3755. After further discussion an easier (intermediate) solution
via Grandpa vote messages and the UntilImported component is suggested with
this pull request. In addition, the UntilImported component already has the
notion of withholding messages and the notion of an import timeout.
2nd solution (Grandpa vote messages and UntilImported)
The UntilImported component within core/finality-grandpa withholds messages
from the upper Grandpa layers until all blocks referenced in these messages
are imported on the local node.
With the goal above in mind, we can notify the network sync service, after a
certain timeout of waiting, that we need a given block and which peers probably
have it.
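A minimal sketch of that mechanism under the same assumptions as above (placeholder types, an illustrative wait interval, and the hypothetical requester trait): on a periodic check, every block that has been pending longer than the interval is reported to the sync service together with the peers whose messages referenced it.

```rust
use std::collections::{HashMap, HashSet};
use std::time::{Duration, Instant};

// Placeholder types and an illustrative timeout; the real code uses
// Block::Hash, NumberFor<Block>, network::PeerId and its own constant.
type Hash = [u8; 32];
type Number = u64;
type PeerId = String;
const WAIT_INTERVAL: Duration = Duration::from_secs(60);

/// Hypothetical requester, mirroring the API sketched in the "Goal" section.
trait BlockSyncRequester {
    fn set_sync_fork_request(&self, peers: Vec<PeerId>, hash: Hash, number: Number);
}

/// On each tick, ask the sync service for every block we have been
/// waiting on for longer than WAIT_INTERVAL.
fn request_overdue_blocks<R: BlockSyncRequester>(
    requester: &R,
    pending: &HashMap<Hash, (Number, Instant, HashSet<PeerId>)>,
) {
    for (hash, (number, first_seen, peers)) in pending {
        if first_seen.elapsed() >= WAIT_INTERVAL {
            // Tell the network which block we need and who probably has it.
            requester.set_sync_fork_request(peers.iter().cloned().collect(), *hash, *number);
        }
    }
}
```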
Pull request status
I am opening this up for architectural feedback. Changes are grouped by their
functionality. This pull request replaces #3755. I will follow up on comments in
#3755 in later pull requests.