-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Collator protocol subsystem #1659
Collator protocol subsystem #1659
Conversation
coriolinus
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lots of little things; should be approvable with a few touchups.
| // TODO: This is tricky. If our chain has moved on, we have already canceled | ||
| // the relevant request and removed it from the map; so and we are not expecting | ||
| // this reply although technically it is not a malicious behaviur. | ||
| modify_reputation(ctx, origin, COST_UNEXPECTED_MESSAGE).await?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current behavior seems plausible to me. Even so, should remove the TODO or link an issue for future discussion of what the behavior should be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, mostly TODOs exist in the PR at places I have no idea what to do about, so
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One possible behavior would be to keep around requests for one block and then drop them. If a request arrives from an old block we know it is expired but don't need to report as the response is genuine.
However I can see the current jobs system making that hard. @coriolinus This is the kind of stuff I was referring to when I mentioned that the jobs system is kind of fragile. We should have a notion of transitions (parent -> child) and global data in the subsystem library.
edit: oh, this isn't using the jobs framework. Then the solution I mentioned could actually be viable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rphmeier agree that it would be handy to have parent relation information in the active set update broadcasts, so it's possible to do things like "keep information for N blocks". However, I think that's a feature of its own which should probably be moved to its own issue so as not to delay this PR.
Co-authored-by: Peter Goodspeed-Niklaus <[email protected]>
|
|
||
| impl CollatorProtocolSubsystem { | ||
| /// Instantiate the collator protocol side of this subsystem | ||
| pub fn new_collator_side(id: CollatorId) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to avoid making this choice at startup? It might introduce a circular dependency for Cumulus
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean CollatorId? That's necessary when collations are advertised isn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean the choice of which side of the protocol to engage in.
| /// For each relay parent and para id we may be connected to a number | ||
| /// of collators each of those may have advertised a different collation. | ||
| /// So we group such cases here. | ||
| requested_collations: HashMap<(Hash, ParaId, PeerId), RequestId>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also worth noting that we don't need to fetch any of the collations by collators.
| modify_reputation(ctx, peer_id.clone(), COST_REQUEST_TIMEDOUT).await?; | ||
|
|
||
| // the callee will check if the parent is still in view, so do no checks here. | ||
| request_collation( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this immediately request the collation again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the request times out, we need to note the collator as being unreliable and reduce its priority relative to other collators. And then make another request - repeat until we get a response or the chain has moved on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, yeah, that was coming from the perspective of there being no CandidateSelection subsystem. If there is CandidateSelection, it should be CandidateSelection that decides to make another request. It doesn't really make sense to request from the collator who just timed out over and over again - the implication of the line quoted from the guide is that we should make another request of another collator.
rphmeier
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code is really orderly.
There are a few stray TODOs and questions that need to be resolved and I don't agree with the change of RequestCollation to RequestCollations.
coriolinus
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, pending resolution of the final TODO.
| // TODO: This is tricky. If our chain has moved on, we have already canceled | ||
| // the relevant request and removed it from the map; so and we are not expecting | ||
| // this reply although technically it is not a malicious behaviur. | ||
| modify_reputation(ctx, origin, COST_UNEXPECTED_MESSAGE).await?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rphmeier agree that it would be handy to have parent relation information in the active set update broadcasts, so it's possible to do things like "keep information for N blocks". However, I think that's a feature of its own which should probably be moved to its own issue so as not to delay this PR.
|
bot merge |
|
Network error getting Process.json: error decoding response body: unknown variant |
|
bot merge |
|
Network error getting Process.json: error decoding response body: unknown variant |
* WIP * The initial implementation of the collator side. * Improve comments * Multiple collation requests * Add more tests and comments to validator side * Add comments, remove dead code * Apply suggestions from code review Co-authored-by: Peter Goodspeed-Niklaus <[email protected]> * Fix build after suggested changes * Also connect to the next validator group * Remove a Future impl and move TimeoutExt to util * Minor nits * Fix build * Change FetchCollations back to FetchCollation * Try this * Final fixes * Fix build Co-authored-by: Peter Goodspeed-Niklaus <[email protected]>
* stupid, but it compiles * redo * cleanup * add ValidatorDiscovery to msgs * sketch network bridge code * ConnectToAuthorities instead of validators * more stuff * cleanup * more stuff * complete ConnectToAuthoritiesState * Update node/network/bridge/src/lib.rs Co-authored-by: Peter Goodspeed-Niklaus <[email protected]> * Collator protocol subsystem (#1659) * WIP * The initial implementation of the collator side. * Improve comments * Multiple collation requests * Add more tests and comments to validator side * Add comments, remove dead code * Apply suggestions from code review Co-authored-by: Peter Goodspeed-Niklaus <[email protected]> * Fix build after suggested changes * Also connect to the next validator group * Remove a Future impl and move TimeoutExt to util * Minor nits * Fix build * Change FetchCollations back to FetchCollation * Try this * Final fixes * Fix build Co-authored-by: Peter Goodspeed-Niklaus <[email protected]> * handle multiple in-flight connection requests * handle cancelled requests * Update node/core/runtime-api/src/lib.rs Co-authored-by: Bernhard Schuster <[email protected]> * redo it again * more stuff * redo it again * update comments * workaround Future is not Send * fix trailing spaces * clarify comments * bridge: fix compilation in tests * update more comments * small fixes * port collator protocol to new validator discovery api * collator tests compile * collator tests pass * do not revoke a request when the stream receiver is closed * make revoking opt-in * fix is_fulfilled * handle request revokation in collator * tests * wait for validator connections asyncronously * fix compilation * relabel my todos * apply Fedor's patch * resolve reconnection TODO * resolve revoking TODO * resolve channel capacity TODO * resolve peer cloning TODO * resolve peer disconnected TODO * resolve PeerSet TODO * wip tests * more tests * resolve Arc TODO * rename pending to non_revoked * one more test * extract utility function into util crate * fix compilation in tests * Apply suggestions from code review Co-authored-by: Fedor Sakharov <[email protected]> * revert pin_project removal * fix while let loop * Revert "revert pin_project removal" This reverts commit ae7f529. * fix compilation * Update node/subsystem/src/messages.rs * docs on pub items * guide updates * remove a TODO * small guide update * fix a typo * link to the issue * validator discovery: on_request docs Co-authored-by: Peter Goodspeed-Niklaus <[email protected]> Co-authored-by: Fedor Sakharov <[email protected]> Co-authored-by: Bernhard Schuster <[email protected]>
No description provided.