Skip to content
This repository was archived by the owner on Nov 15, 2023. It is now read-only.

Conversation

@montekki
Copy link
Contributor

@montekki montekki commented Aug 30, 2020

No description provided.

@montekki montekki added A3-in_progress Pull request is in progress. No review needed at this stage. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders. labels Aug 30, 2020
@montekki montekki marked this pull request as ready for review September 3, 2020 18:42
@montekki montekki requested a review from rphmeier September 3, 2020 18:43
@montekki montekki added A0-please_review Pull request needs code review. and removed A3-in_progress Pull request is in progress. No review needed at this stage. labels Sep 3, 2020
@rphmeier rphmeier requested a review from coriolinus September 4, 2020 20:47
Copy link
Contributor

@coriolinus coriolinus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lots of little things; should be approvable with a few touchups.

// TODO: This is tricky. If our chain has moved on, we have already canceled
// the relevant request and removed it from the map; so and we are not expecting
// this reply although technically it is not a malicious behaviur.
modify_reputation(ctx, origin, COST_UNEXPECTED_MESSAGE).await?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current behavior seems plausible to me. Even so, should remove the TODO or link an issue for future discussion of what the behavior should be.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, mostly TODOs exist in the PR at places I have no idea what to do about, so

Copy link
Contributor

@rphmeier rphmeier Sep 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One possible behavior would be to keep around requests for one block and then drop them. If a request arrives from an old block we know it is expired but don't need to report as the response is genuine.

However I can see the current jobs system making that hard. @coriolinus This is the kind of stuff I was referring to when I mentioned that the jobs system is kind of fragile. We should have a notion of transitions (parent -> child) and global data in the subsystem library.

edit: oh, this isn't using the jobs framework. Then the solution I mentioned could actually be viable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rphmeier agree that it would be handy to have parent relation information in the active set update broadcasts, so it's possible to do things like "keep information for N blocks". However, I think that's a feature of its own which should probably be moved to its own issue so as not to delay this PR.

Co-authored-by: Peter Goodspeed-Niklaus <[email protected]>

impl CollatorProtocolSubsystem {
/// Instantiate the collator protocol side of this subsystem
pub fn new_collator_side(id: CollatorId) -> Self {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to avoid making this choice at startup? It might introduce a circular dependency for Cumulus

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean CollatorId? That's necessary when collations are advertised isn't it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean the choice of which side of the protocol to engage in.

/// For each relay parent and para id we may be connected to a number
/// of collators each of those may have advertised a different collation.
/// So we group such cases here.
requested_collations: HashMap<(Hash, ParaId, PeerId), RequestId>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also worth noting that we don't need to fetch any of the collations by collators.

modify_reputation(ctx, peer_id.clone(), COST_REQUEST_TIMEDOUT).await?;

// the callee will check if the parent is still in view, so do no checks here.
request_collation(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this immediately request the collation again?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the request times out, we need to note the collator as being unreliable and reduce its priority relative to other collators. And then make another request - repeat until we get a response or the chain has moved on.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yeah, that was coming from the perspective of there being no CandidateSelection subsystem. If there is CandidateSelection, it should be CandidateSelection that decides to make another request. It doesn't really make sense to request from the collator who just timed out over and over again - the implication of the line quoted from the guide is that we should make another request of another collator.

Copy link
Contributor

@rphmeier rphmeier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code is really orderly.

There are a few stray TODOs and questions that need to be resolved and I don't agree with the change of RequestCollation to RequestCollations.

@coriolinus coriolinus linked an issue Sep 8, 2020 that may be closed by this pull request
Copy link
Contributor

@coriolinus coriolinus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, pending resolution of the final TODO.

// TODO: This is tricky. If our chain has moved on, we have already canceled
// the relevant request and removed it from the map; so and we are not expecting
// this reply although technically it is not a malicious behaviur.
modify_reputation(ctx, origin, COST_UNEXPECTED_MESSAGE).await?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rphmeier agree that it would be handy to have parent relation information in the active set update broadcasts, so it's possible to do things like "keep information for N blocks". However, I think that's a feature of its own which should probably be moved to its own issue so as not to delay this PR.

@montekki montekki mentioned this pull request Sep 10, 2020
2 tasks
@montekki
Copy link
Contributor Author

bot merge

@ghost
Copy link

ghost commented Sep 10, 2020

Network error getting Process.json:

error decoding response body: unknown variant connected, expected one of added_to_project, assigned, closed, comment_deleted, converted_note_to_issue, convert_to_draft, demilestoned, head_ref_deleted, head_ref_force_pushed, head_ref_restored, labeled, locked, mentioned, marked_as_duplicate, merged, milestoned, moved_columns_in_project, ready_for_review, referenced, removed_from_project, renamed, reopened, review_dismissed, review_requested, review_request_removed, subscribed, transferred, unassigned, unlabeled, unlocked, unmarked_as_duplicate, unsubscribed, user_blocked at line 1 column 14571

@montekki
Copy link
Contributor Author

bot merge

@ghost
Copy link

ghost commented Sep 10, 2020

Network error getting Process.json:

error decoding response body: unknown variant connected, expected one of added_to_project, assigned, closed, comment_deleted, converted_note_to_issue, convert_to_draft, demilestoned, head_ref_deleted, head_ref_force_pushed, head_ref_restored, labeled, locked, mentioned, marked_as_duplicate, merged, milestoned, moved_columns_in_project, ready_for_review, referenced, removed_from_project, renamed, reopened, review_dismissed, review_requested, review_request_removed, subscribed, transferred, unassigned, unlabeled, unlocked, unmarked_as_duplicate, unsubscribed, user_blocked at line 1 column 14571

@montekki montekki merged commit b32dcc4 into paritytech:master Sep 10, 2020
ordian pushed a commit that referenced this pull request Sep 10, 2020
* WIP

* The initial implementation of the collator side.

* Improve comments

* Multiple collation requests

* Add more tests and comments to validator side

* Add comments, remove dead code

* Apply suggestions from code review

Co-authored-by: Peter Goodspeed-Niklaus <[email protected]>

* Fix build after suggested changes

* Also connect to the next validator group

* Remove a Future impl and move TimeoutExt to util

* Minor nits

* Fix build

* Change FetchCollations back to FetchCollation

* Try this

* Final fixes

* Fix build

Co-authored-by: Peter Goodspeed-Niklaus <[email protected]>
ordian pushed a commit that referenced this pull request Sep 10, 2020
* master:
  Allow the watermark to always land on the relay parent (#1689)
  Limit the maximum size of a downward message (#1690)
  Add deb and RPM repository config and documentation (#1676)
  Collator protocol subsystem (#1659)
ordian pushed a commit that referenced this pull request Oct 6, 2020
* stupid, but it compiles

* redo

* cleanup

* add ValidatorDiscovery to msgs

* sketch network bridge code

* ConnectToAuthorities instead of validators

* more stuff

* cleanup

* more stuff

* complete ConnectToAuthoritiesState

* Update node/network/bridge/src/lib.rs

Co-authored-by: Peter Goodspeed-Niklaus <[email protected]>

* Collator protocol subsystem (#1659)

* WIP

* The initial implementation of the collator side.

* Improve comments

* Multiple collation requests

* Add more tests and comments to validator side

* Add comments, remove dead code

* Apply suggestions from code review

Co-authored-by: Peter Goodspeed-Niklaus <[email protected]>

* Fix build after suggested changes

* Also connect to the next validator group

* Remove a Future impl and move TimeoutExt to util

* Minor nits

* Fix build

* Change FetchCollations back to FetchCollation

* Try this

* Final fixes

* Fix build

Co-authored-by: Peter Goodspeed-Niklaus <[email protected]>

* handle multiple in-flight connection requests

* handle cancelled requests

* Update node/core/runtime-api/src/lib.rs

Co-authored-by: Bernhard Schuster <[email protected]>

* redo it again

* more stuff

* redo it again

* update comments

* workaround Future is not Send

* fix trailing spaces

* clarify comments

* bridge: fix compilation in tests

* update more comments

* small fixes

* port collator protocol to new validator discovery api

* collator tests compile

* collator tests pass

* do not revoke a request when the stream receiver is closed

* make revoking opt-in

* fix is_fulfilled

* handle request revokation in collator

* tests

* wait for validator connections asyncronously

* fix compilation

* relabel my todos

* apply Fedor's patch

* resolve reconnection TODO

* resolve revoking TODO

* resolve channel capacity TODO

* resolve peer cloning TODO

* resolve peer disconnected TODO

* resolve PeerSet TODO

* wip tests

* more tests

* resolve Arc TODO

* rename pending to non_revoked

* one more test

* extract utility function into util crate

* fix compilation in tests

* Apply suggestions from code review

Co-authored-by: Fedor Sakharov <[email protected]>

* revert pin_project removal

* fix while let loop

* Revert "revert pin_project removal"

This reverts commit ae7f529.

* fix compilation

* Update node/subsystem/src/messages.rs

* docs on pub items

* guide updates

* remove a TODO

* small guide update

* fix a typo

* link to the issue

* validator discovery: on_request docs

Co-authored-by: Peter Goodspeed-Niklaus <[email protected]>
Co-authored-by: Fedor Sakharov <[email protected]>
Co-authored-by: Bernhard Schuster <[email protected]>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

A0-please_review Pull request needs code review. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement Collator Protocol subsystem

3 participants