Overseer: subsystems communicate directly #2227
I think we could do better than this, but I need to check a few things first and think a bit more about it.
coriolinus
left a comment
Core logic in OverseerSubsystemContext::recv LGTM. I agree that this maintains the message invariant.
I'm pleasantly surprised that making this change didn't require changing any of the individual subsystems.
node/overseer/src/lib.rs
Outdated
signals_received: usize,
message: AllMessages,
) {
fn make_packet<T>(timer: MaybeTimer, signals_received: usize, message: T) -> MessagePacket<T> {
It's too bad we don't have HKTs yet; if we did, you could simplify this into
let make_packet = for<T> |message: T| { ... }
and simply capture the values of timer and signals_received, which would simplify usage.
Well, we can do without: if we are sure this is the way to go, we can just expose the subsystem senders directly.
I had the same thought when I was writing the code, but unfortunately no HKT yet
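One way to approximate the generic closure wished for in this thread is a small struct that captures the shared values once and exposes a generic method. This is only a sketch: PacketMaker and its make method are hypothetical names, the MessagePacket here is a simplified stand-in for the type in the diff, and the timer argument is left out for brevity.

```rust
/// Simplified stand-in for the packet type in the diff (assumption).
#[derive(Debug, PartialEq)]
struct MessagePacket<T> {
    signals_received: usize,
    message: T,
}

/// Hypothetical helper: captures the shared value once, like a closure would.
struct PacketMaker {
    signals_received: usize,
}

impl PacketMaker {
    /// Generic over the message type, which a closure cannot be today.
    fn make<T>(&self, message: T) -> MessagePacket<T> {
        MessagePacket {
            signals_received: self.signals_received,
            message,
        }
    }
}

/// Builds two packets of different message types from the same captured state.
fn example_packets() -> (MessagePacket<u32>, MessagePacket<&'static str>) {
    let maker = PacketMaker { signals_received: 3 };
    (maker.make(7u32), maker.make("hello"))
}
```

The call sites then only pass the message, which is the usage simplification the comment was after.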
node/overseer/src/lib.rs
Outdated
if res.is_err() {
tracing::debug!(
target: LOG_TARGET,
"Failed to send a message to another subsystem",
I suspect that at some point we're going to want to debug which particular subsystem was the intended recipient of the failed message. Maybe an API like
impl AllMessages {
pub fn recipient_name(&self) -> &'static str { ... }
}
would make that simple to add to this debug message.
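A minimal sketch of how such a recipient_name helper could look, assuming a toy AllMessages enum. The two variants, their payload types, the name strings, and the describe_send_failure helper are all hypothetical and only illustrate the shape of the idea.

```rust
/// Toy message enum; the variants and payloads are hypothetical stand-ins.
enum AllMessages {
    StatementDistribution(String),
    ApprovalDistribution(u32),
}

impl AllMessages {
    /// Maps each variant to a static name for the subsystem that should receive it.
    pub fn recipient_name(&self) -> &'static str {
        match self {
            AllMessages::StatementDistribution(_) => "statement-distribution",
            AllMessages::ApprovalDistribution(_) => "approval-distribution",
        }
    }
}

/// Hypothetical helper: builds the text a failed-send debug log could carry.
fn describe_send_failure(message: &AllMessages) -> String {
    format!(
        "Failed to send a message to subsystem {}",
        message.recipient_name()
    )
}
```

Because the match is exhaustive, adding a new variant without naming its recipient would fail to compile.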
eskimor
left a comment
Looks sensible to me.
drahnr
left a comment
Implementation-wise, it looks good.
As a side effect here we lose the overseer as our probe point for messages though.
prometheus::Opts::new(
"parachain_to_overseer_sent",
"Number of elements sent by subsystems to overseer",
"parachain_overseer_signals_sent",
naming nit: what's the difference between parachain_subsystem and parachain_overseer prefix? They both are parameterized by subsystem_name.
subsystem_signals_sent sounds like the subsystems are sending the signals but they are actually receiving them. overseer_unbounded_sent has the same issue in the opposite direction.
These are OverseerSignals, so overseer_signals_sent makes sense, I think. But I could rename subsystem_bounded_sent etc.; not sure to what.
bot merge
Waiting for commit status.
* overseer: pass messages directly between subsystems
* test that message is held on to
* Update node/overseer/src/lib.rs
  Co-authored-by: Peter Goodspeed-Niklaus <[email protected]>
* give every subsystem an unbounded sender too
* remove metered_channel::name
  1. we don't provide good names
  2. these names are never used anywhere
* unused mut
* remove unnecessary &mut
* subsystem unbounded_send
* remove unused MaybeTimer
  We have channel size metrics that serve the same purpose better now and the implementation of message timing was pretty ugly.
* remove comment
* split up senders and receivers
* update metrics
* fix tests
* fix test subsystem context
* fix flaky test
* fix docs
* doc
* use select_biased to favor signals
* Update node/subsystem/src/lib.rs
  Co-authored-by: Andronik Ordian <[email protected]>
Co-authored-by: Peter Goodspeed-Niklaus <[email protected]>
Co-authored-by: Andronik Ordian <[email protected]>
The previous behavior was that messages would go first to the overseer and then be relayed onwards to the destination subsystem.
Now, subsystems communicate via channels directly to other subsystems. Additionally, they have the option of sending a message through either a bounded or an unbounded channel. The unbounded channels should be used sparingly in situations where:
A few examples of places we might use send_unbounded_message are when we send locally generated backing statements to statement distribution or when we send assignments and approvals from approval voting to the approval distribution subsystem. This avoids deadlock when a network subsystem is .awaiting a send to the core subsystem while the core subsystem sends to the networking subsystem at the same time. If buffers are full, that will deadlock. Unbounded channels avoid this situation.
SubsystemContext and the overseer itself have an invariant: when a subsystem sends a message after receiving a signal from the overseer, the recipient of the message will not receive it until it has observed the same signal. This allows subsystems to assume that other subsystems are aware of the same view of blocks as they are.
My change-set here upholds the same invariant by having each subsystem tag each message with the number of signals that it had observed before sending the message. If the recipient has not received at least that many signals, it pockets the message until it has received enough signals. This rests on the fact that the overseer sends signals to all subsystems equally. If one subsystem receives a signal, others can expect to receive it shortly.
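The tagging scheme described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: Inbox and its methods are hypothetical names, MessagePacket is reduced to the two fields that matter here, and plain method calls stand in for the real channels.

```rust
use std::collections::VecDeque;

/// A message tagged with how many signals its sender had observed.
struct MessagePacket<T> {
    signals_received: usize,
    message: T,
}

/// Hypothetical receiving side of one subsystem.
struct Inbox<T> {
    signals_received: usize,
    pending: VecDeque<MessagePacket<T>>,
}

impl<T> Inbox<T> {
    fn new() -> Self {
        Inbox { signals_received: 0, pending: VecDeque::new() }
    }

    /// Called whenever a signal from the overseer is observed.
    fn on_signal(&mut self) {
        self.signals_received += 1;
    }

    /// Called whenever a packet arrives from another subsystem.
    fn on_packet(&mut self, packet: MessagePacket<T>) {
        self.pending.push_back(packet);
    }

    /// Hands out the oldest message only if we have observed at least as many
    /// signals as its sender had; otherwise the message stays pocketed.
    fn next_deliverable(&mut self) -> Option<T> {
        let ready = self.pending.front()?.signals_received <= self.signals_received;
        if ready {
            self.pending.pop_front().map(|packet| packet.message)
        } else {
            None
        }
    }
}

/// The sender had seen one signal; the recipient must see one before delivery.
fn example_delivery() -> (Option<&'static str>, Option<&'static str>) {
    let mut inbox = Inbox::new();
    inbox.on_packet(MessagePacket { signals_received: 1, message: "statement" });
    let before_signal = inbox.next_deliverable();
    inbox.on_signal();
    let after_signal = inbox.next_deliverable();
    (before_signal, after_signal)
}
```

In this sketch the message is withheld before the signal arrives and released right after it, which is the ordering the invariant requires.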
I have also given SubsystemContext a SubsystemSender assoc. type which will be used to split sending and receiving. Follow-up PRs will make use of this feature.
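To illustrate what splitting sending from receiving through an associated type can look like, here is a heavily simplified sketch. The trait shapes below are assumptions made for this example and are not the signatures in the PR; ChannelSender and ChannelContext are hypothetical, a synchronous std channel stands in for the real async channels, and the context sends to itself only to keep the example short.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Simplified, hypothetical sender trait: a cloneable handle for sending.
trait SubsystemSender<M>: Clone {
    fn send_message(&self, message: M);
}

/// Simplified, hypothetical context trait with the sender as an associated type.
trait SubsystemContext {
    type Message;
    type Sender: SubsystemSender<Self::Message>;

    /// Hands out a sender that can be used independently of the context.
    fn sender(&self) -> Self::Sender;
    /// Non-blocking receive of the next incoming message, if any.
    fn try_recv(&mut self) -> Option<Self::Message>;
}

struct ChannelSender<M>(Sender<M>);

// Manual impl so that cloning does not require M: Clone.
impl<M> Clone for ChannelSender<M> {
    fn clone(&self) -> Self {
        ChannelSender(self.0.clone())
    }
}

impl<M> SubsystemSender<M> for ChannelSender<M> {
    fn send_message(&self, message: M) {
        // A send error only means the receiver is gone; ignored in this sketch.
        let _ = self.0.send(message);
    }
}

struct ChannelContext<M> {
    sender: ChannelSender<M>,
    receiver: Receiver<M>,
}

impl<M> SubsystemContext for ChannelContext<M> {
    type Message = M;
    type Sender = ChannelSender<M>;

    fn sender(&self) -> Self::Sender {
        self.sender.clone()
    }

    fn try_recv(&mut self) -> Option<M> {
        self.receiver.try_recv().ok()
    }
}

/// The sender handle is used while the context itself stays free for receiving.
fn example_split() -> Option<u32> {
    let (tx, rx) = channel();
    let mut ctx = ChannelContext { sender: ChannelSender(tx), receiver: rx };
    let sender = ctx.sender();
    sender.send_message(42);
    ctx.try_recv()
}
```

The point of the split is that the sender handle can be cloned and moved elsewhere, so sending does not need a mutable borrow of the whole context.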