@farnz
Contributor

@farnz farnz commented Aug 25, 2021

Motivation

Tracking the idleness of the executor has several uses:

  1. Only starting new work when a worker would otherwise be completely idle
  2. Tracking worker idleness to determine whether more threads would help with a measured performance issue
  3. Implementing algorithms like Quiescent State Based Reclamation

Rather than implement mechanisms for each of these things separately, implement a generic mechanism to let us implement these in libraries atop tokio.

Solution

Similar to the thread start and thread stop callbacks, provide callbacks when a worker thread is parked (goes idle), and unparked (resumes doing work). This allows users to implement anything that depends on tracking how idle the executor is on top of these callbacks - and (for example) the user could choose to call into tracing to get idleness traces, or check their own work queue for work to do when the runtime is idle.

Note that as per the documentation, there's room to get things badly wrong with on_thread_unpark; I am not determined to keep this, but it feels useful to me in the idleness tracking use case despite the risks.
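
To illustrate the idleness tracking use case, here is a minimal sketch of a counter that the two callbacks could drive. `IdleTracker` and its method names are hypothetical, not part of this PR. The tokio wiring appears only as a comment, and assumes the park callback is registered as `on_thread_park` alongside `on_thread_unpark` on the runtime `Builder`.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical helper: counts how many worker threads are currently parked.
#[derive(Default)]
struct IdleTracker {
    parked: AtomicUsize,
}

impl IdleTracker {
    /// Meant to be called from the park callback (worker goes idle).
    fn on_park(&self) {
        self.parked.fetch_add(1, Ordering::SeqCst);
    }

    /// Meant to be called from the unpark callback (worker resumes work).
    /// Assumes every unpark is preceded by a park on the same worker;
    /// otherwise the counter would wrap around.
    fn on_unpark(&self) {
        self.parked.fetch_sub(1, Ordering::SeqCst);
    }

    /// Number of workers that are parked right now.
    fn idle_workers(&self) -> usize {
        self.parked.load(Ordering::SeqCst)
    }
}

fn main() {
    let tracker = Arc::new(IdleTracker::default());

    // Assumed wiring with tokio (not compiled in this sketch):
    //
    //   let (park, unpark) = (tracker.clone(), tracker.clone());
    //   let rt = tokio::runtime::Builder::new_multi_thread()
    //       .on_thread_park(move || park.on_park())
    //       .on_thread_unpark(move || unpark.on_unpark())
    //       .build()?;

    // Simulate two workers parking and one of them resuming.
    tracker.on_park();
    tracker.on_park();
    tracker.on_unpark();
    println!("idle workers: {}", tracker.idle_workers());
}
```

The callbacks run on the worker threads themselves, so whatever they call should be cheap and non-blocking; an atomic counter fits that constraint.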

Fixes: #3975

@farnz
Contributor Author

farnz commented Sep 6, 2021

I can rebase and force push a new version, but I've chosen not to for now to make it easier for existing reviewers to have a fresh look at the changes.

@Darksonn Darksonn added A-tokio Area: The main tokio crate M-runtime Module: tokio/runtime labels Sep 7, 2021
@farnz
Contributor Author

farnz commented Sep 9, 2021

The test failure doesn't repro on my Fedora machine, nor on a Mac I have access to - I'll leave it for now as I can't retry the failed check without a new push.

If someone spots an easy fix or an easy repro plan (or can tell me that it's bad on master too), I'll be a happy coder.

@Darksonn
Contributor

Darksonn commented Sep 9, 2021

I've restarted CI on the commit to check whether it's an issue with your PR or something else. Feel free to push empty commits if you want to restart CI. You can do this by passing --allow-empty to git commit.

@jmaygarden

This is similar to a suggestion from @carllerche in response to #1481. For reference, there is a fork of tokio v0.2 with a different approach to park callbacks here:

jmagnuson@60201fa

That one passes a duration to a single callback (to handle park_timeout). That's useful to, for example, call event_base_loopexit when coexisting with C code using libevent.

@farnz
Contributor Author

farnz commented Sep 9, 2021

> This is similar to a suggestion from @carllerche in response to #1481. For reference, there is a fork of tokio v0.2 with a different approach to park callbacks here:
>
> jmagnuson@60201fa
>
> That one passes a duration to a single callback (to handle park_timeout). That's useful to, for example, call event_base_loopexit when coexisting with C code using libevent.

That's solving a different problem, and while it's in the same area of code, I don't believe that one solution covers both problems.

The branch you've pointed at, and Carl's suggestion, are both about the case where you're integrating tokio with an external event loop (such as one from libevent), where every time Tokio wants to go to sleep waiting for new events or a timeout, you want to notify the external event loop so that it can also look for new events. In this case, you want to know about every single time the worker parks its thread, so that you can extract the current "next timer expiry" timeout. Note, though, that the worker may hit the timeout, and then not unpark because something else has handled the timer expiry - but that's fine, because you can just re-enter the external event loop with the new timeout.

This is trying to solve the problem of allowing code to react to the Runtime being CPU-idle. For example, you might have a queue of incoming work; to keep latencies under control, you don't want to start new incoming work if the existing workload is enough to keep the Runtime fully busy. Here, you don't want to know about timer expiry - you simply care that the thread has stopped running tasks. The only reason to also expose on_thread_unpark is that there are several cases where you want to track how many threads are idle - e.g. for the work queue case, you want to know that there is an idle thread so that you can immediately spawn work without waiting for a new thread to go idle.

A combined solution feels like it'll be suboptimal for the cases I have in mind - I'm using Tokio's event loop, and I therefore don't want lots of spurious notifications; I just want to be able to stop threads parking if there's work they could be doing.
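
The work queue case could be sketched roughly as follows. `IdleGatedQueue` and `take_on_park` are hypothetical names; the sketch only models the hand-off logic and assumes the park callback is allowed to start new work on the runtime, which would stop the thread from staying parked.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

/// Hypothetical queue that holds incoming jobs until a worker is about to go idle.
struct IdleGatedQueue<T> {
    pending: Mutex<VecDeque<T>>,
}

impl<T> IdleGatedQueue<T> {
    fn new() -> Self {
        Self { pending: Mutex::new(VecDeque::new()) }
    }

    /// Called by producers: enqueue work without starting it yet.
    fn push(&self, job: T) {
        self.pending.lock().unwrap().push_back(job);
    }

    /// Meant to run inside the park callback: hands out at most one job,
    /// so new work starts only when a worker has run out of tasks.
    fn take_on_park(&self) -> Option<T> {
        self.pending.lock().unwrap().pop_front()
    }
}

fn main() {
    let queue = IdleGatedQueue::new();
    queue.push("job-a");
    queue.push("job-b");

    // Simulate a worker reaching the park callback repeatedly. In a real
    // runtime, the callback would spawn the job instead of printing it.
    while let Some(job) = queue.take_on_park() {
        println!("worker going idle, starting {}", job);
    }
}
```

Handing out one job per park keeps the runtime from taking on more incoming work than its idle capacity; tracking the idle count through the unpark callback would additionally let producers spawn immediately when a worker is already parked.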

@farnz
Contributor Author

farnz commented Sep 14, 2021

Re-reading the thread you pointed me at, I think a better route for solving that problem would be to expose a way to add a libevent driver to the stack, analogous to the existing time, I/O, signal and process drivers.

That then gives you full access to everything that you could want for integrating an extra source of events to the Runtime, and even allows you to override the underlying park mechanism if (for example) the external event loop would benefit from an alternate park mechanism to Tokio's defaults.

@Darksonn
Contributor

I only have some review details on the comments in the PR. I'll merge the PR afterwards.

@farnz
Contributor Author

farnz commented Sep 16, 2021

I believe I've fixed everything - let me know if there's more work you want me to do before you merge this PR, and I'll get it done.

@Darksonn Darksonn merged commit 957ed3e into tokio-rs:master Sep 16, 2021
@farnz farnz deleted the park-callbacks branch September 16, 2021 17:47
@Darksonn Darksonn mentioned this pull request Sep 21, 2021
Development

Successfully merging this pull request may close these issues.

Provide a backpressure mechanism, to tell applications how loaded this Runtime is

4 participants