@dignifiedquire
Contributor

Work on integrating n0-computer/quinn#28 into the iroh magic

@n0bot n0bot bot added this to iroh Jul 7, 2025
@github-project-automation github-project-automation bot moved this to 🏗 In progress in iroh Jul 7, 2025
@github-actions

github-actions bot commented Jul 7, 2025

Documentation for this PR has been generated and is available at: https://n0-computer.github.io/iroh/pr/3381/docs/iroh/

Last updated: 2025-11-17T12:57:13Z

@github-actions

github-actions bot commented Jul 7, 2025

Netsim report & logs for this PR have been generated and are available at: LOGS
This report will remain available for 3 days.

Last updated for commit: cf5fa83

@dignifiedquire dignifiedquire changed the title [WIP] feat: use quinn multipath [WIP] feat: use quic multipath Jul 8, 2025
@dignifiedquire dignifiedquire force-pushed the feat-multipath branch 3 times, most recently from 72cb071 to db712c0 July 18, 2025 14:39
@dignifiedquire dignifiedquire force-pushed the feat-multipath branch 2 times, most recently from 4827e62 to 946f71c July 28, 2025 20:29
Frando and others added 8 commits November 10, 2025 10:25
## Description

This has a few minor cleanups without any functional changes in the
endpoint state actor:
* Remove double handle upgrade
* Add helper function `to_transport_addr` on the relay mapped addr map
* Use hash map `entry` API instead of `get` and `expect`
* Use `if let` chains to remove a level of indentation
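For readers unfamiliar with the `entry` refactor listed above, here is a minimal sketch of the before/after pattern on a plain `HashMap`. The `Handle` type and the functions are hypothetical stand-ins, not the endpoint state actor's real state.

```rust
use std::collections::HashMap;

/// Hypothetical per-remote handle; stands in for the actor's real state.
#[derive(Debug, Default, PartialEq)]
struct Handle {
    connections: u32,
}

/// Before: look up, insert if missing, then look up again and `expect`.
fn add_connection_get_expect(map: &mut HashMap<u64, Handle>, id: u64) {
    if map.get(&id).is_none() {
        map.insert(id, Handle::default());
    }
    let handle = map.get_mut(&id).expect("just inserted");
    handle.connections += 1;
}

/// After: a single lookup through the `entry` API, with no `expect` needed.
fn add_connection_entry(map: &mut HashMap<u64, Handle>, id: u64) {
    map.entry(id).or_default().connections += 1;
}
```

Both functions leave the map in the same state; the `entry` version avoids the double lookup and the panic path.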

## Breaking Changes

<!-- Optional, if there are any breaking changes document them,
including how to migrate older code. -->

## Notes & open questions

<!-- Any notes, remarks or open questions you have to make about the PR.
-->

## Change checklist
<!-- Remove any that are not relevant. -->
- [ ] Self-review.
- [ ] Documentation updates following the [style
guide](https://rust-lang.github.io/rfcs/1574-more-api-documentation-conventions.html#appendix-a-full-conventions-text),
if relevant.
- [ ] Tests if relevant.
- [ ] All breaking changes documented.
- [ ] List all breaking changes in the above "Breaking Changes" section.
- [ ] Open an issue or PR on any number0 repos that are affected by this
breaking change. Give guidance on how the updates should be handled or
do the actual updates themselves. The major ones are:
    - [ ] [`quic-rpc`](https://github.com/n0-computer/quic-rpc)
    - [ ] [`iroh-gossip`](https://github.com/n0-computer/iroh-gossip)
    - [ ] [`iroh-blobs`](https://github.com/n0-computer/iroh-blobs)
    - [ ] [`dumbpipe`](https://github.com/n0-computer/dumbpipe)
    - [ ] [`sendme`](https://github.com/n0-computer/sendme)
In the endpoint state actor, this uses the `Connection::on_closed` future added in n0-computer/quinn#153 to remove connections once they are closed instead of relying on manual cleanup.
…async (#3629)

## Description

Currently, `Magicsock::register_connection` is a sync function, but
needs to send over an async channel to notify the endpoint state actor
about the new connection. It currently employs a hack to achieve that:
it spawns a tokio task for sending the message.

This PR cleans this up by making `register_connection` return a future,
and awaits this future at the various sites where we go from a
`quinn::Connection` to an iroh `Connection`. Luckily, all these call sites
are already in async contexts.

* When going from `Connecting` or `Accepting` to `Connection`, we await
the registration after the `quinn::Connecting` completes. The
future is stored in an `Option` rather than in the usual state enum,
because we need unconditional access to the `quinn::Connecting`
in the functions on `Connecting`/`Accepting`.
* For the `(Incoming|Outgoing)ZeroRttConnection`, we store a future that
first awaits the handshake and then registers the connection. So we need
only a single future here.
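The `Connecting` change described above can be sketched with std-only types. This is a simplified model under stated assumptions, not iroh's code: the handshake is any boxed future yielding a hypothetical connection id (standing in for `quinn::Connecting`), and the hypothetical `register_connection` merely records the id, where the real one sends a message to the endpoint state actor. The point is the shape: the registration future lives in an `Option` next to the handshake, so the handshake stays accessible, and it is only created once the handshake has completed.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{ready, Context, Poll, Wake, Waker};

type BoxFuture<T> = Pin<Box<dyn Future<Output = T> + Send>>;

/// Hypothetical stand-in for `register_connection`: records the id instead
/// of sending it to the endpoint state actor.
async fn register_connection(id: u64, registry: Arc<Mutex<Vec<u64>>>) {
    registry.lock().unwrap().push(id);
}

/// Simplified model of `Connecting`: the handshake stays accessible and the
/// registration future is stored in an `Option`, not a state enum.
struct Connecting {
    handshake: BoxFuture<u64>,
    register: Option<(u64, BoxFuture<()>)>,
    registry: Arc<Mutex<Vec<u64>>>,
}

impl Future for Connecting {
    type Output = u64;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u64> {
        // All fields are `Unpin`, so a plain `&mut Self` is fine.
        let this = self.get_mut();
        if this.register.is_none() {
            // Step 1: drive the handshake to completion.
            let id = ready!(this.handshake.as_mut().poll(cx));
            // Step 2: only now create the registration future.
            let fut: BoxFuture<()> = Box::pin(register_connection(id, this.registry.clone()));
            this.register = Some((id, fut));
        }
        // Step 3: await the registration before yielding the connection.
        let (id, fut) = this.register.as_mut().expect("set above");
        ready!(fut.as_mut().poll(cx));
        Poll::Ready(*id)
    }
}

/// Minimal executor for the sketch: polls with a no-op waker until ready.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

The busy-polling `block_on` is only there to make the sketch self-contained; a real caller would simply `.await` the `Connecting`.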

With `register_connection` being async, we can also clean up some of the
not-so-nice things introduced in #3622: Because we now have an async
function, we can let the endpoint state actor return a reply. This makes
it much more straightforward because we can have the endpoint state
actor initialize a watcher for the paths and return it instead of having
to do a weird dance with parts of the state being initialized or stored
outside of the endpoint state actor to satisfy the sync function
constraints. This is much nicer now IMO.
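Letting the actor return a reply is the usual request/reply actor pattern: each request carries its own reply channel, and the actor initializes the state it owns before sending a handle back. The sketch below uses std threads and channels purely for illustration; the `Message` enum, `PathsHandle` type and functions are hypothetical, and iroh itself uses async channels inside tokio tasks.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical handle returned to the caller (stands in for the path watcher).
#[derive(Debug, Clone, PartialEq)]
struct PathsHandle {
    conn_id: u64,
}

/// Hypothetical actor message: the request carries its own reply channel.
enum Message {
    RegisterConnection {
        conn_id: u64,
        reply: mpsc::Sender<PathsHandle>,
    },
}

/// Spawns the actor; it owns all per-connection state and initializes it
/// itself before replying.
fn spawn_actor() -> (mpsc::Sender<Message>, thread::JoinHandle<Vec<u64>>) {
    let (tx, rx) = mpsc::channel::<Message>();
    let handle = thread::spawn(move || {
        let mut registered = Vec::new();
        // The loop ends once every sender has been dropped.
        for msg in rx {
            match msg {
                Message::RegisterConnection { conn_id, reply } => {
                    registered.push(conn_id);
                    // Ignore the error if the caller stopped waiting.
                    let _ = reply.send(PathsHandle { conn_id });
                }
            }
        }
        registered
    });
    (tx, handle)
}

/// Caller side: send the request, then wait for the actor's reply.
fn register_connection(actor: &mpsc::Sender<Message>, conn_id: u64) -> Option<PathsHandle> {
    let (reply_tx, reply_rx) = mpsc::channel();
    actor
        .send(Message::RegisterConnection { conn_id, reply: reply_tx })
        .ok()?;
    reply_rx.recv().ok()
}
```

Because the caller receives a fully initialized handle, no part of the per-connection state has to live outside the actor.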

## Notes & open questions

This adds a boxed future into the process of going from a `Connecting`
to a `Connection`. If we really wanted, we could use a manually
implemented future instead. However, I don't think one boxed future *per
connection* is an issue, so I'd prefer to leave it like this
(implementing a manual future for `tokio::mpsc::Sender::send` is
cumbersome).

First step for #3641; the rest can be done once QUIC holepunching has
landed.

Unfortunately `send_disco_message` also needs the sender, so it can't be
fully removed from the `EndpointState`.
## Description

* Remove `Endpoint::conn_type`
* Update `transfer.rs` example to use `Connection::paths` instead
* Change return type of `Connection::paths` to have more guarantees on
the return type (Send, Unpin, 'static)

@dignifiedquire
Contributor Author

/netsim report

Frando and others added 21 commits November 11, 2025 17:49
## Description

Alternative to #3631

Replaces the `Watchable`s for path changes on the `Connection` with a
boxed `Watcher`. The watcher is boxed because it would otherwise
increase the size of the `Connection` struct significantly: the
mapped-and-joined watcher with a `SmallVec` of `PathInfo` inside is
currently ~600 bytes.
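To see why boxing helps: a boxed trait object costs two pointer-sized words in the containing struct, no matter how large the watcher behind it is. The sketch uses a hypothetical 600-byte `BigWatcher` and a hypothetical `PathWatcher` trait to stand in for the mapped-and-joined watcher; the real sizes in iroh will differ.

```rust
use std::mem::size_of;

/// Hypothetical trait standing in for a path watcher.
trait PathWatcher {
    fn current_len(&self) -> usize;
}

/// Hypothetical large watcher, roughly the ~600 bytes mentioned above.
struct BigWatcher {
    buf: [u8; 600],
}

impl PathWatcher for BigWatcher {
    fn current_len(&self) -> usize {
        self.buf.len()
    }
}

/// Watcher stored inline: every connection carries all 600 bytes.
struct InlineConnection {
    paths: BigWatcher,
}

/// Watcher boxed as a trait object: a data pointer plus a vtable pointer.
struct BoxedConnection {
    paths: Box<dyn PathWatcher + Send>,
}
```

The trade-off is one heap allocation per connection in exchange for a much smaller `Connection` that is cheaper to move around.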

The benefit of storing a `Watcher` and not a `Watchable` is that the
watcher streams now close once the EndpointStateActor drops the state
for the connection, which it does after the connection is closed.
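The closing behaviour follows from ownership: the side that produces updates holds the only sender, the connection holds the receiving end, so once the producer's state is dropped the consumer's stream ends. Below is a std-only sketch of that idea with `mpsc` channels; `RemoteState` and `new_connection_state` are hypothetical, and iroh's actual `Watcher` API is not modeled.

```rust
use std::sync::mpsc;

/// Hypothetical per-connection state owned by the actor; it holds the only
/// sender for path updates.
struct RemoteState {
    path_updates: mpsc::Sender<String>,
}

/// Creates the actor-side state and the watcher handed to the `Connection`.
fn new_connection_state() -> (RemoteState, mpsc::Receiver<String>) {
    let (tx, rx) = mpsc::channel();
    (RemoteState { path_updates: tx }, rx)
}

impl RemoteState {
    fn path_changed(&self, path: &str) {
        // Ignore the error if the connection already dropped its watcher.
        let _ = self.path_updates.send(path.to_string());
    }
}
```

Once the actor drops `RemoteState`, iterating the receiver yields any buffered updates and then terminates, which mirrors the "streams now close" behaviour described above.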

Also adds a test for path watching, including testing that the streams
now close when the connection closes.

## Description

Fixes #3638 (partially)

This is a first, small solution to stop inactive endpoint actors after
an idle timeout. I implemented it such that the *actor* decides once to
stop, while making sure that we *never* create senders to actors that
are shutting down.

Logic in the actor:
* The actor enters an idle timeout (set to 60 seconds) once it has no
active connections, an empty inbox, and no inbox senders
* Once the timeout expires, it is rechecked that the idle conditions
hold, and if so the actor exits
* As soon as any of the idle conditions no longer holds, the idle timeout
is deactivated; it is restarted once the conditions are met again
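The idle-timeout rules above amount to a small state machine. Here is a sketch under simplifying assumptions: the three conditions are captured in a hypothetical `IdleConditions` snapshot, and instead of a timer the caller passes `now` explicitly so the logic is easy to test. The 60-second value is the one stated above.

```rust
use std::time::{Duration, Instant};

/// The idle timeout described above.
const IDLE_TIMEOUT: Duration = Duration::from_secs(60);

/// Hypothetical snapshot of the three idle conditions.
#[derive(Clone, Copy)]
struct IdleConditions {
    active_connections: usize,
    inbox_empty: bool,
    /// Senders other than the one held by the endpoint map.
    extra_senders: usize,
}

impl IdleConditions {
    fn is_idle(&self) -> bool {
        self.active_connections == 0 && self.inbox_empty && self.extra_senders == 0
    }
}

/// Tracks when the actor became idle. Returns `true` if the actor should exit.
fn check_idle(idle_since: &mut Option<Instant>, cond: IdleConditions, now: Instant) -> bool {
    if !cond.is_idle() {
        // Any condition failing deactivates the timeout.
        *idle_since = None;
        return false;
    }
    match *idle_since {
        // Conditions just became true: (re)start the timeout.
        None => {
            *idle_since = Some(now);
            false
        }
        // Re-check on expiry: the conditions still hold, so exit.
        Some(start) => now.duration_since(start) >= IDLE_TIMEOUT,
    }
}
```

Resetting `idle_since` whenever a condition fails is what guarantees the full timeout is measured from the most recent moment the actor became idle.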

The actor checks if the inbox's sender strong count equals 1, which
means that no senders exist apart from the one held in the endpoint map.
This check is protected with a mutex, to enter a critical section for
closing the inbox while the lock is held in case the conditions are met.
This is to ensure that there cannot be a race condition where a sender
is cloned out right after the check in the actor returns true, but
before the inbox is closed.

Logic in the endpoint map:
* When handing out senders, we acquire the shared lock, and check that
the channel is not closed while the lock is held. This ensures that the
actor never closes while a sender is alive. If the actor is closed, we
remove the handle to the dead actor and create a new actor.
* At regular intervals (set to 60 seconds) the magicsock actor removes
handles to dead actors.
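The critical section shared by the actor and the endpoint map can be modeled with std types. In this sketch, which is an assumption-laden model rather than the PR's code, a sender is represented by a hypothetical `Arc<()>` whose strong count stands in for the inbox sender count, and `Shared`, `MapEntry`, `try_close` and `get_sender` are hypothetical names. Both the count check and the cloning of new senders happen under the same mutex, so no sender can be handed out between the check and the close.

```rust
use std::sync::{Arc, Mutex, Weak};

/// Hypothetical stand-in for an inbox sender; its strong count is the number
/// of live senders (the endpoint map's own copy included).
type Sender = Arc<()>;

/// State shared by the actor and the endpoint map. The mutex guards the
/// critical section; the bool records whether the inbox has been closed.
struct Shared {
    closed: Mutex<bool>,
}

/// The endpoint map's entry for one actor.
struct MapEntry {
    sender: Sender,
    shared: Arc<Shared>,
}

/// Actor side: close the inbox only if the map's sender is the only one
/// left. The check and the close happen while holding the lock.
fn try_close(shared: &Shared, sender: &Weak<()>) -> bool {
    let mut closed = shared.closed.lock().unwrap();
    if sender.strong_count() == 1 {
        *closed = true;
    }
    *closed
}

/// Map side: hand out a sender only while holding the lock and only if the
/// inbox is still open. `None` means the caller should replace the dead actor.
fn get_sender(entry: &MapEntry) -> Option<Sender> {
    let closed = entry.shared.closed.lock().unwrap();
    if *closed {
        None
    } else {
        Some(entry.sender.clone())
    }
}
```

Senders that are already handed out may be cloned or dropped without the lock: while any exists the count is above 1, so the actor will not close, and dropping one only lowers the count.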


## Notes & open questions

* I *think* my logic around the critical section and ensuring that we
never close the actor while senders exist is sound. However, it needs
careful review and tests. I'll do some thinking on how to best test
this.
* Instead of employing an interval to remove dead actor handles, we
could use a channel where the actor informs an outside-task which
endpoint actors terminated, so that the outside-task can then lock the
endpoint map and remove just those. Not sure if that's worth it.
* Another solution here might be to spawn the actor tasks into a join
set in the magicsock actor. However this would need further refactoring
and would likely make spawning actors async. I think I'd prefer to keep
that sync because it makes the surrounding code a lot simpler.
* This does not yet implement some of the more advanced reasoning that
#3638 proposes. I think we should start with something simple that
prevents memory exhaustion and tweak as needed. However, it could also
be argued that we should start with a more featureful design right away.

…nnection closes (#3650)

## Description

Addresses #3602 for multipath.

This implements the first bullet point from @flub's suggestion in the
above issue: We clear the `selected_path` once the last known connection
to an endpoint closes.

This means that a new connection attempt after that will instead send to
all addresses again, and avoids the case where we send on e.g. the old
port from a previous restart of the endpoint we're connecting to.
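The rule "clear `selected_path` once the last known connection closes" can be sketched as follows. `RemoteState`, its fields and methods are hypothetical simplifications; iroh's real remote state tracks far more than this.

```rust
use std::collections::{BTreeSet, HashSet};
use std::net::SocketAddr;

/// Hypothetical per-remote state, heavily simplified.
#[derive(Default)]
struct RemoteState {
    known_addrs: BTreeSet<SocketAddr>,
    selected_path: Option<SocketAddr>,
    open_connections: HashSet<u64>,
}

impl RemoteState {
    /// Addresses to send to: the selected path if any, otherwise all known ones.
    fn send_targets(&self) -> Vec<SocketAddr> {
        match self.selected_path {
            Some(addr) => vec![addr],
            None => self.known_addrs.iter().copied().collect(),
        }
    }

    fn on_connection_closed(&mut self, conn_id: u64) {
        self.open_connections.remove(&conn_id);
        // Last known connection gone: forget the selected path so the next
        // connection attempt sends to all addresses again.
        if self.open_connections.is_empty() {
            self.selected_path = None;
        }
    }
}
```

With the old port still selected, a new connection would keep targeting it; once the last connection closes, both the old and the new port are tried again.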

This turns the `test_0rtt_after_server_restart` test green.

## Notes & open questions

I *think* this will still fail if we have an "open" connection to the
server that's in the process of timing out and we open a new connection
to the restarted server while that's happening. I'm not sure though.

## Change checklist
<!-- Remove any that are not relevant. -->
- [x] Self-review.
Co-authored-by: Philipp Krüger <[email protected]>
## Description

For some odd reason AuthenticationError was given a branch that had
quinn::ConnectionError inside it.  But logically the connection error
has nothing to do with authentication.  That should have been a red
flag.

AuthenticationError itself is almost always wrapped in
ConnectingError, which does correctly have a quinn::ConnectionError
branch.  And the few places where it was directly returned to the user
it arguably **should** have been wrapped in a ConnectingError.

The result of this is that before this fix you would get a very
confusing authentication error if the remote client closed the
connection right at the same time as the handshake completed for
it (yes, this is difficult to do at the right time, and it only
happens for the client since that completes the handshake one network
hop before the server).  But this was not an authentication error; it
was simply a closed connection.  The new error structure captures this
correctly.

Similarly the InternalConsistencyError belongs on the ConnectingError.
Though that one should be impossible to produce since it's supposed to
be an invariant.

## Breaking Changes

- `AuthenticationError` loses the `ConnectionError` and
  `InternalConsistencyError` branches.  Both are on the
  `ConnectingError` instead.
- `OutgoingZeroRttConnection::handshake_completed` and
  `IncomingZeroRttConnection` now return a `ConnectingError` instead
  of `AuthenticationError`.
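The new error layering can be sketched with plain enums. Everything here is a simplified model: `ConnectionError` stands in for `quinn::ConnectionError`, and the `AuthenticationError` variant names (`NoAlpn`, `RemoteId`) and the `Display` messages are hypothetical. What the sketch mirrors from the description is only the structure: connection-level failures and the internal consistency error sit on `ConnectingError`, next to authentication, not inside it.

```rust
use std::fmt;

/// Hypothetical stand-in for `quinn::ConnectionError`.
#[derive(Debug, PartialEq)]
struct ConnectionError(&'static str);

/// After the change: only authentication failures live here.
/// The variant names are hypothetical, for illustration only.
#[derive(Debug, PartialEq)]
enum AuthenticationError {
    NoAlpn,
    RemoteId,
}

/// After the change: connection-level failures sit next to authentication.
#[derive(Debug, PartialEq)]
enum ConnectingError {
    ConnectionError(ConnectionError),
    AuthenticationError(AuthenticationError),
    InternalConsistencyError,
}

impl fmt::Display for ConnectingError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConnectingError::ConnectionError(e) => write!(f, "connection closed: {}", e.0),
            ConnectingError::AuthenticationError(e) => write!(f, "authentication failed: {e:?}"),
            ConnectingError::InternalConsistencyError => write!(f, "internal consistency error"),
        }
    }
}
```

A peer closing the connection right as the handshake completes now surfaces as a connection error rather than a confusing authentication failure.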

## Notes & open questions

I managed to trigger this error occasionally **before** applying this
fix, using a program that races the closing against the completed
handshake in a very tight loop.  Since applying this fix, however, I
completely fail to trigger it, so I can't admire the beautiful new
error reporting this fix should give.  It's a bit confusing.

**edit**: it **is** confusing, but it is correct. My flaky
reproduction always closes the connection *after* the handshake
completes, so the ALPN and EndpointId are always available. Awaiting an
`Incoming` then yields a valid `Connection`; it is just
already closed.

This fix could also be made against main. I believe the same commit
should be able to be cherry-picked and will probably apply fairly clean.
Do you think I should make it against main?

## Change checklist
<!-- Remove any that are not relevant. -->
- [x] Self-review.
- [x] All breaking changes documented.
- [x] List all breaking changes in the above "Breaking Changes" section.
There are issues for these already
…3664)

## Description

This means these tests also work when run locally with nextest.

## Notes & open questions

I'm not sure why this wasn't done when the ci profile override was
chosen.  What am I missing?

## Change checklist
<!-- Remove any that are not relevant. -->
- [x] Self-review.
## Description

Bumps netwatch and netdev, to remove duplicate dependency on both
[email protected] and [email protected].

This is a next step into the world of configurable transports. We now
allow disabling the IP-based transports entirely.
Internally this starts to prepare for a world where the user can
configure multiple different transports: IP, relay, and others in the
future.

Closes #2957
## Description

Remove the test-only `Endpoint::path_selection` API and instead use
`Endpoint::clear_ip_transports` for `PathSelection::RelayOnly`, now
that this public API was added in
#3651.

…moteState (#3673)

## Description

Renames:
* renamed `endpoint_map` -> `remote_map`, `EndpointMap` -> `RemoteMap`,
`endpoint_state` -> `remote_state`, `EndpointStateActor` ->
`RemoteStateActor`

Moved:
* moved `path_state` module under `remote_state` (previously
`endpoint_state`); its items are used only there and nowhere else


---------

Co-authored-by: Floris Bruynooghe <[email protected]>
## Description

Merges main and adapts for the changes from #3619 


---------

Co-authored-by: Rüdiger Klaehn <[email protected]>
Co-authored-by: Friedel Ziegelmayer <[email protected]>
## Description

Avoid potentially busy looping in a tokio task.
I think this busy looping prevents tokio from shutting down the runtime
properly.
## Description

Fixes #3642 

This moves discovery handling fully into the `EndpointStateActor`.
The pub(crate) interface to trigger discovery and get an
`EndpointMappedAddr` is now `Magicsock::resolve_remote`, which sends the
provided addresses to the `EndpointStateActor`. The actor starts discovery
if it does not have a selected path and discovery is not already running.
It returns immediately if there are any known paths; otherwise it waits for
discovery to produce at least one result or an error. Once this returns,
`resolve_remote` returns either an `EndpointMappedAddr` or the
discovery error.

This means the current behavior is kept: We only start
`quinn::Endpoint::connect` once we have at least one transport address
for the remote. If not, we return the discovery error immediately from
`iroh::Endpoint::connect`.
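The decision the actor makes in `resolve_remote` can be summarized as a pure function. The `Resolve` enum and the boolean/count inputs are hypothetical simplifications of the actor's state; the sketch covers only the decision step, not the waiting or the returned `EndpointMappedAddr`.

```rust
/// Hypothetical outcome of the actor's decision step.
#[derive(Debug, PartialEq)]
enum Resolve {
    /// Known paths exist: reply right away.
    Immediate { start_discovery: bool },
    /// No paths yet: wait for the first discovery result or error.
    WaitForDiscovery { start_discovery: bool },
}

/// Decision logic as described above, with hypothetical inputs.
fn resolve_remote(has_selected_path: bool, discovery_running: bool, known_paths: usize) -> Resolve {
    // Discovery starts only without a selected path and if not already running.
    let start_discovery = !has_selected_path && !discovery_running;
    if known_paths > 0 {
        Resolve::Immediate { start_discovery }
    } else {
        Resolve::WaitForDiscovery { start_discovery }
    }
}
```

Keeping the "start discovery" decision separate from the "reply now or wait" decision is what would let discovery be triggered in other situations later without changing when `connect` returns.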

This opens the door for us to easily tune when to run discovery in other
situations, e.g. when all available paths to a remote are closed.
However, for now this PR still only starts discovery when
`Endpoint::connect` is called and no path is selected at the moment.


## Description

* fix idle timeout clear condition (previously it would hot loop)
* fix hot loop when local_addrs watchable becomes disconnected during
shutdown
* when sending a datagram fails in the transports sender, include the
dst address in the error message
* do not break the RemoteStateActor when sending a datagram fails


---------

Co-authored-by: Philipp Krüger <[email protected]>
## Description

This reverts a change from this PR:
#3384

I originally thought I could make this test more reliable by pausing the
tokio time across the `tokio::time::timeout` calls, but it turns out
that actually makes the test *more* flaky:
- When time is paused, the timeout will immediately fire once the tokio
runtime has no more CPU work to do.
- It's possible that there's no CPU work to do anymore, while there's
something else that is actually still doing work, e.g. networking.
- Before the `ActiveRelayActor` finishes its `run_connected` loop, it
will call `client_sink.close().await`, which will do actual I/O. When
the tokio runtime is paused at that moment, it'll immediately trigger
the test's timeout.

## Notes & open questions

I couldn't reproduce this problem even across a couple thousand runs of
the test locally. I'm not super confident that this fixes things, but
I've analyzed the logs and this seems to be the most likely thing that's
happening to me.

Closes #3613 

## Change checklist
<!-- Remove any that are not relevant. -->
- [x] Self-review.
Projects

Status: 🏗 In progress

5 participants