
Conversation

@Frando
Member

@Frando Frando commented Nov 20, 2025

Description

This adds a simple test: one server that accepts connections, and then 3 clients that are created consecutively. Each client connects to the server, exchanges data, closes the connection, and closes its endpoint. Once a client has finished, the next client is created (new endpoint, new secret key).

The test comes in three variations that share one implementation:

  • test_one_server_two_clients_local_relay: uses a local relay
  • test_one_server_two_clients_public_relay: uses the public relay
  • test_one_server_two_clients_no_relay: doesn't use relays at all

The no-relay and local-relay versions pass for me. The public-relay one, however, times out, but only for the second client: the first client succeeds, and the second client's connect attempt times out on the client side.
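The shape of the test described above (one long-lived server, clients created strictly one after another, each with a fresh socket) can be sketched as follows. This is only an illustration under loud assumptions: it uses plain TCP from the Rust standard library as a stand-in, not iroh endpoints or relays, and `run_sequential_clients` is a hypothetical name that does not appear in the PR.

```rust
use std::io::{Read, Write};
use std::net::{Shutdown, TcpListener, TcpStream};
use std::thread;

/// Runs one echo server and `n` clients, one after another.
/// Plain TCP stand-in for illustration; this is NOT the iroh API.
fn run_sequential_clients(n: usize) -> std::io::Result<Vec<String>> {
    // Bind to port 0 so the OS picks a free port.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;

    // The server accepts exactly `n` connections and echoes each payload.
    let server = thread::spawn(move || -> std::io::Result<()> {
        for _ in 0..n {
            let (mut stream, _) = listener.accept()?;
            let mut buf = Vec::new();
            // Reads until the client shuts down its write half.
            stream.read_to_end(&mut buf)?;
            stream.write_all(&buf)?;
            // Dropping `stream` here closes the connection.
        }
        Ok(())
    });

    let mut replies = Vec::new();
    for i in 0..n {
        // A new client socket per iteration, created only after the
        // previous client has fully finished.
        let mut stream = TcpStream::connect(addr)?;
        stream.write_all(format!("hello from client {i}").as_bytes())?;
        stream.shutdown(Shutdown::Write)?;
        let mut reply = String::new();
        stream.read_to_string(&mut reply)?;
        replies.push(reply);
    }

    server.join().expect("server thread panicked")?;
    Ok(replies)
}
```

Calling `run_sequential_clients(3)` returns one echoed reply per client. In the real test the interesting part is exactly what this stand-in cannot show: the second client fails only when the public relay is involved.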

Breaking Changes

Notes & open questions

Change checklist

  • Self-review.
  • Documentation updates following the style guide, if relevant.
  • Tests if relevant.
  • All breaking changes documented.
    • List all breaking changes in the above "Breaking Changes" section.
    • Open an issue or PR on any number0 repos that are affected by this breaking change. Give guidance on how the updates should be handled or do the actual updates themselves. The major ones are:

@Frando Frando changed the base branch from main to feat-multipath November 20, 2025 17:17
@github-actions

Documentation for this PR has been generated and is available at: https://n0-computer.github.io/iroh/pr/3689/docs/iroh/

Last updated: 2025-11-20T17:23:30Z

@github-actions

Netsim report & logs for this PR have been generated and are available at: LOGS
This report will remain available for 3 days.

Last updated for commit: 2c0802e

@n0bot n0bot bot added this to iroh Nov 20, 2025
@github-project-automation github-project-automation bot moved this to 🏗 In progress in iroh Nov 20, 2025
@Frando
Member Author

Frando commented Nov 20, 2025

Initial investigation for test_one_server_two_clients_public_relay:
Once the second client starts to connect, I do see its logs for sending datagrams via the RemoteStateActor on all paths (IP and relay). On the server, however, when browsing the logs from that send timestamp forward, I only get these lines:

2025-11-20T20:54:02.226494Z TRACE server: netwatch::udp: UDP recv src=192.168.18.55:32827 len=1200 count=1 dst=192.168.18.55
2025-11-20T20:54:02.226522Z TRACE server: iroh::magicsock: UDP recv: 1 packets
2025-11-20T20:54:02.554427Z TRACE server:relay-actor:active-relay{url=https://euc1-1.relay.n0.iroh-canary.iroh.link./}:connected: iroh::magicsock::transports::relay::actor: received msg len=1200
2025-11-20T20:54:02.554634Z TRACE server: iroh::magicsock::mapped_addrs: generated new addr addr=RelayMappedAddr(fd15:70a:510b:1::3) key=(RelayUrl("https://euc1-1.relay.n0.iroh-canary.iroh.link./"), PublicKey(5965936c12b5b298af175fed203e78ef0f173a60786efb89575464463f9cdea7))

and ... that's it? I get some further logs for a connection that can be identified as the older (closed!) connection from the first client.

I don't see any quinn_proto recv lines for this datagram received from the second client, which is weird. It seems these initial packets are received by the server and get up to the point where "generated new addr" is logged, but don't make their way into the quinn endpoint.
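The manual log browsing described above (take everything from a send timestamp forward and look for a given target) can be approximated with a small helper. This is a minimal sketch, assuming every line starts with a UTC timestamp in the fixed-width format shown in the excerpt, so that string comparison matches chronological order; `lines_from` is a hypothetical helper and not part of iroh or its test suite.

```rust
/// Returns the log lines at or after `start` that contain `needle`.
/// Assumes each line begins with a fixed-width UTC timestamp such as
/// `2025-11-20T20:54:02.226494Z`, so lexicographic order is time order.
fn lines_from<'a>(log: &'a str, start: &str, needle: &str) -> Vec<&'a str> {
    log.lines()
        .filter(|line| {
            // The first whitespace-separated token is the timestamp.
            line.split_whitespace()
                .next()
                .map_or(false, |ts| ts >= start)
        })
        .filter(|line| line.contains(needle))
        .collect()
}
```

For example, `lines_from(server_log, "2025-11-20T20:54:02.000000Z", "quinn_proto")` returning an empty vector would correspond to the observation that no quinn_proto lines show up after the second client starts sending.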

@Frando
Member Author

Frando commented Nov 20, 2025

Interestingly, CI fails only for ubuntu-stable with features none or default; all others pass! I can reproduce the failure reliably locally, though.

@Frando
Member Author

Frando commented Nov 22, 2025

Some findings from looking at the logs of the test with the public relay.
The first client works fine. Then the second client starts. Looking at the server logs for the second client:

  • The client's initial packet arrives over the IP path: quinn_proto::Endpoint::handle emits Some(DatagramEvent::NewConnection).
  • The client's initial packet arrives over the relay path: Endpoint::handle emits None. Internally, RouteTo::Initial(0) is the case that leads to handle returning None.
  • Another client packet arrives over the IP path. Strangely, quinn_proto::Endpoint::handle now returns None instead of a DatagramEvent::ConnectionEvent; in handle it again hits the RouteTo::Initial(0) case.

Compare this to the first, successful attempt. For the first client we have this flow:

  • First packet over IP arrives -> DatagramEvent::NewConnection
  • Second packet over IP arrives -> DatagramEvent::ConnectionEvent

So in the working case the second packet over IP has a different dst_cid than the first packet, whereas for the second (broken) client the second packet over IP has the same dst_cid as the first client.
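To make the two flows easier to compare, here is a toy model of routing datagrams by destination connection ID. It is a sketch under stated assumptions, not quinn_proto's implementation: the names `RouteTo` and `DatagramEvent` mirror the ones mentioned above, while `ToyEndpoint`, its `accept` method, the integer handles, and the CID byte strings are hypothetical. The model assumes that a datagram whose dst_cid still points at a not-yet-accepted incoming connection produces no event.

```rust
use std::collections::HashMap;

/// Where a destination connection ID currently points (toy model).
#[derive(Clone, Copy, Debug, PartialEq)]
enum RouteTo {
    /// A pending incoming connection that has not been accepted yet.
    Initial(usize),
    /// An established connection, identified by a handle.
    Connection(usize),
}

#[derive(Debug, PartialEq)]
enum DatagramEvent {
    NewConnection,
    ConnectionEvent(usize),
}

#[derive(Default)]
struct ToyEndpoint {
    routes: HashMap<Vec<u8>, RouteTo>,
    pending: usize,
}

impl ToyEndpoint {
    /// Routes one datagram by its destination connection ID.
    fn handle(&mut self, dst_cid: &[u8]) -> Option<DatagramEvent> {
        match self.routes.get(dst_cid).copied() {
            Some(RouteTo::Connection(h)) => Some(DatagramEvent::ConnectionEvent(h)),
            // Assumption: packets for a still-pending incoming connection
            // surface no event to the caller.
            Some(RouteTo::Initial(_)) => None,
            None => {
                self.routes
                    .insert(dst_cid.to_vec(), RouteTo::Initial(self.pending));
                self.pending += 1;
                Some(DatagramEvent::NewConnection)
            }
        }
    }

    /// Hypothetical: accepting registers a server-chosen CID for the connection.
    fn accept(&mut self, local_cid: &[u8], handle: usize) {
        self.routes
            .insert(local_cid.to_vec(), RouteTo::Connection(handle));
    }
}

/// First client: the second packet uses a new dst_cid after acceptance.
fn working_flow() -> Vec<Option<DatagramEvent>> {
    let mut ep = ToyEndpoint::default();
    let first = ep.handle(b"client-chosen");
    ep.accept(b"server-chosen", 0);
    let second = ep.handle(b"server-chosen");
    vec![first, second]
}

/// Second client: every packet keeps the same dst_cid.
fn broken_flow() -> Vec<Option<DatagramEvent>> {
    let mut ep = ToyEndpoint::default();
    vec![
        ep.handle(b"client-chosen"),
        ep.handle(b"client-chosen"),
        ep.handle(b"client-chosen"),
    ]
}
```

In this model `working_flow()` yields NewConnection followed by ConnectionEvent, and `broken_flow()` yields NewConnection followed by two None results, which matches the sequence seen in the logs. The model does not explain why the second client keeps sending with the same dst_cid; that is still the open question.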

Attaching logs here (with additional debug log statements inserted by me in iroh_quinn).
log-public.txt

