Skip to content

Conversation

@joshua-kim
Copy link
Contributor

@joshua-kim joshua-kim commented Aug 8, 2025

Why this should be merged

The latest version of golangci fails on this linter. This linter enforces the usage of newer context function signatures.

How this works

Fixes linter by adding contexts where appropriate.

How this was tested

CI

Need to be documented in RELEASES.md?

No

@joshua-kim joshua-kim force-pushed the noctx branch 3 times, most recently from b864fce to 2e04ac1 Compare August 8, 2025 18:01
@joshua-kim joshua-kim moved this to In Progress 🏗️ in avalanchego Aug 8, 2025
Comment on lines +20 to +21
func Lookup(ctx context.Context, hostname string) (netip.Addr, error) {
ips, err := (&net.Resolver{}).LookupIPAddr(ctx, hostname)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good change 👍

@joshua-kim joshua-kim self-assigned this Aug 11, 2025
@joshua-kim joshua-kim force-pushed the noctx branch 2 times, most recently from 10f5144 to 03dd8ca Compare August 13, 2025 23:03
@github-actions
Copy link

This PR has become stale because it has been open for 30 days with no activity. Adding the lifecycle/frozen label will cause this PR to ignore lifecycle events.

@github-actions github-actions bot added the lifecycle/stale Inactive for 60 days label Sep 14, 2025
@joshua-kim joshua-kim moved this from In Progress 🏗️ to Ready 🚦 in avalanchego Sep 15, 2025
@joshua-kim joshua-kim removed the lifecycle/stale Inactive for 60 days label Sep 19, 2025
// Invariant: There must only be one peer running at a time with a reference to
// the same [config.InboundMsgThrottler].
func Start(
ctx context.Context,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since these functions are previously expected to run until there's an error or network shutdown adding a context to the signature means we need to be careful about what context gets passed in and whether cancelling it could cancel what are supposed to be long-running functions. Looking through the networking package for any case where this could break assumptions.

}

if err := p.Network.Track(discoveredIPs); err != nil {
if err := p.Network.Track(ctx, discoveredIPs); err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ctx that gets passed in here comes all the way from Start along the following path:

Start:

ctx context.Context,

go p.readMessages:
go p.readMessages(ctx)

handle:
p.handle(ctx, msg)

handlePeerList:
p.handlePeerList(ctx, m)

Track:
if err := p.Network.Track(ctx, discoveredIPs); err != nil {

Passing an expired context into Track seems like it could be problematic because we use it in an async context to establish/upgrade peer connections, which may error if that context is cancelled.

At the same time, the caller may pass in a context and defer cancellation until that function returns, which could incorrectly cancel any of these goroutines to create a connection.

I thought this may create a subtle change in invariants for StartTestPeer (which to my knowledge is still used by the relayer in production), when first looking at this, but it seems that uses TestNetwork which is a no-op for Track and wouldn't be the case.

Still this code seems fairly sensitive to an accidental, subtle change in invariants by switching from a context that is scoped to the duration of a function to being used in a long-lived async manner.

Would it make sense to exclude this package from this linter for a first pass?

Copy link
Contributor Author

@joshua-kim joshua-kim Oct 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passing an expired context into Track seems like it could be problematic because we use it in an async context to establish/upgrade peer connections, which may error if that context is cancelled.

Why would an error here be concerning? If a peer errors we would close our connection and contexts are only canceled on shutdown anyway.

At the same time, the caller may pass in a context and defer cancellation until that function returns, which could incorrectly cancel any of these goroutines to create a connection.

This would be improper usage of the context. If you have a long-running goroutine, you shouldn't be canceling the context. You would probably want to hold onto the cancel function and cancel it later.

FWIW, the root parent context passed into avalanchego is never canceled.

Would it make sense to exclude this package from this linter for a first pass?

Wouldn't we still have to resolve this linter in this package anyways? Either we would have a bunch of nolint overrides, or we would have to apply this diff anyways.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this package and the new test peer used externally as well? It's less than ideal, but my understanding is that StartTestPeer has external consumers that depend on this and this code change would subtly change the behavior such that the previous code will handle it fine if you cancel a context after the function returns and this code may change that behavior. So if we make this change we need to double check that those external consumers do not misuse this API.

Copy link
Contributor Author

@joshua-kim joshua-kim Oct 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. StartTestPeer is not used by the relayer - TestNetwork is (ref).
  2. This PR adds the contexts to the Network type. If there is misuse of the contexts, it'll be when the relayer upgrades to the next version of avalanchego. It's not possible for the relayer to have existing misuse of the context since it's not on the Network api yet.
  3. Additionally, this PR doesn't expose contexts on the TestNetwork type anyways - we hardcode context.Background.

@aaronbuchwald
Copy link
Collaborator

We may want to exclude then network package from this change or split it out to a follow up as there's some undocumented behavior and potential for accidentally promoting a context scoped to the duration of a function to using a context for long-lived goroutines.

joshua-kim and others added 2 commits October 3, 2025 12:51
Co-authored-by: Stephen Buttolph <[email protected]>
Signed-off-by: Joshua Kim <[email protected]>
Signed-off-by: Joshua Kim <[email protected]>
Signed-off-by: Joshua Kim <[email protected]>
@github-actions
Copy link

This PR has become stale because it has been open for 30 days with no activity. Adding the lifecycle/frozen label will cause this PR to ignore lifecycle events.

@github-actions github-actions bot added the lifecycle/stale Inactive for 60 days label Nov 16, 2025
@joshua-kim joshua-kim removed the lifecycle/stale Inactive for 60 days label Nov 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Ready 🚦

Development

Successfully merging this pull request may close these issues.

5 participants