@rafiss
Collaborator

@rafiss rafiss commented Sep 30, 2021

fixes #70019
fixes #69545
fixes #69523

We have seen issues where a worker returns ErrBadConn immediately after
the ramp period is done. This happens because connections in the pool
are marked as "bad" when a context is canceled.

The underlying issue might be a race condition in how lib/pq cancels the
context and then marks the connection as "bad." I'm not sure how to fix
that, and it's hard to reproduce, so this change works around the
problem instead.

Release note: None

@rafiss rafiss requested a review from a team September 30, 2021 19:34
@cockroach-teamcity
Member

This change is Reviewable

@rafiss rafiss requested review from otan and tbg September 30, 2021 19:43
@rafiss
Collaborator Author

rafiss commented Oct 5, 2021

bors r=otan,tbg

@craig
Contributor

craig bot commented Oct 5, 2021

Build succeeded:

@craig craig bot merged commit cf4fe62 into cockroachdb:master Oct 5, 2021
@rafiss rafiss deleted the debug-workload-badconn branch October 5, 2021 16:49

4 participants