-
Notifications
You must be signed in to change notification settings - Fork 2.5k
Description
When updating to the latest go-redis driver, we hit a problem in our CI test suite where it showed up with a data race. We run our CI suite with the race detector enabled to catch problems. It triggered after updating the go-redis driver (v8.0.0-beta.7 to v8.4.0). It looks like it is a race bug inside the Redis driver and not in our code. We also were able to reproduce the race with a test script independent from our project.
Expected Behavior
func TestXReadRace(t *testing.T) {
client := redis.NewIntegrationRedis(t)
ctx, cancel := context.WithCancel(context.Background())
for i := 0; i < 100; i++ {
stream := fmt.Sprintf("foo.test.%d", i)
go func() {
_, err := client.XRead(ctx, &goredis.XReadArgs{
Streams: []string{stream, "$"},
Block: 1 * time.Second,
}).Result()
if err != context.Canceled {
panic(err)
}
}()
}
time.Sleep(10 * time.Millisecond)
cancel()
}
We'd expect that this operation wouldn't trigger the race detector, but that here err
returned from XRead
has the message that the context was cancelled.
Current Behavior
Right now sometimes the race detector triggers when this happens. It's not consistent, so multiple runs of the test script might be needed and the number of goroutines spawned might also need to be tweaked to reproduce this on a specific machine (too low didn't trigger for me on my laptop, but too high didn't trigger either).
dirkjan@dbussink ~/code/go-redis-race $ go test -v -count=1 -race
=== RUN TestXReadRace
--- PASS: TestXReadRace (0.03s)
PASS
==================
WARNING: DATA RACE
Write at 0x00c000249008 by goroutine 212:
github.com/go-redis/redis/v8.(*ClusterSlotsCmd).readReply.func1()
/Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/command.go:1796 +0xb0
github.com/go-redis/redis/v8/internal/proto.(*Reader).ReadArrayReply()
/Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/internal/proto/reader.go:190 +0x2c1
github.com/go-redis/redis/v8.(*ClusterSlotsCmd).readReply()
/Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/command.go:1795 +0x64
github.com/go-redis/redis/v8.Cmder.readReply-fm()
/Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/command.go:25 +0x50
github.com/go-redis/redis/v8/internal/pool.(*Conn).WithReader.func1()
/Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/internal/pool/conn.go:73 +0x43e
github.com/go-redis/redis/v8/internal.WithSpan()
/Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/internal/util.go:59 +0x296
github.com/go-redis/redis/v8/internal/pool.(*Conn).WithReader()
/Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/internal/pool/conn.go:69 +0xa8
github.com/go-redis/redis/v8.(*baseClient).process.func1.1()
/Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/redis.go:346 +0x1cf
github.com/go-redis/redis/v8.(*baseClient).withConn.func1()
/Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/redis.go:319 +0x19b
github.com/go-redis/redis/v8/internal.WithSpan()
/Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/internal/util.go:59 +0x296
github.com/go-redis/redis/v8.(*baseClient).withConn()
/Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/redis.go:303 +0x9c
github.com/go-redis/redis/v8.(*baseClient).process.func1()
/Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/redis.go:338 +0xfc
github.com/go-redis/redis/v8/internal.WithSpan()
/Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/internal/util.go:59 +0x296
github.com/go-redis/redis/v8.(*baseClient).process()
/Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/redis.go:330 +0x164
github.com/go-redis/redis/v8.(*baseClient).process-fm()
/Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/redis.go:324 +0x6d
github.com/go-redis/redis/v8.hooks.process.func1()
/Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/redis.go:56 +0x98
github.com/go-redis/redis/v8.hooks.withContext.func1()
/Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/redis.go:139 +0x34
Previous read at 0x00c000249008 by goroutine 11:
github.com/go-redis/redis/v8.(*ClusterSlotsCmd).Val()
/Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/command.go:1783 +0x516
github.com/go-redis/redis/v8.(*ClusterSlotsCmd).Result()
/Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/command.go:1787 +0x509
github.com/go-redis/redis/v8.(*ClusterClient).loadState()
/Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/cluster.go:1015 +0x49e
github.com/go-redis/redis/v8.(*ClusterClient).loadState-fm()
/Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/cluster.go:988 +0x59
github.com/go-redis/redis/v8.(*clusterStateHolder).Reload()
/Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/cluster.go:623 +0x5e
github.com/go-redis/redis/v8.(*clusterStateHolder).Get()
/Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/cluster.go:655 +0x129
github.com/go-redis/redis/v8.(*ClusterClient).cmdNode()
/Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/cluster.go:1629 +0x90
github.com/go-redis/redis/v8.(*ClusterClient).process()
/Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/cluster.go:769 +0x972
github.com/go-redis/redis/v8.(*ClusterClient).process-fm()
/Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/cluster.go:753 +0x6d
github.com/go-redis/redis/v8.hooks.process.func1()
/Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/redis.go:56 +0x98
github.com/go-redis/redis/v8.hooks.withContext.func1()
/Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/redis.go:139 +0x34
Goroutine 212 (running) created at:
github.com/go-redis/redis/v8.hooks.withContext()
/Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/redis.go:139 +0xb9
github.com/go-redis/redis/v8.hooks.process()
/Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/redis.go:55 +0x737
github.com/go-redis/redis/v8.(*Client).Process()
/Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/redis.go:609 +0x139
github.com/go-redis/redis/v8.(*Client).Process-fm()
/Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/redis.go:608 +0x6d
github.com/go-redis/redis/v8.cmdable.ClusterSlots()
/Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/commands.go:2540 +0x1ae
github.com/go-redis/redis/v8.(*ClusterClient).loadState()
/Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/cluster.go:1015 +0x4f9
github.com/go-redis/redis/v8.(*ClusterClient).loadState-fm()
/Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/cluster.go:988 +0x59
github.com/go-redis/redis/v8.(*clusterStateHolder).Reload()
/Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/cluster.go:623 +0x5e
github.com/go-redis/redis/v8.(*clusterStateHolder).Get()
/Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/cluster.go:655 +0x129
github.com/go-redis/redis/v8.(*ClusterClient).cmdNode()
/Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/cluster.go:1629 +0x90
github.com/go-redis/redis/v8.(*ClusterClient).process()
/Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/cluster.go:769 +0x972
github.com/go-redis/redis/v8.(*ClusterClient).process-fm()
/Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/cluster.go:753 +0x6d
github.com/go-redis/redis/v8.hooks.process.func1()
/Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/redis.go:56 +0x98
github.com/go-redis/redis/v8.hooks.withContext.func1()
/Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/redis.go:139 +0x34
Goroutine 11 (finished) created at:
github.com/go-redis/redis/v8.hooks.withContext()
/Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/redis.go:139 +0xb9
github.com/go-redis/redis/v8.hooks.process()
/Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/redis.go:55 +0x737
github.com/go-redis/redis/v8.(*ClusterClient).Process()
/Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/cluster.go:750 +0x10a
github.com/go-redis/redis/v8.(*ClusterClient).Process-fm()
/Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/cluster.go:749 +0x6d
github.com/go-redis/redis/v8.cmdable.XRead()
/Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/commands.go:1572 +0x7dc
github.com/dbussink/go-redis-race_test.TestXReadRace.func1()
/Users/dirkjan/code/go-redis-race/go_redis_race_test.go:24 +0x12d
==================
Found 1 data race(s)
exit status 66
FAIL github.com/dbussink/go-redis-race 0.136s
To be specific, this problem only seems to be happening when using Redis cluster. We have been unable to reproduce the problem running this script against a non cluster environment.
Possible Solution
All the problems here seem to be due to context cancellation where it then races with storing the result. Not sure what the right fix would be, but perhaps the errors of the context need to be checked first before trying to write the error? That would still be racy with SetErr
though potentially, so SetErr
should somehow also be completed before the blocking command returns to the caller?
Steps to Reproduce
Running the above test should reproduce the problem. The problem happens here when the pending xread
command gets cancelled. It doesn't always happen on each run for this script, but when it does, it fails with the above backtraces.
Context (Environment)
The code here is used during shutdown of our service when various XRead
commands are still pending, but we need to restart for example for an update / new deployment. We don't run the race detector in production, but do use it in CI where it then triggers this bug sometimes.
I suspect that #1500 was reporting a similar problem, but that one was closed due to lack of useful input. But we're seen the race detector also trigger on SetErr
as well as described in that issue.