Skip to content

Race condition in context cancellation for cluster commands #1568

@dbussink

Description

@dbussink

When updating to the latest go-redis driver, we hit a problem in our CI test suite where it showed up with a data race. We run our CI suite with the race detector enabled to catch problems. It triggered after updating the go-redis driver (v8.0.0-beta.7 to v8.4.0). It looks like it is a race bug inside the Redis driver and not in our code. We also were able to reproduce the race with a test script independent from our project.

Expected Behavior

func TestXReadRace(t *testing.T) {
	client := redis.NewIntegrationRedis(t)
	ctx, cancel := context.WithCancel(context.Background())

	for i := 0; i < 100; i++ {
		stream := fmt.Sprintf("foo.test.%d", i)
		go func() {
			_, err := client.XRead(ctx, &goredis.XReadArgs{
				Streams: []string{stream, "$"},
				Block:   1 * time.Second,
			}).Result()
			if err != context.Canceled {
				panic(err)
			}
		}()
	}

	time.Sleep(10 * time.Millisecond)
	cancel()
}

We'd expect that this operation wouldn't trigger the race detector, but that here err returned from XRead has the message that the context was cancelled.

Current Behavior

Right now sometimes the race detector triggers when this happens. It's not consistent, so multiple runs of the test script might be needed and the number of goroutines spawned might also need to be tweaked to reproduce this on a specific machine (too low didn't trigger for me on my laptop, but too high didn't trigger either).

dirkjan@dbussink ~/code/go-redis-race $ go test -v -count=1 -race
=== RUN   TestXReadRace
--- PASS: TestXReadRace (0.03s)
PASS
==================
WARNING: DATA RACE
Write at 0x00c000249008 by goroutine 212:
  github.com/go-redis/redis/v8.(*ClusterSlotsCmd).readReply.func1()
      /Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/command.go:1796 +0xb0
  github.com/go-redis/redis/v8/internal/proto.(*Reader).ReadArrayReply()
      /Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/internal/proto/reader.go:190 +0x2c1
  github.com/go-redis/redis/v8.(*ClusterSlotsCmd).readReply()
      /Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/command.go:1795 +0x64
  github.com/go-redis/redis/v8.Cmder.readReply-fm()
      /Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/command.go:25 +0x50
  github.com/go-redis/redis/v8/internal/pool.(*Conn).WithReader.func1()
      /Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/internal/pool/conn.go:73 +0x43e
  github.com/go-redis/redis/v8/internal.WithSpan()
      /Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/internal/util.go:59 +0x296
  github.com/go-redis/redis/v8/internal/pool.(*Conn).WithReader()
      /Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/internal/pool/conn.go:69 +0xa8
  github.com/go-redis/redis/v8.(*baseClient).process.func1.1()
      /Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/redis.go:346 +0x1cf
  github.com/go-redis/redis/v8.(*baseClient).withConn.func1()
      /Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/redis.go:319 +0x19b
  github.com/go-redis/redis/v8/internal.WithSpan()
      /Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/internal/util.go:59 +0x296
  github.com/go-redis/redis/v8.(*baseClient).withConn()
      /Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/redis.go:303 +0x9c
  github.com/go-redis/redis/v8.(*baseClient).process.func1()
      /Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/redis.go:338 +0xfc
  github.com/go-redis/redis/v8/internal.WithSpan()
      /Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/internal/util.go:59 +0x296
  github.com/go-redis/redis/v8.(*baseClient).process()
      /Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/redis.go:330 +0x164
  github.com/go-redis/redis/v8.(*baseClient).process-fm()
      /Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/redis.go:324 +0x6d
  github.com/go-redis/redis/v8.hooks.process.func1()
      /Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/redis.go:56 +0x98
  github.com/go-redis/redis/v8.hooks.withContext.func1()
      /Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/redis.go:139 +0x34

Previous read at 0x00c000249008 by goroutine 11:
  github.com/go-redis/redis/v8.(*ClusterSlotsCmd).Val()
      /Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/command.go:1783 +0x516
  github.com/go-redis/redis/v8.(*ClusterSlotsCmd).Result()
      /Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/command.go:1787 +0x509
  github.com/go-redis/redis/v8.(*ClusterClient).loadState()
      /Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/cluster.go:1015 +0x49e
  github.com/go-redis/redis/v8.(*ClusterClient).loadState-fm()
      /Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/cluster.go:988 +0x59
  github.com/go-redis/redis/v8.(*clusterStateHolder).Reload()
      /Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/cluster.go:623 +0x5e
  github.com/go-redis/redis/v8.(*clusterStateHolder).Get()
      /Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/cluster.go:655 +0x129
  github.com/go-redis/redis/v8.(*ClusterClient).cmdNode()
      /Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/cluster.go:1629 +0x90
  github.com/go-redis/redis/v8.(*ClusterClient).process()
      /Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/cluster.go:769 +0x972
  github.com/go-redis/redis/v8.(*ClusterClient).process-fm()
      /Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/cluster.go:753 +0x6d
  github.com/go-redis/redis/v8.hooks.process.func1()
      /Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/redis.go:56 +0x98
  github.com/go-redis/redis/v8.hooks.withContext.func1()
      /Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/redis.go:139 +0x34

Goroutine 212 (running) created at:
  github.com/go-redis/redis/v8.hooks.withContext()
      /Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/redis.go:139 +0xb9
  github.com/go-redis/redis/v8.hooks.process()
      /Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/redis.go:55 +0x737
  github.com/go-redis/redis/v8.(*Client).Process()
      /Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/redis.go:609 +0x139
  github.com/go-redis/redis/v8.(*Client).Process-fm()
      /Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/redis.go:608 +0x6d
  github.com/go-redis/redis/v8.cmdable.ClusterSlots()
      /Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/commands.go:2540 +0x1ae
  github.com/go-redis/redis/v8.(*ClusterClient).loadState()
      /Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/cluster.go:1015 +0x4f9
  github.com/go-redis/redis/v8.(*ClusterClient).loadState-fm()
      /Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/cluster.go:988 +0x59
  github.com/go-redis/redis/v8.(*clusterStateHolder).Reload()
      /Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/cluster.go:623 +0x5e
  github.com/go-redis/redis/v8.(*clusterStateHolder).Get()
      /Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/cluster.go:655 +0x129
  github.com/go-redis/redis/v8.(*ClusterClient).cmdNode()
      /Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/cluster.go:1629 +0x90
  github.com/go-redis/redis/v8.(*ClusterClient).process()
      /Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/cluster.go:769 +0x972
  github.com/go-redis/redis/v8.(*ClusterClient).process-fm()
      /Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/cluster.go:753 +0x6d
  github.com/go-redis/redis/v8.hooks.process.func1()
      /Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/redis.go:56 +0x98
  github.com/go-redis/redis/v8.hooks.withContext.func1()
      /Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/redis.go:139 +0x34

Goroutine 11 (finished) created at:
  github.com/go-redis/redis/v8.hooks.withContext()
      /Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/redis.go:139 +0xb9
  github.com/go-redis/redis/v8.hooks.process()
      /Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/redis.go:55 +0x737
  github.com/go-redis/redis/v8.(*ClusterClient).Process()
      /Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/cluster.go:750 +0x10a
  github.com/go-redis/redis/v8.(*ClusterClient).Process-fm()
      /Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/cluster.go:749 +0x6d
  github.com/go-redis/redis/v8.cmdable.XRead()
      /Users/dirkjan/go/pkg/mod/github.com/go-redis/redis/[email protected]/commands.go:1572 +0x7dc
  github.com/dbussink/go-redis-race_test.TestXReadRace.func1()
      /Users/dirkjan/code/go-redis-race/go_redis_race_test.go:24 +0x12d
==================
Found 1 data race(s)
exit status 66
FAIL	github.com/dbussink/go-redis-race	0.136s

To be specific, this problem only seems to be happening when using Redis cluster. We have been unable to reproduce the problem running this script against a non cluster environment.

Possible Solution

All the problems here seem to be due to context cancellation where it then races with storing the result. Not sure what the right fix would be, but perhaps the errors of the context need to be checked first before trying to write the error? That would still be racy with SetErr though potentially, so SetErr should somehow also be completed before the blocking command returns to the caller?

Steps to Reproduce

Running the above test should reproduce the problem. The problem happens here when the pending xread command gets cancelled. It doesn't always happen on each run for this script, but when it does, it fails with the above backtraces.

Context (Environment)

The code here is used during shutdown of our service when various XRead commands are still pending, but we need to restart for example for an update / new deployment. We don't run the race detector in production, but do use it in CI where it then triggers this bug sometimes.

I suspect that #1500 was reporting a similar problem, but that one was closed due to lack of useful input. But we're seen the race detector also trigger on SetErr as well as described in that issue.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions