fix(compactor): add mutex to protect Compactor.client from concurrent access #1086
Conversation
Force-pushed from 2e28e37 to 9779cf2
Force-pushed from dd37d2f to 3ea579e
fix(compactor): add mutex to protect Compactor.client from concurrent access

The Compactor.client field (a ReplicaClient interface) was read and written concurrently by goroutines spawned in Store.Open() without synchronization. One goroutine calls Compactor.Client() via the compaction level monitor while another calls Compactor.SetClient() via the snapshot level monitor, both through DB.ensureCompactorClient().

Add a sync.RWMutex to the Compactor struct. Methods that mutate the replica (Compact, EnforceSnapshotRetention, EnforceRetentionByTXID, EnforceL0Retention) hold a write lock for their full duration. Read-only methods (MaxLTXFileInfo, VerifyLevelConsistency, Client) hold a read lock. SetClient holds a write lock, blocking until all in-flight operations complete. Internal helpers (maxLTXFileInfo, verifyLevelConsistency) assume the caller already holds the lock.

Fixes #1085

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
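For reference, a minimal sketch of the locking scheme this commit describes. The types below are simplified stand-ins (ReplicaClient is an empty placeholder interface and the struct is reduced to the relevant fields); litestream's real Compactor carries more state and methods.

```go
package sketch

import (
	"context"
	"sync"
)

// ReplicaClient stands in for litestream's replica client interface.
type ReplicaClient interface{}

// Compactor is a reduced stand-in for the real struct.
type Compactor struct {
	mu     sync.RWMutex
	client ReplicaClient
}

// SetClient takes the write lock, blocking until in-flight operations finish.
func (c *Compactor) SetClient(client ReplicaClient) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.client = client
}

// Client takes the read lock so concurrent readers do not block each other.
func (c *Compactor) Client() ReplicaClient {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.client
}

// Compact mutates replica state, so it holds the write lock for its full
// duration; SetClient cannot swap the client while it is running.
func (c *Compactor) Compact(ctx context.Context) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	// ... perform compaction using c.client ...
	return nil
}
```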
Force-pushed from 3ea579e to 2cf4ebd
benbjohnson left a comment:
@corylanou This one seems weird. Do you know why we're updating the Compactor's client field at all? It seems like we should only be setting its client once immediately after the Replica is set. Or we could move compactor to the Replica instead of having it on DB and then it can be initialized at the same time.
That would be a better fix for sure. I wasn't sure if you needed the ability to set the client once it was created. It's possible that the "SetClient" is only for testing? If so, we should hopefully be able to change the way we test. This came up from a fuzz test failing with a race condition. I'll take another look if we don't think we need the SetClient method.
…ctorClient

Replace the RWMutex-based approach with a simpler design per review feedback: the compactor client is now set once in DB.Open() when Replica is already assigned, eliminating the need for ensureCompactorClient(), SetClient(), Client(), and the mutex entirely.

Tests that previously overwrote db.Replica after Open() are restructured to set Replica before Open(), ensuring the compactor gets the correct client during initialization.

Fixes #1085

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
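A rough sketch of the set-once design described here, again with simplified placeholder types rather than litestream's actual DB, Replica, and Compactor definitions:

```go
package sketch

import "errors"

// Simplified stand-ins for the real litestream types.
type ReplicaClient interface{}

type Replica struct {
	Client ReplicaClient
}

type Compactor struct {
	client ReplicaClient
}

type DB struct {
	Replica   *Replica
	compactor *Compactor
}

// Open validates the replica up front and assigns the compactor's client
// exactly once, before any monitor goroutines start.
func (db *DB) Open() error {
	if db.Replica == nil {
		return errors.New("replica required")
	}
	if db.Replica.Client == nil {
		return errors.New("replica client required")
	}
	db.compactor = &Compactor{client: db.Replica.Client}

	// ... start compaction and snapshot level monitor goroutines here ...
	return nil
}
```

Because the assignment happens before any monitor goroutine exists, the field is never written concurrently with a read, so no mutex is needed.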
Move the Replica nil check to the validation block at the top of Open(), failing fast before any side effects. Remove the defensive nil guard around the compactor client assignment since Replica is now guaranteed to be set. Fix createTestSQLiteDB to use database/sql directly instead of litestream.DB, which requires a Replica.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
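A hypothetical shape for the reworked test helper, assuming the mattn/go-sqlite3 driver; the helper name comes from the commit message, but the schema and error handling are illustrative only:

```go
package sketch

import (
	"database/sql"
	"testing"

	_ "github.com/mattn/go-sqlite3" // driver choice is an assumption
)

// createTestSQLiteDB creates a throwaway SQLite database via database/sql,
// with no litestream.DB (and therefore no Replica) involved.
func createTestSQLiteDB(tb testing.TB, path string) {
	tb.Helper()

	db, err := sql.Open("sqlite3", path)
	if err != nil {
		tb.Fatal(err)
	}
	defer db.Close()

	if _, err := db.Exec(`CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)`); err != nil {
		tb.Fatal(err)
	}
}
```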
- Move the compactor.client assignment before the monitor goroutines start to eliminate a potential race window
- Add Replica.Client nil validation in Open() to fail fast
- Remove redundant db.Replica overwrites in TestDB_Snapshot and TestDB_EnforceRetention (MustOpenDBs already sets a file replica)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@benbjohnson turns out the real fix was to change how we tested it. Removed all of the race conditions, etc. Much cleaner now.
Yeah, that seems way better 👍
Summary
- Data race on the Compactor.client field detected by -race in FuzzRestoreWithMissingCompactedFile
- ensureCompactorClient() lazily syncs the compactor's client on every operation, creating a race when multiple goroutines call it concurrently from Store.Open() monitors
- The compactor client is now set once in DB.Open() (where Replica is already assigned), eliminating ensureCompactorClient(), SetClient(), Client(), and the mutex entirely
- Tests that previously overwrote db.Replica after Open() are restructured to set Replica before Open()

Race trace (on main): Goroutine reading Compactor.Client() via the compaction level monitor races with a goroutine writing it in Compactor.SetClient() via the snapshot level monitor — both enter through DB.ensureCompactorClient().

Fixes #1085
Test plan
- go test -race -count=5 -run FuzzRestoreWithMissingCompactedFile -v . — Compactor.client race eliminated
- go test -race ./... — full test suite passes

🤖 Generated with Claude Code