Fix duplicate CTE race condition in ephemeral model compilation#12602
Fix duplicate CTE race condition in ephemeral model compilation#12602colin-rogers-dbt wants to merge 7 commits intomainfrom
Conversation
Add per-node threading.Lock to CompiledResource to prevent duplicate CTEs when multiple threads compile nodes that ref() the same ephemeral model. The existing check-then-act patterns in _recursively_prepend_ctes and set_cte were classic TOCTOU races causing duplicate CTEs ~1/50 runs. Per-node lock granularity preserves parallelism — threads only block when competing for the exact same ephemeral node. No deadlock risk since ephemeral deps form a DAG and lock acquisition follows dependency order. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide. |
Additional Artifact Review RequiredChanges to artifact directory files requires at least 2 approvals from core team members. |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #12602 +/- ##
==========================================
- Coverage 91.39% 91.32% -0.07%
==========================================
Files 203 203
Lines 25596 25644 +48
==========================================
+ Hits 23394 23420 +26
- Misses 2202 2224 +22
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
Add __getstate__/__setstate__ to CompiledResource so the _lock field is excluded during pickling and recreated on unpickle, fixing TypeError: cannot pickle '_thread.lock' object. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
QMalcolm
left a comment
There was a problem hiding this comment.
To comments, but logically this all looks good!
| extra_ctes_injected: bool = False | ||
| extra_ctes: List[InjectedCTE] = field(default_factory=list) | ||
| _pre_injected_sql: Optional[str] = None | ||
| _lock: threading.Lock = field( |
There was a problem hiding this comment.
😅 I think there is now a better way to do this. If you instead define _lock on the runtime instantiated version of CompiledResource, i.e. CompiledNode, you don't have to worry about the serialization aspect of it because extra attributes will be dropped when converting back CompiledResource during serialization.
I only know this because we discovered this during the development of microbatch. Here are some attributes we did it for (link)
Address PR feedback: define _lock on the runtime CompiledNode class instead of the serialized CompiledResource. This avoids all mashumaro serialization concerns since extra attributes on CompiledNode are automatically dropped during serialization. Also restores the dropped comment explaining the check-then-act pattern, and removes the unnecessary lock from set_cte (only called with sql=None during parsing). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move the recursive _recursively_prepend_ctes call outside the cte_model._lock scope. The old structure would deadlock with threading.Lock (non-reentrant) because the recursive call re-acquires the same node's lock for CTE injection. Now the lock only covers the compile-or-skip decision, and recursion happens after release. Add test that validates the lock acquisition pattern doesn't deadlock by simulating the exact sequence from _recursively_prepend_ctes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
threading.LocktoCompiledResourceto eliminate TOCTOU race conditions when multiple threads compile nodes thatref()the same ephemeral model_recursively_prepend_ctes(ephemeral compilation and CTE injection) with the ephemeral/consuming node's lockset_ctethread-safe to prevent duplicate CTE appendsTest plan
tests/unit/test_compilation_threading.pywith 4 regression tests:set_ctededuplication (20 threads, assert single CTE)tests/unit/contracts/graph/test_nodes.pytests passhatch run pre-commit run --all-files)🤖 Generated with Claude Code