Bug report
Bug description:
Summary
`Modules/_decimal/_decimal.c` performs unsynchronized read-modify-write and plain stores on the `status` and `traps` fields of the `mpd_context_t` embedded in a Python `decimal.Context`. Whenever a `Context` instance is reachable from more than one free-threaded thread — explicitly via `context=` arguments, `Context.<method>(…)`, `ctx.flags[…] = …`, etc. — those accesses race.
#141148 + #146482 fix the implicit sharing case (inherited context via `contextvars`). This issue is for the underlying primitive, which is independent of how the `Context` ended up shared and is still racy after #146482.
Affected sites
All in Modules/_decimal/_decimal.c:
| Line | Code | Reached from Python by |
| --- | --- | --- |
| 616 | `ctx->status \|= status;` (in `dec_addstatus`) | any arithmetic on the context |
| 617, 625, 629 | reads of `ctx->traps` (in the `dec_addstatus` trap path) | same |
| 715 | `SdFlags(self) & flag` (in `signaldict_getitem`) | `ctx.flags[X]`, `ctx.traps[X]` |
| 744 | `SdFlags(self) \|= flag;` (in `signaldict_setitem`) | `ctx.flags[X] = True`, `ctx.traps[X] = True` |
| 747 | `SdFlags(self) &= ~flag;` (in `signaldict_setitem`) | `ctx.flags[X] = False`, `ctx.traps[X] = False` |
| 1407 | `CTX(self)->traps = 0;` (in `_decimal_Context_clear_traps_impl`) | `ctx.clear_traps()` |
| 1421 | `CTX(self)->status = 0;` (in `_decimal_Context_clear_flags_impl`) | `ctx.clear_flags()` |
`SdFlags(v)` is `*v->flags`, where `v->flags` is bound to either `&CTX(ctx)->status` or `&CTX(ctx)->traps` (`_decimal.c:1474-1475`), so the signaldict paths touch the same memory as the context-level paths, just through a different surface.
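For orientation, here is a minimal sketch of that aliasing, paraphrased from the CPython source (struct layout abbreviated to the one relevant field; treat this as an illustration, not the exact code):

```c
/* Abbreviated from Modules/_decimal/_decimal.c; only the field relevant
   to this report is shown. */
typedef struct {
    PyObject_HEAD
    uint32_t *flags;   /* points into the owning context's mpd_context_t */
} PyDecSignalDictObject;

#define SdFlags(v) (*((PyDecSignalDictObject *)(v))->flags)

/* During Context construction (around _decimal.c:1474-1475) the two
 * signaldicts are wired to the very words the arithmetic paths mutate:
 *   flags signaldict -> &CTX(ctx)->status
 *   traps signaldict -> &CTX(ctx)->traps
 * so signaldict_getitem/setitem and dec_addstatus touch identical memory. */
```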
Triggering pattern
Any pure-Python code that shares one `Context` instance across free-threaded threads. Five minimal reproducers, one per site, are below. They use a barrier so the racing windows align on the first iteration; under TSan on a free-threaded debug build (`./configure --disable-gil --with-thread-sanitizer`) each one should reliably produce a data race report attributable to the matching site.
```python
# common.py
import threading

N_THREADS = 8
ITERATIONS = 100_000

def run_concurrently(workers):
    barrier = threading.Barrier(len(workers))
    threads = [threading.Thread(target=w, args=(barrier,)) for w in workers]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```
1. `dec_addstatus` (:616)

```python
# repro_status_or.py
import decimal

from common import N_THREADS, ITERATIONS, run_concurrently

SHARED = decimal.Context(prec=4)  # prec=4 makes "1.23456" Inexact|Rounded

def worker(barrier):
    barrier.wait()
    for _ in range(ITERATIONS):
        SHARED.create_decimal("1.23456")  # -> dec_addstatus(SHARED, ...)

run_concurrently([worker] * N_THREADS)
```
2. `clear_flags` race vs. `dec_addstatus` (:1421 ↔ :616)

```python
# repro_clear_flags.py
import decimal

from common import ITERATIONS, run_concurrently

SHARED = decimal.Context(prec=4)

def producer(barrier):
    barrier.wait()
    for _ in range(ITERATIONS):
        SHARED.create_decimal("1.23456")

def clearer(barrier):
    barrier.wait()
    for _ in range(ITERATIONS):
        SHARED.clear_flags()  # ctx->status = 0;

run_concurrently([producer] * 4 + [clearer] * 4)
```
3. `clear_traps` race vs. trap-detection read (:1407 ↔ :617)

```python
# repro_clear_traps.py
import decimal

from common import ITERATIONS, run_concurrently

SHARED = decimal.Context(prec=4, traps=[decimal.Inexact])

def producer(barrier):
    barrier.wait()
    for _ in range(ITERATIONS):
        try:
            SHARED.create_decimal("1.23456")  # reads ctx->traps
        except decimal.Inexact:
            pass

def clearer(barrier):
    barrier.wait()
    for _ in range(ITERATIONS):
        SHARED.clear_traps()  # ctx->traps = 0;

run_concurrently([producer] * 4 + [clearer] * 4)
```
4. `signaldict_setitem` self-race (:744 / :747)

```python
# repro_signaldict_set.py
import decimal

from common import ITERATIONS, run_concurrently

SHARED = decimal.Context(prec=28)
A, B = decimal.Inexact, decimal.Rounded  # different bits, same word

def setter_a(barrier):
    barrier.wait()
    for _ in range(ITERATIONS):
        SHARED.flags[A] = True   # |= bit_a
        SHARED.flags[A] = False  # &= ~bit_a

def setter_b(barrier):
    barrier.wait()
    for _ in range(ITERATIONS):
        SHARED.flags[B] = True
        SHARED.flags[B] = False

run_concurrently([setter_a] * 4 + [setter_b] * 4)
print("final flags:", dict(SHARED.flags))  # observable lost update on a free-threaded build
```

(Substituting `SHARED.traps` for `SHARED.flags` produces the same race on `ctx->traps`.)
5. `signaldict_getitem` read vs. `dec_addstatus` write (:715 ↔ :616)

```python
# repro_signaldict_get.py
import decimal

from common import ITERATIONS, run_concurrently

SHARED = decimal.Context(prec=4)
INEXACT = decimal.Inexact

def producer(barrier):
    barrier.wait()
    for _ in range(ITERATIONS):
        SHARED.create_decimal("1.23456")

def reader(barrier):
    barrier.wait()
    for _ in range(ITERATIONS):
        _ = SHARED.flags[INEXACT]  # reads ctx->status non-atomically

run_concurrently([producer] * 4 + [reader] * 4)
```
Suggested fix
The cleanest free-threading-safe option is a per-`Context` `PyMutex` covering all reads and writes of `status` and `traps`. The fields are tiny (one `uint32_t` each) and accessed at very high frequency, so an alternative is to switch to `_Py_atomic_or_uint32` / `_Py_atomic_and_uint32` / `_Py_atomic_load_uint32` / `_Py_atomic_store_uint32` directly on the fields. Given that the trap-detection path needs to read `traps` and `status` together, atomics alone are a little awkward (the OR-then-test sequence wants both observations to come from the same logical state), so `PyMutex` is probably the better fit; the lock-free path can be reserved for the hot read in `signaldict_getitem` if profiling shows the mutex matters.
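To make the shape concrete, here is a rough sketch of the mutex variant for `dec_addstatus`. It is not a patch: it assumes a new `PyMutex lock` field added to `PyDecContextObject`, elides the exception-raising details, and uses the `PyMutex_Lock`/`PyMutex_Unlock` API available since CPython 3.13:

```c
/* Sketch only. Assumes PyDecContextObject grows a `PyMutex lock` field
   (zero-initialized at construction); everything else follows the
   existing code in Modules/_decimal/_decimal.c. */
static int
dec_addstatus(PyObject *context, uint32_t status)
{
    mpd_context_t *ctx = CTX(context);
    PyMutex *lock = &((PyDecContextObject *)context)->lock;  /* proposed field */

    PyMutex_Lock(lock);
    ctx->status |= status;                   /* the :616 RMW, now serialized */
    uint32_t trapped = status & ctx->traps;  /* trap test observes the same
                                                logical state as the OR */
    PyMutex_Unlock(lock);

    if (trapped) {
        /* raise the matching signal exception, as the current code does */
        return 1;
    }
    return 0;
}
```

`clear_flags()`/`clear_traps()` and both signaldict paths would take the same lock, keeping every reader and writer of the two words serialized without changing the `mpd_context_t` layout.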
Either way, the fix should also cover the four `CTX(...)->status = 0;` resets at :1825, :1901, :1924, and :1985 (in `current_context_from_dict`, `PyDec_SetCurrentContext`, `init_current_context`, and the contextvar variant of `PyDec_SetCurrentContext`) — those are stores into a `Context` that has just been created or just been swapped in, so they are not strictly racy in the current code, but if any future change exposes them earlier, the same atomicity argument applies.
Related

- #146482 ("GH-141148: ensure tasks/threads get fresh copy of decimal.Context") fixes the `getcontext()`/`current_context()` inheritance path. After that PR lands, the primitive sites above are still racy whenever a `Context` is shared explicitly (e.g. `Decimal(value, context=shared_ctx)` or `shared_ctx.create_decimal(s)`); see this issue's own MREs for explicit-share cases that #146482 doesn't cover.
Drafted by Claude Code, reviewed by a human.
CPython versions tested on:
CPython main branch, 3.15
Operating systems tested on:
Linux