
Reproduce split race/crash #1395

Closed
ccotter wants to merge 1 commit into NVIDIA:main from ccotter:split-bug-reproduce

Conversation


@ccotter ccotter commented Aug 11, 2024

https://gist.github.com/ccotter/c0504455c7d54240758a34dc99d620f0 is the ASAN report.

After

    auto __old = __self->__dec_ref();

the shared state becomes a dangling pointer if

    this->__dec_ref(); // release the extra ref count, deletes this

frees the memory from another thread.

I'm not sure how to fix this yet, but this PR reproduces the bug reliably under ASAN in the "split forwards results from a different thread" test.
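The failure mode can be pictured with a hypothetical, simplified stand-in for the shared state (the names below are illustrative, not stdexec's actual types): the thread that decrements no longer holds a reference, so any access through the pointer afterwards races with another thread dropping the last reference and deleting the object.

```cpp
#include <atomic>

// Hypothetical simplified shared state; the real stdexec type and member
// names differ.
struct SharedState {
  std::atomic<int> ref_count{2};

  // Mirrors __dec_ref(): returns the count held *before* the decrement.
  int dec_ref() { return ref_count.fetch_sub(1, std::memory_order_acq_rel); }
};

// The race, in pseudocode:
//
//   Thread A                          Thread B
//   --------                          --------
//   auto old = self->dec_ref();
//                                     if (self->dec_ref() == 1)
//                                       delete self;  // last ref gone
//   use(self, old);  // `self` now dangles: use-after-free
//
// Once thread A's decrement completes, it owns no reference, so thread B
// may free the object before thread A touches it again.
```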


copy-pr-bot bot commented Aug 11, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


ccotter commented Aug 11, 2024

I naively tried to make a copy of the intrusive_ptr inside __detach to keep the ref count positive long enough to ensure the object is kept alive, but I wasn't able to get that to work.

@ericniebler

A million thanks for this, Chris. I'll investigate as soon as I can, but probably not before next week. Big presentation coming up.

@ericniebler

I think #1460 fixes this.

@ericniebler ericniebler closed this Jan 2, 2025
dvyukov pushed a commit to llvm/llvm-project that referenced this pull request Feb 16, 2026
This commit introduces an "adaptive delay" feature to the
ThreadSanitizer runtime to improve race detection by perturbing thread
schedules. At various synchronization points (atomic operations,
mutexes, and thread lifecycle events), the runtime may inject small
delays (spin loops, yields, or sleeps) to explore different thread
interleavings and expose data races that would otherwise occur only in
rare execution orders.

This change is inspired by prior work, discussed in more detail at
https://discourse.llvm.org/t/rfc-tsan-implementing-a-fuzz-scheduler-for-tsan/80969.
In short, https://reviews.llvm.org/D65383 was an earlier, unmerged
attempt at adding random delays. Feedback on the RFC led to the version
in this commit, which aims to limit the amount of delay injected.

The adaptive delay feature uses a configurable time budget and tiered
sampling strategy to balance race exposure against performance impact.
It prioritizes high-value synchronization points with clear
happens-before relationships: relaxed atomics receive lightweight spin
delays with low sampling, synchronizing atomics (acquire / release /
seq_cst) receive moderate delays with higher sampling, and mutex and
thread lifecycle operations receive the longest delays with highest
sampling.
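The tiering described above can be sketched like this; the tier names and rates are invented for illustration and are not the values used in the actual patch.

```cpp
// Illustrative tiers of synchronization points; names and numbers are
// made up for this sketch, not taken from the real TSAN runtime change.
enum class Tier { RelaxedAtomic, SyncAtomic, MutexOrThread };

// Higher-value sync points get higher sampling (and longer delays):
// sampled events per 1000 visits to a point of the given tier.
constexpr int SampleRatePerMille(Tier t) {
  switch (t) {
    case Tier::RelaxedAtomic: return 1;    // lightweight spin, rarely
    case Tier::SyncAtomic:    return 20;   // moderate delay, more often
    case Tier::MutexOrThread: return 100;  // longest delay, most often
  }
  return 0;
}
```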

The feature is disabled by default and incurs minimal overhead when not
enabled. Nearly all checks are guarded by an inline check on a global
variable that is only set when enable_adaptive_delay=1. Microbenchmarks
with tight loops of atomic operations showed no meaningful performance
difference between an unmodified TSAN runtime and this version when
running with empty TSAN_OPTIONS.
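The fast-path guard looks roughly like this (a hypothetical sketch of the pattern; the flag name echoes the enable_adaptive_delay option, the rest is illustrative):

```cpp
#include <atomic>

// Set once at startup when the option enable_adaptive_delay=1 is present
// (hypothetical sketch of the guard pattern, not the runtime's real code).
std::atomic<bool> g_adaptive_delay_enabled{false};

// Called from instrumented sync points. With the feature off, the cost is
// a single relaxed load and a well-predicted branch.
inline bool MaybeInjectDelay() {
  if (!g_adaptive_delay_enabled.load(std::memory_order_relaxed))
    return false;  // fast path: feature disabled, no delay considered
  // Slow path: consult the time budget / sampler and spin, yield, or sleep.
  return true;
}
```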

An LLM assisted in writing portions of the adaptive delay logic,
including the TimeBudget class, tiering concept, address sampler, and
per-thread quota system. I reviewed the output and made amendments to
reduce duplication and simplify the behavior. I also replaced the LLM's
original double-based calculation logic with the integer-based Percent
class. The LLM also helped write unit test cases for Percent.
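The integer-based calculation mentioned above might look roughly like this (a hypothetical sketch of the idea, not the actual Percent class from the patch):

```cpp
#include <cstdint>

// Hypothetical integer-only percentage type: all arithmetic stays in
// integers, keeping floating point out of the runtime's hot paths.
class Percent {
 public:
  explicit constexpr Percent(uint32_t value) : value_(value) {}

  // Computes value_% of n using integer math only; assumes n * value_
  // fits in 64 bits.
  constexpr uint64_t Of(uint64_t n) const { return n * value_ / 100; }

 private:
  uint32_t value_;
};
```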

cc @dvyukov 

## Examples

I used the delay scheduler to find novel bugs that rarely or never
occurred with the unmodified TSAN runtime. Some of the bugs below were
found with earlier versions of the delay scheduler that I iterated on,
but with this most recent implementation in this PR, I can still find
the bugs far more reliably than with the standard TSAN runtime.

- A use-after-free in the [BlazingMQ](https://github.com/bloomberg/blazingmq) broker during ungraceful producer disconnect.
- Race in stdexec: NVIDIA/stdexec#1395
- Race in stdexec's MPSC queue: NVIDIA/stdexec#1812
- A few races in [BDE](https://github.com/bloomberg/bde) thread-enabled data structures/algorithms.
- The "Data race on variable a" test from https://ceur-ws.org/Vol-2344/paper9.pdf, which is reproduced more reliably with more aggressive adaptive scheduler options.

## Outstanding work

- The [RFC](https://discourse.llvm.org/t/rfc-tsan-implementing-a-fuzz-scheduler-for-tsan/80969) suggests moving the scheduler to sanitizer_common so that ASAN can also leverage it. This should be done (should it be done in this PR?).
- Missing interceptors for libdispatch.
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Feb 16, 2026
manasij7479 pushed a commit to manasij7479/llvm-project that referenced this pull request Feb 18, 2026