
Reproduce split race/crash #1395

Closed
ccotter wants to merge 1 commit into NVIDIA:main from ccotter:split-bug-reproduce

Conversation


@ccotter ccotter commented Aug 11, 2024

https://gist.github.com/ccotter/c0504455c7d54240758a34dc99d620f0 is the ASAN report.

After

    auto __old = __self->__dec_ref();

the shared state becomes a dangling pointer if

    this->__dec_ref(); // release the extra ref count, deletes this

frees the memory from another thread.

I'm not sure how to fix this yet, but this PR reproduces the bug reliably under ASAN in the "split forwards results from a different thread" test.
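The failure mode can be pictured with a hypothetical, simplified stand-in for the shared state (the names below are illustrative, not stdexec's actual types): the thread that decrements no longer holds a reference, so any access through the pointer afterwards races with another thread dropping the last reference and deleting the object.

```cpp
#include <atomic>

// Hypothetical simplified shared state; the real stdexec type and member
// names differ.
struct SharedState {
  std::atomic<int> ref_count{2};

  // Mirrors __dec_ref(): returns the count held *before* the decrement.
  int dec_ref() { return ref_count.fetch_sub(1, std::memory_order_acq_rel); }
};

// The race, in pseudocode:
//
//   Thread A                          Thread B
//   --------                          --------
//   auto old = self->dec_ref();
//                                     if (self->dec_ref() == 1)
//                                       delete self;  // last ref gone
//   use(self, old);  // `self` now dangles: use-after-free
//
// Once thread A's decrement completes, it owns no reference, so thread B
// may free the object before thread A touches it again.
```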


copy-pr-bot bot commented Aug 11, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


ccotter commented Aug 11, 2024

I naively tried to make a copy of the intrusive_ptr inside __detach to keep the ref count positive long enough to ensure the object is kept alive, but I wasn't able to get that to work.

@ericniebler

A million thanks for this, Chris. I'll investigate as soon as I can, but probably not before next week. Big presentation coming up.

@ericniebler

I think #1460 fixes this.

@ericniebler ericniebler closed this Jan 2, 2025
dvyukov pushed a commit to llvm/llvm-project that referenced this pull request Feb 16, 2026
This commit introduces an "adaptive delay" feature to the
ThreadSanitizer runtime to improve race detection by perturbing thread
schedules. At various synchronization points (atomic operations,
mutexes, and thread lifecycle events), the runtime may inject small
delays (spin loops, yields, or sleeps) to explore different thread
interleavings and expose data races that would otherwise occur only in
rare execution orders.

This change is inspired by prior work, discussed in more detail at
https://discourse.llvm.org/t/rfc-tsan-implementing-a-fuzz-scheduler-for-tsan/80969.
In short, https://reviews.llvm.org/D65383 was an earlier, unmerged
attempt at adding random delays. Feedback on the RFC led to the version
in this commit, which aims to limit the amount of delay injected.

The adaptive delay feature uses a configurable time budget and tiered
sampling strategy to balance race exposure against performance impact.
It prioritizes high-value synchronization points with clear
happens-before relationships: relaxed atomics receive lightweight spin
delays with low sampling, synchronizing atomics (acquire / release /
seq_cst) receive moderate delays with higher sampling, and mutex and
thread lifecycle operations receive the longest delays with highest
sampling.
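The tiering described above can be sketched like this; the tier names and rates are invented for illustration and are not the values used in the actual patch.

```cpp
// Illustrative tiers of synchronization points; names and numbers are
// made up for this sketch, not taken from the real TSAN runtime change.
enum class Tier { RelaxedAtomic, SyncAtomic, MutexOrThread };

// Higher-value sync points get higher sampling (and longer delays):
// sampled events per 1000 visits to a point of the given tier.
constexpr int SampleRatePerMille(Tier t) {
  switch (t) {
    case Tier::RelaxedAtomic: return 1;    // lightweight spin, rarely
    case Tier::SyncAtomic:    return 20;   // moderate delay, more often
    case Tier::MutexOrThread: return 100;  // longest delay, most often
  }
  return 0;
}
```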

The feature is disabled by default and incurs minimal overhead when not
enabled. Nearly all checks are guarded by an inline check on a global
variable that is only set when enable_adaptive_delay=1. Microbenchmarks
with tight loops of atomic operations showed no meaningful performance
difference between an unmodified TSAN runtime and this version when
running with empty TSAN_OPTIONS.
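The fast-path guard looks roughly like this (a hypothetical sketch of the pattern; the flag name echoes the enable_adaptive_delay option, the rest is illustrative):

```cpp
#include <atomic>

// Set once at startup when the option enable_adaptive_delay=1 is present
// (hypothetical sketch of the guard pattern, not the runtime's real code).
std::atomic<bool> g_adaptive_delay_enabled{false};

// Called from instrumented sync points. With the feature off, the cost is
// a single relaxed load and a well-predicted branch.
inline bool MaybeInjectDelay() {
  if (!g_adaptive_delay_enabled.load(std::memory_order_relaxed))
    return false;  // fast path: feature disabled, no delay considered
  // Slow path: consult the time budget / sampler and spin, yield, or sleep.
  return true;
}
```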

An LLM assisted in writing portions of the adaptive delay logic,
including the TimeBudget class, tiering concept, address sampler, and
per-thread quota system. I reviewed the output and made amendments to
reduce duplication and simplify the behavior. I also replaced the LLM's
original double-based calculation logic with the integer-based Percent
class. The LLM also helped write unit test cases for Percent.
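The integer-based calculation mentioned above might look roughly like this (a hypothetical sketch of the idea, not the actual Percent class from the patch):

```cpp
#include <cstdint>

// Hypothetical integer-only percentage type: all arithmetic stays in
// integers, keeping floating point out of the runtime's hot paths.
class Percent {
 public:
  explicit constexpr Percent(uint32_t value) : value_(value) {}

  // Computes value_% of n using integer math only; assumes n * value_
  // fits in 64 bits.
  constexpr uint64_t Of(uint64_t n) const { return n * value_ / 100; }

 private:
  uint32_t value_;
};
```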

cc @dvyukov 

## Examples

I used the delay scheduler to find novel bugs that rarely or never
occurred with the unmodified TSAN runtime. Some of the bugs below were
found with earlier versions of the delay scheduler that I iterated on,
but with this most recent implementation in this PR, I can still find
the bugs far more reliably than with the standard TSAN runtime.

- A use-after-free in the [BlazingMQ](https://github.com/bloomberg/blazingmq) broker during ungraceful producer disconnect.
- Race in stdexec: NVIDIA/stdexec#1395
- Race in stdexec's MPSC queue: NVIDIA/stdexec#1812
- A few races in [BDE](https://github.com/bloomberg/bde) thread-enabled data structures/algorithms.
- The "Data race on variable a" test from https://ceur-ws.org/Vol-2344/paper9.pdf, which is reproduced more reliably with more aggressive adaptive scheduler options.

## Outstanding work

- The [RFC](https://discourse.llvm.org/t/rfc-tsan-implementing-a-fuzz-scheduler-for-tsan/80969) suggests moving the scheduler to sanitizer_common so that ASAN can also leverage it. This should be done (should it be done in this PR?).
- Missing interceptors for libdispatch.
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Feb 16, 2026
manasij7479 pushed a commit to manasij7479/llvm-project that referenced this pull request Feb 18, 2026