Official PyTorch implementation of the paper "Accelerating Diffusion Large Language Models with SlowFast Sampling: The Three Golden Principles" (Slow Fast Sampling).
The Three Golden Principles: Certainty · Convergence · Positional

Fig. 1: Throughput and Accuracy Comparison on GPQA (8-shot, Length=1024) with LLaDA and Our Proposed Methods.
| Highlight | What makes Slow-Fast Sampling special? |
|---|---|
| Three Golden Principles | Certainty, Convergence, and Positional guide exactly when and where to decode. |
| Two-Stage Dance | A cautious Slow phase locates a stable span, then the Fast phase parallel-decodes it in one sweep (see the sketch after this table). |
| Plug-and-Play | A drop-in sampler for any masked-diffusion LLM: LLaDA-8B, Dream-7B. |
| Crazy Speed-ups | 15.6× faster than vanilla diffusion, 34.2× with dLLM-Cache, with minimal accuracy loss. |
| Outruns ARMs | Beats LLaMA-3 8B in throughput while matching accuracy (Table 4, p. 9). |
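To make the two-stage idea concrete, here is a minimal, self-contained sketch of a slow-then-fast masked-diffusion decoding loop. It is not the repository's implementation: `slow_fast_generate`, `MASK_ID`, the thresholds `cert_thresh` / `conv_steps` / `slow_k`, and the toy `logits_fn` are all illustrative assumptions chosen to show how certainty (per-token confidence), convergence (stable predictions across steps), and position (a contiguous leading span) can drive the switch between the phases.

```python
# Conceptual sketch ONLY (not the repo's actual sampler or API).
# Assumptions: MASK_ID, slow_fast_generate, thresholds, and the toy logits_fn are hypothetical.
import torch

MASK_ID = 99  # placeholder id for masked positions in this toy example


def _leading_ready_span(ready, masked):
    """Positional principle: take the run of 'ready' positions starting at the first masked index."""
    span = torch.zeros_like(ready)
    for i in masked.nonzero(as_tuple=True)[0]:  # walk masked positions left to right
        if ready[i]:
            span[i] = True
        else:
            break
    return span


def slow_fast_generate(logits_fn, x, cert_thresh=0.9, conv_steps=2, slow_k=1, max_steps=64):
    """x: (seq_len,) LongTensor with prompt tokens and MASK_ID at positions to fill."""
    prev_pred = torch.full_like(x, -1)
    stable_for = torch.zeros_like(x)  # steps each position's argmax has stayed unchanged (convergence)
    for _ in range(max_steps):
        masked = x == MASK_ID
        if not masked.any():
            break
        probs = torch.softmax(logits_fn(x), dim=-1)  # (seq_len, vocab)
        conf, pred = probs.max(dim=-1)                # certainty principle: per-position confidence
        stable_for = torch.where(pred == prev_pred, stable_for + 1, torch.zeros_like(stable_for))
        prev_pred = pred

        # A position is "ready" if it is masked, confident, and has converged.
        ready = masked & (conf >= cert_thresh) & (stable_for >= conv_steps)
        span = _leading_ready_span(ready, masked)
        if span.any():
            # FAST phase: parallel-decode the whole stable span in one step.
            x = torch.where(span, pred, x)
        else:
            # SLOW phase: cautiously decode only the top-k most certain masked tokens.
            scores = conf.masked_fill(~masked, float("-inf"))
            k = min(slow_k, int(masked.sum()))
            top = torch.topk(scores, k=k).indices
            x = x.clone()
            x[top] = pred[top]
    return x


# Toy usage: a fixed random "model" over a 16-token vocabulary fills 8 masked positions.
vocab, prompt = 16, torch.tensor([5, 7, 3])
x = torch.cat([prompt, torch.full((8,), MASK_ID)])
fixed_logits = torch.randn(x.numel(), vocab)
out = slow_fast_generate(lambda seq: fixed_logits, x.clone())
```

In the actual sampler the span detection, thresholds, and step scheduling are more elaborate, and dLLM-Cache can additionally reuse activations across steps, but the control flow follows this slow-then-fast pattern.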
```bash
# 1. Clone
git clone https://github.com/LiangrunFlora/Slow-Fast-Sampling.git
cd slow-fast-sampling

# 2. Environment (Python >= 3.10) & dependencies
bash install.sh
```

```bash
# GSM8K with LLaDA-8B
bash scripts/run_llada_gsm8k_base.sh

# GPQA with LLaDA-8B
bash scripts/run_llada_gpqa_base.sh

# BBH with Dream-7B
bash scripts/run_dream_bbh_base.sh
```

Created and maintained by Qingyan Wei (liangrun@csu.edu.cn). Feel free to open an issue or drop me an email; PRs are welcome!
This project stands on the shoulders of LLaDA, Dream, dLLM-Cache and the lm-evaluation-harness. Huge thanks to these amazing communities for paving the way.
If you find this work useful, please cite our paper:
```bibtex
@article{wei2025accelerating,
  title   = {Accelerating Diffusion Large Language Models with SlowFast: The Three Golden Principles},
  author  = {Wei, Qingyan and Zhang, Yaojie and Liu, Zhiyuan and Liu, Dongrui and Zhang, Linfeng},
  journal = {arXiv preprint arXiv:2506.10848},
  year    = {2025}
}
```