
Accelerating Diffusion Large Language Models with SlowFast Sampling: The Three Golden Principles

Official PyTorch implementation of the paper "Accelerating Diffusion Large Language Models with SlowFast Sampling: The Three Golden Principles" (Slow Fast Sampling).

The Three Golden Principles: Certainty · Convergence · Positional

Fig. 1 – Throughput and accuracy comparison on GPQA (8-shot, Length=1024) with LLaDA and our proposed methods.


✨ Key Highlights

💗💗💗 What makes Slow Fast Sampling special?
Three Golden Principles 👑 Certainty, Convergence, and Positional guide exactly when and where to decode.
Two-Stage Dance 🐢→⚡ A cautious Slow phase finds a stable span, then the Fast phase parallel-decodes it in one sweep (sketched below).
Plug-and-Play 🔌 A drop-in sampler for any masked-diffusion LLM, including LLaDA-8B and Dream-7B.
Crazy Speed-ups ⚡ Up to 15.6× faster than vanilla diffusion sampling, and up to 34.2× when combined with dLLM-Cache, with minimal accuracy loss.
Outruns ARMs 🏃 Beats LLaMA-3 8B in throughput while matching accuracy (Table 4, p. 9).
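
The two-stage loop is simple enough to sketch. Below is a minimal, self-contained toy in PyTorch: MASK_ID, WINDOW, THRESH, and the random-logit model are illustrative stand-ins, and the window test is only a crude proxy for the paper's certainty/convergence/positional checks, not the repository's implementation.

# Toy sketch of the Slow/Fast loop (illustrative only, not the repo's code).
import torch

MASK_ID = -1      # hypothetical mask id; real models use a reserved vocab id
VOCAB = 8         # toy vocabulary so the demo's confidences are meaningful
THRESH = 0.35     # illustrative certainty threshold
WINDOW = 8        # length of the span the fast phase decodes at once

def model(tokens: torch.Tensor) -> torch.Tensor:
    """Stand-in for a masked-diffusion LLM: (seq,) ids -> (seq, vocab) logits."""
    return torch.randn(tokens.numel(), VOCAB)

def slow_fast_sample(tokens: torch.Tensor) -> torch.Tensor:
    tokens = tokens.clone()
    while (tokens == MASK_ID).any():
        probs = model(tokens).softmax(dim=-1)
        conf, pred = probs.max(dim=-1)             # per-position certainty
        masked = tokens == MASK_ID
        conf = torch.where(masked, conf, torch.zeros_like(conf))

        # Fast phase: if every masked slot in the leftmost window is confident
        # (a crude proxy for the certainty/convergence/positional signals),
        # commit the whole span in one parallel step.
        first = int(masked.nonzero()[0])
        span = slice(first, min(first + WINDOW, tokens.numel()))
        if bool((conf[span][masked[span]] > THRESH).all()):
            tokens[span] = torch.where(masked[span], pred[span], tokens[span])
        else:
            # Slow phase: decode only the single most certain masked token.
            pos = conf.argmax()
            tokens[pos] = pred[pos]
    return tokens

prompt = torch.tensor([5, 7, 3] + [MASK_ID] * 16)
print(slow_fast_sample(prompt))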

🚀 Pipeline at a Glance

Fig. 2 – Overview of the Slow Fast Sampling pipeline: from exploratory to accelerated decoding.


🛠️ Installation

# 1. Clone
git clone https://github.com/LiangrunFlora/Slow-Fast-Sampling.git
cd Slow-Fast-Sampling

# 2. Env (Python ≥ 3.10) & deps
bash install.sh

📘 Quick Start

# GSM8K with LLaDA-8B
bash scripts/run_llada_gsm8k_base.sh

# GPQA with LLaDA-8B
bash scripts/run_llada_gpqa_base.sh

# BBH with Dream-7B
bash scripts/run_dream_bbh_base.sh
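
The scripts above wrap lm-evaluation-harness runs. If you want to drive the sampler from your own code, here is a hedged sketch that loads the public LLaDA checkpoint; slow_fast_generate is a hypothetical name standing in for this repo's entry point, so consult the code behind scripts/ for the real interface.

# Hedged sketch: load the public LLaDA-8B checkpoint and hand it to a sampler.
# `slow_fast_generate` below is HYPOTHETICAL; it stands in for whatever entry
# point the scripts above ultimately call.
import torch
from transformers import AutoModel, AutoTokenizer

name = "GSAI-ML/LLaDA-8B-Base"   # LLaDA's public Hugging Face release
tokenizer = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
model = AutoModel.from_pretrained(
    name, trust_remote_code=True, torch_dtype=torch.bfloat16
).eval()

prompt_ids = tokenizer("Question: 2 + 2 = ?\nAnswer:", return_tensors="pt").input_ids

# out_ids = slow_fast_generate(model, prompt_ids, gen_length=1024)  # hypothetical
# print(tokenizer.decode(out_ids[0], skip_special_tokens=True))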

📮 Contact

Created and maintained by Qingyan Wei (liangrun@csu.edu.cn). Feel free to open an issue or drop me an email; PRs are welcome!

🎉 Acknowledgements

This project stands on the shoulders of LLaDA, Dream, dLLM-Cache, and the lm-evaluation-harness. Huge thanks to these amazing communities for paving the way.

📌 Citation

If you find this work useful, please cite our paper:

@article{wei2025accelerating,
  title={Accelerating Diffusion Large Language Models with SlowFast: The Three Golden Principles},
  author={Wei, Qingyan and Zhang, Yaojie and Liu, Zhiyuan and Liu, Dongrui and Zhang, Linfeng},
  journal={arXiv preprint arXiv:2506.10848},
  year={2025}
}
