An even faster variant of LightningDiT (CVPR 2025 Oral), obtained by combining it with TREAD (ICCV 2025).
28% better FID (without CFG).
```bash
conda create -n lightningdit python=3.10.12
conda activate lightningdit
pip install -r requirements.txt
```
- To enable the TREAD variant, set `model.use_tread: true` in the chosen reproduction config (and optionally adjust the `model.tread` block). Leaving it `false` runs the dense LightningDiT baseline.
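As a rough illustration, a reproduction config with TREAD enabled might contain a fragment like the one below. Only `model.use_tread` is documented above; the keys inside the `tread` block are hypothetical placeholders, so check the shipped configs for the actual supported options:

```yaml
model:
  use_tread: true      # false runs the dense LightningDiT baseline
  tread:               # illustrative routing options (names are assumptions)
    selection_ratio: 0.5
    start_layer: 2
    end_layer: 26
```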
Download weights and data infos:

- Download pre-trained models:

  | Tokenizer | Generation Model | FID | FID (cfg) |
  |---|---|---|---|
  | VA-VAE | LightningDiT-XL-800ep | 2.17 | 1.35 |
  | | LightningDiT-XL-64ep | 5.14 | 2.11 |
- Download latent statistics. This file contains the channel-wise mean and standard deviation of the latents.
- Modify the config file in `configs/reproductions` as required.
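The downloaded latent statistics are used for channel-wise normalization of the latents. A minimal sketch of the idea, assuming NumPy arrays; the actual file layout, key names, and shapes in the repo may differ:

```python
import numpy as np

def normalize_latents(latents, mean, std):
    """Normalize (N, C, H, W) latents to zero mean / unit std per channel."""
    return (latents - mean.reshape(1, -1, 1, 1)) / std.reshape(1, -1, 1, 1)

# toy latents standing in for a VA-VAE encoding
rng = np.random.default_rng(0)
latents = rng.normal(loc=3.0, scale=2.0, size=(8, 4, 16, 16))

# channel-wise statistics, as provided by the downloaded file
mean = latents.mean(axis=(0, 2, 3))
std = latents.std(axis=(0, 2, 3))

normed = normalize_latents(latents, mean, std)
print(float(normed.mean()), float(normed.std()))  # ~0.0 and ~1.0
```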
Fast sample demo images:

Run:

```bash
bash run_fast_inference.sh ${config_path}
```

Images will be saved to `demo_images/demo_samples.png`.
Sample for FID-50k evaluation:

Run:

```bash
bash run_inference.sh ${config_path}
```

NOTE: The FID reported by the script is only a reference value. The final FID-50k reported in the paper is evaluated with ADM's evaluation suite:

```bash
git clone https://github.com/openai/guided-diffusion.git
# save your npz file with tools/save_npz.py
bash run_fid_eval.sh /path/to/your.npz
```
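For reference, the ADM evaluator loads the sample array from the npz file's `arr_0` key, which is where `np.savez` stores its first positional argument. A minimal sketch of the expected packing; `tools/save_npz.py` is the supported path in this repo, and the sample count and resolution here are toy values:

```python
import numpy as np

# stand-in for generated samples: (N, H, W, 3) uint8 in [0, 255]
samples = np.random.randint(0, 256, size=(16, 256, 256, 3), dtype=np.uint8)

# np.savez stores the first positional array under the key "arr_0",
# which is the key the guided-diffusion evaluator reads.
np.savez("samples.npz", samples)

loaded = np.load("samples.npz")["arr_0"]
print(loaded.shape, loaded.dtype)  # (16, 256, 256, 3) uint8
```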
- We provide a detailed tutorial for training your own model to a 2.1 FID score within only 64 epochs. Training takes only about 10 hours on 8 x H800 GPUs.
This repo is a modification of LightningDiT, which is mainly built on DiT, FasterDiT, and SiT. The VA-VAE code is mainly built on LDM and MAR. Thanks to all these great works.
If you find this work useful, please cite the related papers:
```bibtex
# ICCV 2025
@article{krause2025tread,
  title={TREAD: Token Routing for Efficient Architecture-agnostic Diffusion Training},
  author={Krause, Felix and Phan, Timy and Gui, Ming and Baumann, Stefan Andreas and Hu, Vincent Tao and Ommer, Bj{\"o}rn},
  journal={arXiv preprint arXiv:2501.04765},
  year={2025}
}

# CVPR 2025
@inproceedings{yao2025vavae,
  title={Reconstruction vs. generation: Taming optimization dilemma in latent diffusion models},
  author={Yao, Jingfeng and Yang, Bin and Wang, Xinggang},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2025}
}

# NeurIPS 2024
@article{yao2024fasterdit,
  title={FasterDiT: Towards faster diffusion transformers training without architecture modification},
  author={Yao, Jingfeng and Wang, Cheng and Liu, Wenyu and Wang, Xinggang},
  journal={Advances in Neural Information Processing Systems},
  volume={37},
  pages={56166--56189},
  year={2024}
}
```

