Panwang Pan†, Chenguo Lin†, Jingjing Zhao, Chenxin Li, Yuchen Lin, Haopeng Li, Honglei Yan, Kairun Wen, Yunlong Lin, Yixuan Yuan, Yadong Mu
Diff4Splat is a feed-forward method that synthesizes controllable and explicit 4D scenes from a single image. Our approach unifies the generative priors of video diffusion models with geometry and motion constraints learned from large-scale 4D datasets.
Here is our Project Page.
Feel free to contact us or open an issue if you have any questions or suggestions.
You may also be interested in our other works:
- [CVPR 2026] MoVieS: a feed-forward model for 4D dynamic reconstruction from monocular videos.
- 2026-02-21: The paper is accepted to CVPR 2026.
- 2025-11-01: Diff4Splat is released on arXiv.
- 2025-10-15: Initial codebase structure established.
- 2025-10-01: Project development started.
This section clarifies the differences between the paper description and the current released codebase, to help set expectations for reproducibility.
- Video Backbone: CogVideoX-style Video DiT with 32-channel 3D Causal VAE (4×8×8 compression)
- LDRM Input: The video latent tensor (z) from the diffusion model, together with camera information, is processed by the LDRM Transformer to output deformable 3D Gaussians
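As a rough sketch of what the latent input implies dimensionally: with a 32-channel 3D causal VAE at 4×8×8 (temporal × spatial) compression, the latent shape for a clip can be estimated as below. The helper name and the causal first-frame handling are our assumptions for illustration, not taken from the released code:

```python
# Hypothetical helper: latent tensor shape implied by a 32-channel
# 3D causal VAE with 4x temporal and 8x8 spatial compression.
# Causal VAEs typically keep the first frame and compress the rest 4x.
def latent_shape(num_frames: int, height: int, width: int,
                 latent_channels: int = 32,
                 t_stride: int = 4, s_stride: int = 8) -> tuple:
    t = 1 + (num_frames - 1) // t_stride  # first frame kept as-is
    return (latent_channels, t, height // s_stride, width // s_stride)

# e.g. a 49-frame 480x720 clip -> (32, 13, 60, 90)
print(latent_shape(49, 480, 720))
```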
We are actively working on:
- Paper-faithful implementation: A version closer to the CogVideoX + latent-input LDRM stack described in the paper
- Complete training/inference scripts: Exact scripts to reproduce the paper's results
- Pretrained checkpoints: Both the paper's setup and this repository's engineering variant
- Inference code released
- Training code and data preprocessing scripts released
- Pretrained checkpoints (coming soon)
- HuggingFace demo (coming soon)
- Preprocessed dataset (coming soon)
- Python >= 3.10
- PyTorch >= 2.0 (with CUDA support)
- CUDA >= 11.8
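If you want to sanity-check your environment against these minimums, dotted version strings should be compared numerically rather than lexically (e.g. "11.10" > "11.8"). A minimal, repo-independent sketch (the function name is ours):

```python
# Compare dotted version strings numerically, e.g. '2.1.0' vs '2.0'.
def meets_minimum(version: str, minimum: str) -> bool:
    def parts(v: str) -> list:
        # Drop local suffixes like '+cu118' and non-numeric segments.
        return [int(p) for p in v.split("+")[0].split(".") if p.isdigit()]
    a, b = parts(version), parts(minimum)
    n = max(len(a), len(b))
    a += [0] * (n - len(a))  # pad to equal length so '2.0' == '2.0.0'
    b += [0] * (n - len(b))
    return a >= b

print(meets_minimum("2.1.0+cu118", "2.0"))  # True
print(meets_minimum("11.7", "11.8"))        # False
```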
```shell
# Clone the repository
git clone https://github.com/paulpanwang/Diff4Splat.git
cd Diff4Splat

# Install required packages
pip install -r settings/requirements.txt
```

Configure your dataset root path in `src/options.py` or via the `DATASET_ROOT` environment variable.
The following datasets are supported:
- RealEstate10K (`re10k`) - Static scenes
- TartanAir (`tartanair`) - Static scenes
- MatrixCity (`matrixcity`) - Static scenes
- DL3DV (`dl3dv`) - Static scenes
- DynamicReplica (`dynamicreplica`) - Dynamic scenes
- PointOdyssey (`pointodyssey`) - Dynamic scenes
- VKITTI2 (`vkitti2`) - Dynamic scenes
- Spring (`spring`) - Dynamic scenes
- Stereo4D (`stereo4d`) - Dynamic scenes
Dataset paths can be configured in `src/options.py`.
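A minimal sketch of how such a lookup might work, assuming the documented `DATASET_ROOT` environment variable takes precedence over a default. The function name and default path are hypothetical; the actual `src/options.py` may differ:

```python
import os
from pathlib import Path

# Placeholder default, not taken from the repository.
DEFAULT_DATASET_ROOT = "./data"

def resolve_dataset_root() -> Path:
    """Prefer the DATASET_ROOT env var; fall back to the default path."""
    root = os.environ.get("DATASET_ROOT", DEFAULT_DATASET_ROOT)
    return Path(root).expanduser().resolve()

print(resolve_dataset_root())
```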
If you have questions about reproducibility or comparisons, please open an issue or contact the authors. We appreciate your understanding as we continue to improve and complete this codebase!
If you find our work helpful, please consider citing:
```bibtex
@inproceedings{pan2025diff4splat,
  title={Diff4Splat: Controllable 4D Scene Generation with Latent Dynamic Reconstruction Models},
  author={Pan, Panwang and Lin, Chenguo and Zhao, Jingjing and Li, Chenxin and Lin, Yuchen and Li, Haopeng and Yan, Honglei and Wen, Kairun and Lin, Yunlong and Yuan, Yixuan and Mu, Yadong},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
  year={2026}
}
```

This project is licensed under the MIT License - see the LICENSE file for details.
We would like to thank the authors of MoVieS, PartCrafter, DiffSplat, and other related works for their inspiring research and open-source contributions that helped shape this project.
