We are excited to introduce LingBot-World, an open-source world simulator built on video generation. Positioned as a top-tier world model, LingBot-World offers the following features:
- High-Fidelity & Diverse Environments: It maintains high fidelity and robust dynamics across a broad spectrum of environments, including realistic scenes, scientific contexts, cartoon styles, and beyond.
- Long-Term Memory & Consistency: It sustains a minute-level horizon while preserving contextual consistency over time, a capability commonly referred to as long-term memory.
- Real-Time Interactivity & Open Access: It supports real-time interaction, achieving sub-second latency while producing 16 frames per second. We provide public access to the code and models to help narrow the gap between open-source and closed-source technologies. We believe this release will empower the community with practical applications in areas such as content creation, gaming, and robot learning.
demo.mp4
- Jan 29, 2026: 🎉 We release the technical report, code, and models for LingBot-World.
This codebase is built upon Wan2.2. Please refer to their documentation for installation instructions.
Clone the repo:
```
git clone https://github.com/robbyant/lingbot-world.git
cd lingbot-world
```

Install dependencies:
```
# Ensure torch >= 2.4.0
pip install -r requirements.txt
```

Install flash_attn:
```
pip install flash-attn --no-build-isolation
```

| Model | Control Signals | Resolution | Download Links |
|---|---|---|---|
| LingBot-World-Base (Cam) | Camera Poses | 480P & 720P | 🤗 HuggingFace 🤖 ModelScope |
| LingBot-World-Base (Act) | Actions | - | To be released |
| LingBot-World-Fast | - | - | To be released |
Download models using huggingface-cli:
pip install "huggingface_hub[cli]"
huggingface-cli download robbyant/lingbot-world-base-cam --local-dir ./lingbot-world-base-camDownload models using modelscope-cli:
```
pip install modelscope
modelscope download robbyant/lingbot-world-base-cam --local_dir ./lingbot-world-base-cam
```

Before running inference, you need to prepare:
- Input image
- Text prompt
- Control signals (optional; they can be generated from a video using ViPE). Two files are expected, as shown in the sketch after this list:
  - `intrinsics.npy`: shape `[num_frames, 4]`, where the 4 values are `[fx, fy, cx, cy]`
  - `poses.npy`: shape `[num_frames, 4, 4]`, where each `[4, 4]` matrix is a transformation in OpenCV coordinates
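To make the expected file layout concrete, here is a minimal sketch that writes a synthetic control-signal pair for a camera slowly dollying forward. The output directory name, intrinsic values, and trajectory are illustrative assumptions, not values shipped with this repo; only the file names, array shapes, and the OpenCV convention come from the list above. Whether the matrices are read as camera-to-world or world-to-camera is not specified here, so verify against the provided examples/00 data before relying on this.

```python
import os
import numpy as np

num_frames = 161  # keep in sync with --frame_num at inference time

# Per-frame pinhole intrinsics [fx, fy, cx, cy].
# Placeholder values for an 832x480 frame; replace with your own calibration
# (e.g., the intrinsics ViPE estimates from a source video).
intrinsics = np.tile(
    np.array([600.0, 600.0, 416.0, 240.0], dtype=np.float32),
    (num_frames, 1),
)  # shape [num_frames, 4]

# Per-frame 4x4 transformation matrices in OpenCV coordinates
# (x right, y down, z forward): identity rotation, translating
# 0.05 units along +z per frame, i.e. a slow dolly forward.
poses = np.tile(np.eye(4, dtype=np.float32), (num_frames, 1, 1))  # [num_frames, 4, 4]
poses[:, 2, 3] = 0.05 * np.arange(num_frames, dtype=np.float32)

out_dir = "my_action_dir"  # hypothetical directory, passed via --action_path
os.makedirs(out_dir, exist_ok=True)
np.save(os.path.join(out_dir, "intrinsics.npy"), intrinsics)
np.save(os.path.join(out_dir, "poses.npy"), poses)
```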
- 480P:
```
torchrun --nproc_per_node=8 generate.py --task i2v-A14B --size 480*832 --ckpt_dir lingbot-world-base-cam --image examples/00/image.jpg --action_path examples/00 --dit_fsdp --t5_fsdp --ulysses_size 8 --frame_num 161 --prompt "The video presents a soaring journey through a fantasy jungle. The wind whips past the rider's blue hands gripping the reins, causing the leather straps to vibrate. The ancient gothic castle approaches steadily, its stone details becoming clearer against the backdrop of floating islands and distant waterfalls."
```

- 720P:
```
torchrun --nproc_per_node=8 generate.py --task i2v-A14B --size 720*1280 --ckpt_dir lingbot-world-base-cam --image examples/00/image.jpg --action_path examples/00 --dit_fsdp --t5_fsdp --ulysses_size 8 --frame_num 161 --prompt "The video presents a soaring journey through a fantasy jungle. The wind whips past the rider's blue hands gripping the reins, causing the leather straps to vibrate. The ancient gothic castle approaches steadily, its stone details becoming clearer against the backdrop of floating islands and distant waterfalls."
```

Alternatively, you can run inference without control actions:
```
torchrun --nproc_per_node=8 generate.py --task i2v-A14B --size 480*832 --ckpt_dir lingbot-world-base-cam --image examples/00/image.jpg --dit_fsdp --t5_fsdp --ulysses_size 8 --frame_num 161 --prompt "The video presents a soaring journey through a fantasy jungle. The wind whips past the rider's blue hands gripping the reins, causing the leather straps to vibrate. The ancient gothic castle approaches steadily, its stone details becoming clearer against the backdrop of floating islands and distant waterfalls."
```

Tips:
If you have sufficient CUDA memory, you can increase `--frame_num` to a value such as 961 to generate a one-minute video at 16 FPS; the sketch below shows the relationship. If CUDA memory is limited, pass `--t5_cpu` to reduce memory usage.
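The two values above suggest the common video-diffusion convention of `frame_num = duration × fps + 1` (161 frames for 10 s and 961 frames for 60 s at 16 FPS); this is inferred from those two numbers rather than documented behavior, so treat the helper below as a sketch under that assumption:

```python
def frame_num_for(duration_s: float, fps: int = 16) -> int:
    """Map a target clip duration to --frame_num, assuming frame_num = duration * fps + 1."""
    return int(duration_s * fps) + 1

assert frame_num_for(10) == 161  # the value used in the commands above
assert frame_num_for(60) == 961  # the one-minute value from the tips
```

Note that both released values are of the form 4k + 1, a grid many latent video models require; if you choose other durations, staying on that grid seems prudent.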
We provide comparison demos where camera parameters are estimated by ViPE from original videos downloaded from Genie3:
fly.mp4
ship.mp4
This project is licensed under the Apache 2.0 License. Please refer to the LICENSE file for the full text, including details on rights and restrictions.
We would like to express our gratitude to the Wan Team for open-sourcing their code and models. Their contributions have been instrumental to the development of this project.
If you find this work useful for your research, please cite our paper:
```
@article{lingbot-world,
  title={Advancing Open-source World Models},
  author={Robbyant Team and Zelin Gao and Qiuyu Wang and Yanhong Zeng and Jiapeng Zhu and Ka Leong Cheng and Yixuan Li and Hanlin Wang and Yinghao Xu and Shuailei Ma and Yihang Chen and Jie Liu and Yansong Cheng and Yao Yao and Jiayi Zhu and Yihao Meng and Kecheng Zheng and Qingyan Bai and Jingye Chen and Zehong Shen and Yue Yu and Xing Zhu and Yujun Shen and Hao Ouyang},
  journal={arXiv preprint arXiv:2601.20540},
  year={2026}
}
```
