| Category | Link | Badge |
|---|---|---|
| Demo | Live Demo | |
| Paper | arXiv Paper | |
Autonomous driving relies on robust models trained on high-quality, large-scale multi-view driving videos for tasks such as perception, tracking, and planning. While world models offer a cost-effective solution for generating realistic driving videos, they struggle to maintain instance-level temporal consistency and spatial geometric fidelity. To address these challenges, we propose InstaDrive, a novel framework with two key advancements:
- Instance Flow Guider module — extracts and propagates instance features across frames to ensure temporal consistency and preserve instance identity.
- Spatial Geometric Aligner module — improves spatial reasoning, ensures precise instance positioning, and explicitly models occlusion hierarchies.
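The sketch below illustrates the general idea of propagating instance features across frames. It keeps one running feature vector per instance ID. Each new observation is blended into that vector with an exponential moving average, so the instance's representation changes smoothly and stays attached to its identity. The sketch assumes every object carries an ID that persists across frames; nuScenes annotations provide one per tracked object. The `InstanceMemory` class and its `momentum` parameter are hypothetical. They do not describe the Instance Flow Guider's actual design.

```python
from typing import Dict, List


class InstanceMemory:
    """Hypothetical per-instance feature memory keyed by a persistent instance ID."""

    def __init__(self, momentum: float = 0.8) -> None:
        if not 0.0 <= momentum < 1.0:
            raise ValueError("momentum must be in [0, 1)")
        self.momentum = momentum
        self.features: Dict[str, List[float]] = {}

    def update(self, frame: Dict[str, List[float]]) -> Dict[str, List[float]]:
        """Blend one frame's instance features into the memory.

        New IDs are stored as-is; known IDs are updated with an exponential
        moving average, so their features drift smoothly instead of jumping.
        Returns the propagated features for the instances seen in this frame.
        """
        out: Dict[str, List[float]] = {}
        for inst_id, feat in frame.items():
            prev = self.features.get(inst_id)
            if prev is None:
                blended = list(feat)
            else:
                m = self.momentum
                blended = [m * p + (1.0 - m) * f for p, f in zip(prev, feat)]
            self.features[inst_id] = blended
            out[inst_id] = blended
        return out


if __name__ == "__main__":
    memory = InstanceMemory(momentum=0.5)
    memory.update({"car_1": [1.0, 0.0]})
    print(memory.update({"car_1": [0.0, 1.0]}))  # {'car_1': [0.5, 0.5]}
```

A higher `momentum` gives more weight to the stored feature. This makes the representation resist sudden appearance changes, but it also reacts more slowly to genuine ones.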
By incorporating these instance-aware mechanisms, InstaDrive achieves state-of-the-art video generation quality and improves downstream autonomous driving tasks on the nuScenes dataset. We also leverage CARLA’s autopilot to procedurally and stochastically simulate rare but safety-critical driving scenarios across different maps and regions.
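A seeded sampler that draws scenario parameters before anything is simulated is one way to make "procedurally and stochastically" concrete. The sketch below only produces parameter sets. Spawning actors and enabling the autopilot would go through the CARLA Python API, and that step is left out. The town names are CARLA's stock maps, and the two event types mirror the braking and cut-in examples in this README. The `Scenario` fields and value ranges are assumptions chosen for illustration.

```python
import random
from dataclasses import dataclass
from typing import List

# CARLA stock towns; the event names are hypothetical labels for this sketch.
MAPS = ["Town01", "Town02", "Town03", "Town04", "Town05", "Town10HD"]
EVENTS = ["lead_vehicle_brake", "cut_in_right"]


@dataclass(frozen=True)
class Scenario:
    town: str
    event: str
    trigger_time_s: float  # seconds after start when the critical event fires
    npc_vehicles: int      # number of background vehicles driven by autopilot
    weather_seed: int      # seed for randomizing weather presets


def sample_scenarios(n: int, seed: int = 0) -> List[Scenario]:
    """Draw n random scenario configurations; the same seed gives the same batch."""
    rng = random.Random(seed)
    return [
        Scenario(
            town=rng.choice(MAPS),
            event=rng.choice(EVENTS),
            trigger_time_s=round(rng.uniform(2.0, 8.0), 2),
            npc_vehicles=rng.randint(5, 40),
            weather_seed=rng.randrange(1000),
        )
        for _ in range(n)
    ]
```

Fixing the seed makes a batch of rare scenarios reproducible. Changing the seed yields new variations of the same event types across different maps.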
- Example A — Oncoming bus, parked car on center divider, pedestrian, roundabout, parked trucks.
ex2.mp4
Description: Drivable areas, sidewalks, and zebra crossings are faithfully generated according to the road map projections. Objects in the scene are accurately placed and sized.
- Example B — Waiting at intersection, pedestrians on sidewalk, turning right, cones, crossing intersection.
ex3.mp4
Description: Small and densely packed objects are rendered at their correct positions, following the 3D bounding box coordinates. Instance flow guidance keeps these objects temporally consistent.
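Placing objects according to 3D bounding boxes implies projecting each box into every camera view. Below is a minimal pinhole-projection sketch. It assumes the box is already expressed in the camera frame (x right, y down, z forward), with yaw measured about the vertical axis, and a 3x3 intrinsic matrix `K`. The helper names are hypothetical. Boxes that extend behind the camera are rejected rather than clipped.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point3 = Tuple[float, float, float]
Rect = Tuple[float, float, float, float]  # (u_min, v_min, u_max, v_max) in pixels


def box_corners(center: Point3, size: Point3, yaw: float) -> List[Point3]:
    """Return the 8 corners of a 3D box in camera coordinates.

    size = (width along x, height along y, length along z) before rotation;
    yaw rotates the box about the camera's vertical (y) axis.
    """
    cx, cy, cz = center
    w, h, l = size
    c, s = math.cos(yaw), math.sin(yaw)
    corners = []
    for dx in (-w / 2, w / 2):
        for dy in (-h / 2, h / 2):
            for dz in (-l / 2, l / 2):
                # Rotate the corner offset about the y axis, then translate.
                x = c * dx + s * dz
                z = -s * dx + c * dz
                corners.append((cx + x, cy + dy, cz + z))
    return corners


def project_box_to_image(corners: Sequence[Point3],
                         K: Sequence[Sequence[float]]) -> Optional[Rect]:
    """Project corners with a pinhole model and return their 2D bounding rectangle.

    Returns None if any corner lies at or behind the image plane (z <= 0).
    """
    fx, fy = K[0][0], K[1][1]
    px, py = K[0][2], K[1][2]
    us, vs = [], []
    for x, y, z in corners:
        if z <= 0:
            return None
        us.append(fx * x / z + px)
        vs.append(fy * y / z + py)
    return (min(us), min(vs), max(us), max(vs))
```

In a multi-view setup, the same box would first be transformed into each camera's frame using that camera's extrinsics, then projected with that camera's `K`.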
- MagicDrive-V2 (baseline)
pred.mp4
- InstaDrive
pred.mp4
Explanation: In MagicDrive-V2, the front orientation of the white car (FrontLeft and BackLeft views) changes over time. InstaDrive preserves instance attributes, demonstrating superior temporal consistency.
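One way to quantify the orientation change described above is to track an instance's heading across frames and measure the largest frame-to-frame jump. The hypothetical helper below assumes per-frame yaw estimates in radians for one tracked instance, for example from a detector run on the generated frames. It wraps differences into [-π, π), so a step from 179° to -179° counts as 2° rather than 358°. It is an illustrative metric, not one reported for InstaDrive.

```python
import math
from typing import Sequence


def wrap_angle(a: float) -> float:
    """Wrap an angle in radians into [-pi, pi)."""
    return (a + math.pi) % (2 * math.pi) - math.pi


def max_yaw_jump(yaws: Sequence[float]) -> float:
    """Largest absolute heading change between consecutive frames, in radians."""
    return max(
        (abs(wrap_angle(b - a)) for a, b in zip(yaws, yaws[1:])),
        default=0.0,
    )
```

A value near π indicates the car's front has flipped to face the opposite direction between two frames. A temporally consistent generation should keep this value small for slowly turning vehicles.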
- Panacea (baseline, occlusion)
occ.mp4
- InstaDrive (occlusion)
pred.mp4
Explanation: In the FrontLeft view, a stationary box is farther from the camera than a moving box. Panacea incorrectly draws the distant box in front of the moving object, whereas InstaDrive correctly renders the closer moving object in front.
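The occlusion rule in this example, where nearer objects are drawn over farther ones, can be expressed as depth-ordered compositing (the classic painter's algorithm). The sketch below assumes each instance comes with a set of covered pixels and a single depth value. It paints instances from far to near into a canvas of instance IDs. It only illustrates the ordering constraint and is not the Spatial Geometric Aligner itself.

```python
from typing import Dict, List, Set, Tuple

Pixel = Tuple[int, int]  # (row, col)


def composite_by_depth(height: int, width: int,
                       instances: Dict[str, Tuple[float, Set[Pixel]]]) -> List[List[str]]:
    """Return a height x width canvas holding the visible instance ID per pixel.

    `instances` maps an ID to (depth from the camera, pixels it covers).
    Instances are painted far-to-near, so nearer ones overwrite farther ones.
    Background pixels stay as the empty string.
    """
    canvas = [["" for _ in range(width)] for _ in range(height)]
    far_to_near = sorted(instances.items(), key=lambda kv: kv[1][0], reverse=True)
    for inst_id, (_depth, pixels) in far_to_near:
        for r, c in pixels:
            if 0 <= r < height and 0 <= c < width:
                canvas[r][c] = inst_id
    return canvas
```

For the scene above, the moving box has the smaller depth. Where the two boxes overlap, the moving box's ID therefore wins regardless of the order in which the instances are listed.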
- MagicDrive-V2 (spatial example)
pred.mp4
- InstaDrive (spatial example)
pred.mp4
Explanation: In some baselines, such as MagicDrive-V2, objects in the FrontRight view may drift away from their bounding boxes. InstaDrive keeps them accurately aligned.
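Deviation from the conditioning box can be scored with intersection-over-union (IoU) between two boxes. One is the box projected from the 3D annotation; the other is a box detected on the generated frame. The code below is a minimal axis-aligned IoU in pixel coordinates, with boxes given as (x_min, y_min, x_max, y_max). How the detected box is obtained is an assumption left open here.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Axis-aligned intersection-over-union of two 2D boxes; 0.0 if both are empty."""
    # Width and height of the overlap, clamped at zero when boxes do not touch.
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

An object that sits inside its conditioning box scores close to 1.0. A drift like the one in the FrontRight view lowers the score in proportion to the misaligned area.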
- a. Instance-Level Temporal Consistency
ex5.mp4
Description: Demonstrates that the model keeps instance attributes consistent across complex scenarios.
- b. Occlusion Hierarchy
occlusion1.mp4
Description: Further confirms that occlusion relationships are handled correctly.
- c. Spatial Localization
ex3.mp4
Description: Demonstrates accurate spatial localization across different scenarios.
- Example A — The vehicle ahead brakes, prompting the ego vehicle to decelerate and stop.
Please refer to the Project Page for the long-term demo.
Description: Simulates sudden braking to visualize the ego vehicle's behavior and the generation quality under emergency conditions.
- Example B — Vehicle cutting in from the right lane.
cut-in.mp4
Description: Simulates a vehicle cutting into the ego lane to test model stability and generation quality in complex traffic scenarios.
Please refer to the Project Page for long-term video demos.
Description: Demonstrates long-term generation consistency and coherent world modeling.

