shanpoyang654/InstaDrive


InstaDrive: Instance-Aware Driving World Models for Realistic and Consistent Video Generation

Project Links

  • Demo: Live Demo
  • Paper: arXiv Paper

Abstract

Autonomous driving relies on robust models trained on high-quality, large-scale multi-view driving videos for tasks such as perception, tracking, and planning. While world models offer a cost-effective solution for generating realistic driving videos, they struggle to maintain instance-level temporal consistency and spatial geometric fidelity. To address these challenges, we propose InstaDrive, a novel framework with two key advancements:

  1. Instance Flow Guider module — extracts and propagates instance features across frames to ensure temporal consistency and preserve instance identity.
  2. Spatial Geometric Aligner module — improves spatial reasoning, ensures precise instance positioning, and explicitly models occlusion hierarchies.

By incorporating these instance-aware mechanisms, InstaDrive achieves state-of-the-art video generation quality and improves downstream autonomous driving tasks on the nuScenes dataset. We also leverage CARLA’s autopilot to procedurally and stochastically simulate rare but safety-critical driving scenarios across different maps and regions.

[Figure: InstaDrive overview]
[Figure: InstaDrive method]


1. Multimodal Condition Controllability

1.1 Layout Controllability

  • Example A — Oncoming bus, parked car on center divider, pedestrian, roundabout, parked trucks.
ex2.mp4

Description: Drivable areas, sidewalks, and zebra crossings are faithfully generated according to the road map projections. Objects in the scene are accurately placed and sized.

  • Example B — Waiting at intersection, pedestrians on sidewalk, turning right, cones, crossing intersection.
ex3.mp4

Description: Small and densely packed objects are rendered accurately at their correct positions following 3D bounding box coordinates. Objects maintain temporal consistency via instance flow guidance.
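To illustrate how 3D bounding box coordinates can determine where an object should appear in a camera view, the sketch below projects the corners of a box into the image with a pinhole model and takes their 2D extent. It is not InstaDrive's code: the function names, the axis-aligned box (no yaw), the camera-frame convention (x right, y down, z forward), and the intrinsics values are all assumptions chosen for brevity.

```python
from itertools import product


def box_corners(center, extents):
    """Return the 8 corners of an axis-aligned 3D box in the camera frame.

    `extents` are the full box sizes along x, y and z (hypothetical convention).
    """
    px, py, pz = center
    ex, ey, ez = extents
    return [
        (px + sx * ex / 2, py + sy * ey / 2, pz + sz * ez / 2)
        for sx, sy, sz in product((-1, 1), repeat=3)
    ]


def project(point, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame point; None if behind the camera."""
    x, y, z = point
    if z <= 0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)


def box_to_2d(center, extents, fx, fy, cx, cy):
    """Return (u_min, v_min, u_max, v_max) of the projected box, or None."""
    pts = [project(p, fx, fy, cx, cy) for p in box_corners(center, extents)]
    pts = [p for p in pts if p is not None]
    if not pts:
        return None  # the whole box lies behind the camera
    us, vs = zip(*pts)
    return (min(us), min(vs), max(us), max(vs))


# A 2 m cube centered 10 m in front of a camera with made-up intrinsics.
rect = box_to_2d((0.0, 0.0, 10.0), (2.0, 2.0, 2.0), 1000.0, 1000.0, 800.0, 450.0)
```

A real pipeline would also apply the box yaw and the ego-to-camera extrinsics before projecting. Both are omitted here to keep the example short.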


2. Qualitative Comparison

2.1 Comparison with Baseline

a. Instance-Level Temporal Consistency

  • MagicDrive-V2 (baseline)
pred.mp4
  • InstaDrive
pred.mp4

Explanation: In MagicDrive-V2, the front orientation of the white car (FrontLeft and BackLeft views) changes over time. InstaDrive preserves instance attributes, demonstrating superior temporal consistency.

b. Occlusion Hierarchy

  • Panacea (baseline occlusion)
occ.mp4
  • InstaDrive (occlusion)
pred.mp4

Explanation: A stationary box (FrontLeft view) is farther away, while a moving box is closer. In Panacea, the distant box incorrectly appears in front of the moving object. InstaDrive correctly renders the closer moving object in front.
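The ordering described above can be illustrated with a painter's-algorithm sketch: boxes are drawn from farthest to nearest, so a nearer instance overwrites a farther one wherever they overlap. This is only a toy illustration of depth ordering, not the Spatial Geometric Aligner itself. The `render` function, the dictionary fields, and the character canvas are assumptions made for the example.

```python
def render(boxes, width, height):
    """Paint 2D boxes far-to-near so nearer instances cover farther ones.

    Each box is a dict with a one-character `label`, a half-open `rect`
    (x0, y0, x1, y1) in pixels, and a `depth` in meters (hypothetical schema).
    """
    canvas = [["." for _ in range(width)] for _ in range(height)]
    # Largest depth first: the farthest box is painted before nearer ones.
    for box in sorted(boxes, key=lambda b: b["depth"], reverse=True):
        x0, y0, x1, y1 = box["rect"]
        for y in range(max(y0, 0), min(y1, height)):
            for x in range(max(x0, 0), min(x1, width)):
                canvas[y][x] = box["label"]
    return ["".join(row) for row in canvas]


boxes = [
    {"label": "M", "rect": (2, 1, 6, 4), "depth": 8.0},   # nearer, moving box
    {"label": "S", "rect": (4, 0, 9, 3), "depth": 20.0},  # farther, stationary box
]
rows = render(boxes, width=10, height=4)
for row in rows:
    print(row)
```

In the overlap region the nearer box `M` ends up on top, regardless of the order in which the boxes are listed.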

c. Spatial Localization

  • MagicDrive-V2 (spatial example)
pred.mp4
  • InstaDrive (spatial example)
pred.mp4

Explanation: In baselines such as MagicDrive-V2, objects in the FrontRight view can drift away from their conditioning bounding boxes. InstaDrive keeps generated objects aligned with the boxes.
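One simple way to quantify this kind of deviation is the intersection over union (IoU) between the conditioning 2D box and the box where the object actually appears in the generated frame. The sketch below is an illustrative check and is not taken from the InstaDrive evaluation. The box format `(x0, y0, x1, y1)` and the example coordinates are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    # Overlap width and height, clamped at zero when the boxes are disjoint.
    iw = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    ih = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = iw * ih
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


conditioning_box = (100.0, 200.0, 180.0, 260.0)  # made-up layout box
observed_box = (110.0, 205.0, 190.0, 265.0)      # made-up box of the generated object
score = iou(conditioning_box, observed_box)
```

A score near 1 means the generated object sits where the layout asked for it. Lower scores indicate drift of the kind described for the baseline.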


2.2 Additional Results

  • a. Instance-Level Temporal Consistency
ex5.mp4

Description: Shows that instance attributes stay consistent over time across complex scenarios.

  • b. Occlusion Hierarchy
occlusion1.mp4

Description: Further confirms correct handling of occlusion relationships.

  • c. Spatial Localization
ex3.mp4

Description: Shows that objects are localized accurately across different scenarios.


3. Scenario Simulation Using CARLA-Generated Layouts

3.1 Corner Cases in Autonomous Driving

  • Example A — The vehicle ahead brakes, prompting the ego vehicle to decelerate and stop.
    Please refer to the Project Page for the long-term demo.
    Description: Simulates sudden braking to visualize behavior and generation under emergency conditions.

  • Example B — Vehicle cutting in from the right lane.

cut-in.mp4

Description: Simulates a cut-in from the adjacent lane to test generation stability in complex traffic scenarios.

3.2 Long-term Generation (2x speed)

Please refer to the Project Page for long-term video demos.
Description: Demonstrates long-term generation consistency and coherent world modeling.


【ICCV 2025】 InstaDrive: Instance-Aware Driving World Models for Realistic and Consistent Video Generation
