MagicDrive video generation. We release this version mainly for reference, so please be prepared to solve issues on your own. Before getting started, users need to set up and understand the code in the main branch.
- Model checkpoint for rawbox_mv2.0t_0.4.3.yaml on OneDrive.
- Model checkpoint for rawbox_mv2.0t_0.4.3_60.yaml (i.e., the 60-frame model) on Hugging Face.
The environment should be compatible with MagicDrive (single frame). However, this codebase relies on another version of bevfusion (in third_party) and some video-related Python packages.
The code is tested with PyTorch==1.10.2 and torchvision==0.11.3.
You should have these packages before starting. To install the additional packages, run:
cd ${ROOT}
pip install -r requirements.txt
We install the following packages from source, with cd ${FOLDER}; pip install -e .
# install third-party
third_party/
├── bevfusion -> based on db75150
├── diffusers -> based on v0.17.1 (afcca3916)
└── xformers -> (optional) we slightly modified 0.0.19 to install with pytorch1.10.2
If you need our xformers, please find it here. Please read the FAQ if you encounter any issues.
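Because the code is only tested with one PyTorch/torchvision pair, it can help to confirm the installed versions before building the third-party packages. The following is a minimal sketch and not part of this repository: the helper name `check_versions` is hypothetical, and it only compares the version strings reported by `importlib.metadata`.

```python
from importlib import metadata

# Versions this README states the code was tested with.
TESTED = {"torch": "1.10.2", "torchvision": "0.11.3"}


def check_versions(expected=TESTED, get_version=metadata.version):
    """Return {package: (tested_version, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, want in expected.items():
        try:
            have = get_version(name)
        except metadata.PackageNotFoundError:
            have = None
        # A local build tag such as "1.10.2+cu113" still counts as a match.
        if have is None or have.split("+")[0] != want:
            mismatches[name] = (want, have)
    return mismatches


if __name__ == "__main__":
    for name, (want, have) in check_versions().items():
        print(f"{name}: tested with {want}, found {have}")
```

An empty result means both packages match the tested versions; a mismatch does not necessarily mean the code will fail, only that the combination is untested.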
Our training is based on stable-diffusion-v1-5.
We assume you put the weights at ${ROOT}/../pretrained/ as follows:
{ROOT}/../pretrained/stable-diffusion-v1-5/
├── README.md
├── feature_extractor
├── model_index.json
├── safety_checker
├── scheduler
├── text_encoder
├── tokenizer
├── unet
├── v1-5-pruned-emaonly.ckpt
├── v1-5-pruned.ckpt
├── v1-inference.yaml
└── vae
Pretrained weight of MagicDrive (image generation):
{ROOT}/../MagicDrive-pretrained/
└── SDv1.5mv-rawbox_2023-09-07_18-39_224x400
Our models will load this image generation model before training for video generation.
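Before launching training, a quick check that both pretrained folders are where the listings above expect them can save a failed run. This is a minimal sketch, not a script from this repository: `missing_pretrained` is a hypothetical helper, and it simply checks every entry shown in the listings (the README does not state which of them are strictly required).

```python
from pathlib import Path

# Entries copied from the stable-diffusion-v1-5 listing above.
SD_ENTRIES = [
    "README.md", "feature_extractor", "model_index.json", "safety_checker",
    "scheduler", "text_encoder", "tokenizer", "unet",
    "v1-5-pruned-emaonly.ckpt", "v1-5-pruned.ckpt", "v1-inference.yaml", "vae",
]
MAGICDRIVE_RUN = "SDv1.5mv-rawbox_2023-09-07_18-39_224x400"


def missing_pretrained(root):
    """Return the expected pretrained paths (as strings) that do not exist.

    `root` plays the role of ${ROOT}; both folders live next to it.
    """
    base = Path(root).resolve().parent
    sd_dir = base / "pretrained" / "stable-diffusion-v1-5"
    expected = [sd_dir / entry for entry in SD_ENTRIES]
    expected.append(base / "MagicDrive-pretrained" / MAGICDRIVE_RUN)
    return [str(p) for p in expected if not p.exists()]
```

Calling `missing_pretrained(".")` from the repository root returns an empty list when everything in the listings is present.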
Please prepare the nuScenes dataset following bevfusion's instructions. Note:
- Run with our forked version of mmdet3d.
- It is better to run generation ONE-BY-ONE to avoid overwriting.
- You have to manually move nuscenes_dbinfos_train.pkl and nuscenes_gt_database from the nuscenes root to the ann_file folder, e.g. nuscenes_mmdet3d.
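The manual move in the last note can be scripted. The sketch below is only an illustration under stated assumptions: `move_db_files` is a hypothetical helper, it uses `shutil.move` from the standard library, and it refuses to overwrite anything already present in the target folder.

```python
import shutil
from pathlib import Path

# The two entries the note above asks to move.
ENTRIES = ["nuscenes_dbinfos_train.pkl", "nuscenes_gt_database"]


def move_db_files(nuscenes_root, ann_dir, entries=ENTRIES):
    """Move the listed entries from nuscenes_root into ann_dir.

    Returns the names that were actually moved.
    """
    src_root, dst_root = Path(nuscenes_root), Path(ann_dir)
    dst_root.mkdir(parents=True, exist_ok=True)
    moved = []
    for name in entries:
        src, dst = src_root / name, dst_root / name
        if not src.exists():
            continue  # nothing to move (already moved or not generated yet)
        if dst.exists():
            raise FileExistsError(f"{dst} already exists; refusing to overwrite")
        shutil.move(str(src), str(dst))
        moved.append(name)
    return moved
```

With the layout used in this README, a call would look like `move_db_files("../data/nuscenes", "../data/nuscenes_mmdet3d")`.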
After preparation, you should have
${ROOT}/../data/
├── nuscenes
│ ├── ...
│ └── sweeps
└── nuscenes_mmdet3d
Note:
In our latest version/model, we only adopt Option2. It is safe to skip Option1.
We have released all the pre-generated annotations; please find them at W-CODA2024 Track2.
(Option1) Generate ann_file for video frames (with keyframes / sweeps). We use them to train 7~16-frame video models.
# create `nuscenes_mmdet3d-t-keyframes`
python tools/create_data.py nuscenes \
--root-path ../data/nuscenes --out-dir ../data/nuscenes_mmdet3d-t-keyframes/ \
--extra-tag nuscenes --only_info
# create `nuscenes_mmdet3d-t-use-break`
USE_BREAK=True \
python tools/create_data.py nuscenes \
--root-path ../data/nuscenes --out-dir ../data/nuscenes_mmdet3d-t-use-break/ \
--extra-tag nuscenes --only_info --with_cam_sweeps
The data structure should look like:
${ROOT}/../data/
├── ...
├── nuscenes_mmdet3d-t-use-break
│ ├── nuscenes_dbinfos_train.pkl -> ../nuscenes_mmdet3d/nuscenes_dbinfos_train.pkl
│ ├── nuscenes_gt_database -> ../nuscenes_mmdet3d/nuscenes_gt_database/
│ ├── nuscenes_infos_train_t6.pkl
│ └── nuscenes_infos_val_t6.pkl
└── nuscenes_mmdet3d-t-keyframes
  ├── nuscenes_dbinfos_train.pkl -> ../nuscenes_mmdet3d/nuscenes_dbinfos_train.pkl
  ├── nuscenes_gt_database -> ../nuscenes_mmdet3d/nuscenes_gt_database
  ├── nuscenes_infos_train.pkl
  └── nuscenes_infos_val.pkl
(Option2) Generate annotations for sweep frames and ann_file for MagicDrive. We use them to train 16-frame video models and to run video generation for all 13~16-frame models.
- Please follow ASAP to generate interp annotations for nuScenes. The following command should do the work:
  # in ASAP root
  bash scripts/ann_generator.sh 12 --ann_strategy 'interp'
- (Optional) Generate advanced annotations for sweeps. (We do not observe a major difference between interp and advanced, so this step can be skipped.)
- Use the commands in scripts/prepare_dataset.sh to generate ann_file and cache.
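After these steps, it may be worth checking that the 12Hz outputs landed where this README expects them. The sketch below is an assumption-laden illustration, not a repository tool: the helper names are hypothetical, and the file names are built from the layout this README lists for ${ROOT}/../data/.

```python
from pathlib import Path


def expected_12hz_files(data_root, strategy="interp"):
    """Paths the 12Hz preparation should produce for one annotation strategy.

    `strategy` is "interp" or "advanced", matching the names in this README.
    """
    data_root = Path(data_root)
    ann_dir = data_root / "nuscenes_mmdet3d-12Hz"
    prefix = f"nuscenes_{strategy}_12Hz"
    return [
        data_root / "nuscenes" / f"{strategy}_12Hz_trainval",
        data_root / "nuscenes" / f"{prefix}_gt_database",
        ann_dir / f"{prefix}_dbinfos_train.pkl",
        ann_dir / f"{prefix}_infos_train.pkl",
        ann_dir / f"{prefix}_infos_val.pkl",
    ]


def missing_12hz_files(data_root, strategy="interp"):
    """Return the expected paths that do not exist yet."""
    return [p for p in expected_12hz_files(data_root, strategy) if not p.exists()]
```

For example, `missing_12hz_files("../data")` run from ${ROOT} returns an empty list once the interp annotations and ann_file are in place.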
You should have
${ROOT}/../data/
├── ...
├── nuscenes
│ ├── advanced_12Hz_trainval
│ ├── interp_12Hz_trainval
│ ├── nuscenes_advanced_12Hz_gt_database
│ └── nuscenes_interp_12Hz_gt_database
└── nuscenes_mmdet3d-12Hz
  ├── nuscenes_advanced_12Hz_dbinfos_train.pkl
  ├── nuscenes_advanced_12Hz_infos_train.pkl
  ├── nuscenes_advanced_12Hz_infos_val.pkl
  ├── nuscenes_interp_12Hz_dbinfos_train.pkl
  ├── nuscenes_interp_12Hz_infos_train.pkl
  └── nuscenes_interp_12Hz_infos_val.pkl
(Optional but recommended) To accelerate data loading, we prepare cache files in h5 format for BEV maps. They can be generated with tools/prepare_map_aux.py using the config in configs/exp/map_cache_gen.yaml. After generating them, you have to rename the cache files to match the names below:
${ROOT}/../data/
├── ...
├── nuscenes_map_aux # single frame cache, keyframes also use this.
│ ├── train_26x200x200_map_aux_full.h5
│ ├── train_26x400x400_map_aux_full.h5
│ ├── val_26x200x200_map_aux_full.h5
│ └── val_26x400x400_map_aux_full.h5
├── nuscenes_map_aux_12Hz_adv # from advanced
│ ├── train_26x200x200_12Hz_advanced.h5
│ └── val_26x200x200_12Hz_advanced.h5
├── nuscenes_map_aux_12Hz_int # from interp
│ ├── train_26x200x200_12Hz_interp.h5
│ └── val_26x200x200_12Hz_interp.h5
└── nuscenes_map_cache_t-use-break # with sweep, use break
  ├── train_8x200x200_map_use-break.h5
  └── val_8x200x200_map_use-break.h5
Run training for 224x400 with 7 frames.
scripts/dist_train.sh 8 runner=8gpus_t +exp=rawbox_mv2.0t_0.3.3
Run training for 224x400 with 16 frames.
scripts/dist_train.sh 8 runner=8gpus_t +exp=rawbox_mv2.0t_0.3.4
Run training for 224x400 with 16 frames with sweeps and generated annotations.
scripts/dist_train.sh 8 runner=8gpus_t +exp=rawbox_mv2.0t_0.4.3
# or
scripts/dist_train.sh 8 runner=8gpus_t +exp=rawbox_mv2.0t_0.4.4
Run training for 224x400 with 61 frames with sweeps and generated annotations. (8xA800)
scripts/dist_train.sh 8 runner=8gpus_t +exp=rawbox_mv2.0t_0.4.3_60
Typically, training for ~80000 steps (or 4 epochs with 12Hz data) is enough.
Our default log directory is ${ROOT}/magicdrive-t-log. Please be prepared.
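The training commands above differ only in the experiment name, so they can be assembled from a small lookup table. This is a sketch for convenience, not a launcher shipped with the code: `train_command` and the dictionary keys are hypothetical, while the experiment names, GPU count, and runner are copied from the commands above.

```python
# Experiment names copied from the training commands in this README.
EXPERIMENTS = {
    "7 frames": "rawbox_mv2.0t_0.3.3",
    "16 frames": "rawbox_mv2.0t_0.3.4",
    "16 frames, sweeps": "rawbox_mv2.0t_0.4.3",
    "61 frames, sweeps": "rawbox_mv2.0t_0.4.3_60",
}


def train_command(setting):
    """Return the argument list for one of the documented training runs."""
    exp = EXPERIMENTS[setting]
    # Only the 8-GPU runner appears in this README, so it is kept fixed here.
    return ["scripts/dist_train.sh", "8", "runner=8gpus_t", f"+exp={exp}"]


if __name__ == "__main__":
    print(" ".join(train_command("16 frames, sweeps")))
```

The returned list can be passed to `subprocess.run(..., check=True)` from ${ROOT} if launching from Python is preferred over the shell.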
Run video generation with 12Hz annotations.
python tools/test.py resume_from_checkpoint=${RUN_LOG_DIR} task_id=${ANY} \
runner.validation_times=4 runner.pipeline_param.init_noise=rand_all \
++dataset.data.val.ann_file=${ROOT}/../data/nuscenes_mmdet3d-12Hz/nuscenes_interp_12Hz_infos_val.pkl
@inproceedings{gao2023magicdrive,
title={{MagicDrive}: Street View Generation with Diverse 3D Geometry Control},
author={Gao, Ruiyuan and Chen, Kai and Xie, Enze and Hong, Lanqing and Li, Zhenguo and Yeung, Dit-Yan and Xu, Qiang},
booktitle = {International Conference on Learning Representations},
year={2024}
}
We adopt the following open-source projects: