MagicDrive-t

Video generation with MagicDrive. We release this version mainly for reference, so please be prepared to resolve issues on your own. Before getting started, you should set up and understand the code in the main branch.

MagicDrive-t Checkpoints

Model checkpoint for rawbox_mv2.0t_0.4.3.yaml on OneDrive.
Model checkpoint for rawbox_mv2.0t_0.4.3_60.yaml (i.e., the 60-frame model) on Hugging Face.

Environment Setup

The environment should be compatible with MagicDrive (single frame). However, this codebase relies on a different version of bevfusion (in third_party) and on some video-related Python packages.

The code is tested with PyTorch==1.10.2 and torchvision==0.11.3. You should have these packages installed before starting. To install the additional packages, run:

cd ${ROOT}
pip install -r requirements.txt
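
To confirm that the installed packages match the tested versions stated above, a small check along these lines may help. This is a minimal sketch, not part of the repository: the helper names (`base_version`, `find_mismatches`, `installed_versions`) are hypothetical, and it assumes that a local build suffix such as `+cu113` can be ignored when comparing versions.

```python
from importlib import metadata

# Versions the README says the code was tested with.
TESTED = {"torch": "1.10.2", "torchvision": "0.11.3"}


def base_version(version: str) -> str:
    """Drop a local build suffix such as '+cu113' from a version string."""
    return version.split("+", 1)[0]


def find_mismatches(installed: dict, tested: dict = TESTED) -> dict:
    """Return {package: (installed, tested)} for every package that differs.

    A package that is missing from `installed` is reported with None.
    """
    mismatches = {}
    for name, want in tested.items():
        have = installed.get(name)
        if have is None or base_version(have) != want:
            mismatches[name] = (have, want)
    return mismatches


def installed_versions(names) -> dict:
    """Look up installed distribution versions; missing packages map to None."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    report = find_mismatches(installed_versions(TESTED))
    for name, (have, want) in report.items():
        print(f"{name}: installed {have}, tested with {want}")
```

An empty report means both packages match the tested versions; other versions may still work, but they are untested here.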

We opt to install the following packages from source, with cd ${FOLDER}; pip install -e .

# install third-party
third_party/
├── bevfusion -> based on db75150
├── diffusers -> based on v0.17.1 (afcca3916)
└── xformers -> (optional) 0.0.19 with minor changes so that it installs with PyTorch 1.10.2

If you need our xformers, please find it here. Please read the FAQ if you encounter any issues.

Pretrained Weights

Our training is based on stable-diffusion-v1-5.

We assume you put them at ${ROOT}/../pretrained/ as follows:

${ROOT}/../pretrained/stable-diffusion-v1-5/
├── README.md
├── feature_extractor
├── model_index.json
├── safety_checker
├── scheduler
├── text_encoder
├── tokenizer
├── unet
├── v1-5-pruned-emaonly.ckpt
├── v1-5-pruned.ckpt
├── v1-inference.yaml
└── vae
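
The layout above can be verified before training with a short script. This is a minimal sketch under stated assumptions: `missing_entries` is a hypothetical helper, and the README does not say which of the listed entries are strictly required, so the list simply mirrors the tree and can be trimmed.

```python
from pathlib import Path

# Entries listed in the directory tree above (files and folders alike).
SD_ENTRIES = [
    "README.md",
    "feature_extractor",
    "model_index.json",
    "safety_checker",
    "scheduler",
    "text_encoder",
    "tokenizer",
    "unet",
    "v1-5-pruned-emaonly.ckpt",
    "v1-5-pruned.ckpt",
    "v1-inference.yaml",
    "vae",
]


def missing_entries(folder, expected=SD_ENTRIES):
    """Return the names from `expected` that do not exist under `folder`."""
    folder = Path(folder)
    return [name for name in expected if not (folder / name).exists()]
```

For example, `missing_entries("../pretrained/stable-diffusion-v1-5")`, run from `${ROOT}`, returns an empty list when every listed entry is present.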

Pretrained weights of MagicDrive (image generation):

${ROOT}/../MagicDrive-pretrained/
└── SDv1.5mv-rawbox_2023-09-07_18-39_224x400

Our models will load this image generation model before training for video generation.

Datasets

Please prepare the nuScenes dataset following bevfusion's instructions. Note:

Run with our forked version of mmdet3d.
It is better to run generation ONE-BY-ONE to avoid overwriting files.
You have to manually move nuscenes_dbinfos_train.pkl and nuscenes_gt_database from the nuscenes root to the ann_file folder, e.g., nuscenes_mmdet3d.

After preparation, you should have

${ROOT}/../data/
├── nuscenes
│   ├── ...
│   └── sweeps
└── nuscenes_mmdet3d

Note

In our latest version/model, we only adopt Option2. It is safe to skip Option1.

We have released all the pre-generated annotations; please find them at W-CODA2024 Track2.

(Option1) Generate ann_file for video frames (with keyframes / sweeps). We use them to train 7~16-frame video models.

# create `nuscenes_mmdet3d-t-keyframes`
python tools/create_data.py nuscenes \
	--root-path ../data/nuscenes --out-dir ../data/nuscenes_mmdet3d-t-keyframes/ \
	--extra-tag nuscenes --only_info

# create `nuscenes_mmdet3d-t-use-break`
USE_BREAK=True \
python tools/create_data.py nuscenes \
	--root-path ../data/nuscenes --out-dir ../data/nuscenes_mmdet3d-t-use-break/ \
	--extra-tag nuscenes --only_info --with_cam_sweeps

The data structure should look like:

${ROOT}/../data/
├── ...
├── nuscenes_mmdet3d-t-use-break
│   ├── nuscenes_dbinfos_train.pkl -> ../nuscenes_mmdet3d/nuscenes_dbinfos_train.pkl
│   ├── nuscenes_gt_database -> ../nuscenes_mmdet3d/nuscenes_gt_database/
│   ├── nuscenes_infos_train_t6.pkl
│   └── nuscenes_infos_val_t6.pkl
└── nuscenes_mmdet3d-t-keyframes
    ├── nuscenes_dbinfos_train.pkl -> ../nuscenes_mmdet3d/nuscenes_dbinfos_train.pkl
    ├── nuscenes_gt_database -> ../nuscenes_mmdet3d/nuscenes_gt_database
    ├── nuscenes_infos_train.pkl
    └── nuscenes_infos_val.pkl
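
The `->` entries in the tree above are symbolic links back to `nuscenes_mmdet3d`. One way to create them is sketched below. This is an illustration only: `link_shared_files` is a hypothetical helper, and it assumes the shared files already exist in `nuscenes_mmdet3d` and that relative links are acceptable on your filesystem.

```python
import os
from pathlib import Path

# Entries shown as symlinks in the tree above.
SHARED = ["nuscenes_dbinfos_train.pkl", "nuscenes_gt_database"]


def link_shared_files(data_root, target_dir, source_dir="nuscenes_mmdet3d"):
    """Create relative symlinks in `target_dir` that point into `source_dir`.

    Both directory names are relative to `data_root`. Entries that already
    exist are left untouched. Returns the names that were newly linked.
    """
    target = Path(data_root) / target_dir
    target.mkdir(parents=True, exist_ok=True)
    created = []
    for name in SHARED:
        link = target / name
        if link.exists() or link.is_symlink():
            continue
        # Relative target, e.g. ../nuscenes_mmdet3d/nuscenes_gt_database
        os.symlink(os.path.join("..", source_dir, name), link)
        created.append(name)
    return created
```

For example, `link_shared_files("../data", "nuscenes_mmdet3d-t-keyframes")` and `link_shared_files("../data", "nuscenes_mmdet3d-t-use-break")` would cover both folders.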

(Option2) Generate annotations for sweep frames and ann_file for MagicDrive. We use them to train 16-frame video models, and for video generation with all 13~16-frame models.

Please follow ASAP to generate interp annotations for nuScenes. In short, the following command should do the job:
```
# in ASAP root.
bash scripts/ann_generator.sh 12 --ann_strategy 'interp' 	
```
(Optional) Generate advanced annotations for sweeps. (We do not observe a major difference between interp and advanced, so this step can be skipped.)
Use commands in scripts/prepare_dataset.sh to generate ann_file and cache.

You should have

${ROOT}/../data/
├── ...
├── nuscenes
│   ├── advanced_12Hz_trainval
│   ├── interp_12Hz_trainval
│   ├── nuscenes_advanced_12Hz_gt_database
│   └── nuscenes_interp_12Hz_gt_database
└── nuscenes_mmdet3d-12Hz
    ├── nuscenes_advanced_12Hz_dbinfos_train.pkl
    ├── nuscenes_advanced_12Hz_infos_train.pkl
    ├── nuscenes_advanced_12Hz_infos_val.pkl
    ├── nuscenes_interp_12Hz_dbinfos_train.pkl
    ├── nuscenes_interp_12Hz_infos_train.pkl
    └── nuscenes_interp_12Hz_infos_val.pkl

(Optional but recommended) To accelerate data loading, we prepared cache files in h5 format for BEV maps. They can be generated through tools/prepare_map_aux.py with config in configs/exp/map_cache_gen.yaml. You have to rename the cache files correctly after generating them.

${ROOT}/../data/
├── ...
├── nuscenes_map_aux  # single frame cache, keyframes also use this.
│   ├── train_26x200x200_map_aux_full.h5
│   ├── train_26x400x400_map_aux_full.h5
│   ├── val_26x200x200_map_aux_full.h5
│   └── val_26x400x400_map_aux_full.h5
├── nuscenes_map_aux_12Hz_adv  # from advanced
│   ├── train_26x200x200_12Hz_advanced.h5
│   └── val_26x200x200_12Hz_advanced.h5
├── nuscenes_map_aux_12Hz_int  # from interp
│   ├── train_26x200x200_12Hz_interp.h5
│   └── val_26x200x200_12Hz_interp.h5
└── nuscenes_map_cache_t-use-break  # with sweep, use break
    ├── train_8x200x200_map_use-break.h5
    └── val_8x200x200_map_use-break.h5
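
Since the cache files must be renamed to match the names above, it can be useful to check a file name against that pattern. The sketch below parses the naming scheme as it appears in the tree (`<split>_<A>x<B>x<C>_<tag>.h5`). The pattern is inferred from the listed names only, `parse_cache_name` is a hypothetical helper, and the meaning of the three numbers is not stated here, so they are kept as a plain tuple.

```python
import re
from typing import NamedTuple, Optional, Tuple


class CacheName(NamedTuple):
    split: str                    # "train" or "val"
    shape: Tuple[int, int, int]   # the three numbers, e.g. (26, 200, 200)
    tag: str                      # trailing label, e.g. "12Hz_interp"


# Pattern inferred from the file names listed above.
_PATTERN = re.compile(r"^(train|val)_(\d+)x(\d+)x(\d+)_(.+)\.h5$")


def parse_cache_name(filename: str) -> Optional[CacheName]:
    """Split a cache file name into its parts, or return None if it does not match."""
    match = _PATTERN.match(filename)
    if match is None:
        return None
    split, a, b, c, tag = match.groups()
    return CacheName(split, (int(a), int(b), int(c)), tag)
```

A `None` result for a freshly generated file suggests it still needs to be renamed.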

Train MagicDrive-t

Run training for 224x400 with 7 frames.

scripts/dist_train.sh 8 runner=8gpus_t +exp=rawbox_mv2.0t_0.3.3

Run training for 224x400 with 16 frames.

scripts/dist_train.sh 8 runner=8gpus_t +exp=rawbox_mv2.0t_0.3.4

Run training for 224x400 with 16 frames with sweeps and generated annotations.

scripts/dist_train.sh 8 runner=8gpus_t +exp=rawbox_mv2.0t_0.4.3
# or
scripts/dist_train.sh 8 runner=8gpus_t +exp=rawbox_mv2.0t_0.4.4

Run training for 224x400 with 61 frames with sweeps and generated annotations. (8xA800)

scripts/dist_train.sh 8 runner=8gpus_t +exp=rawbox_mv2.0t_0.4.3_60

Typically, training for ~80000 steps (or 4 epochs with 12Hz data) is enough.
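
The training variants above differ only in the `+exp=` override. As a convenience, the mapping can be written down as a small lookup that builds the command as an argument list. This is a sketch, not part of the repository: `EXPERIMENTS` and `train_command` are hypothetical names, and the frame counts are taken as stated in the descriptions above.

```python
# (number of frames, uses sweeps with generated annotations) -> experiment names
EXPERIMENTS = {
    (7, False): ["rawbox_mv2.0t_0.3.3"],
    (16, False): ["rawbox_mv2.0t_0.3.4"],
    (16, True): ["rawbox_mv2.0t_0.4.3", "rawbox_mv2.0t_0.4.4"],
    (61, True): ["rawbox_mv2.0t_0.4.3_60"],
}


def train_command(frames, sweeps, num_gpus=8, runner="8gpus_t", choice=0):
    """Build the dist_train.sh argument list for one of the listed variants.

    `choice` selects among alternatives when more than one config is listed.
    Raises KeyError for a combination that is not listed above.
    """
    exp = EXPERIMENTS[(frames, sweeps)][choice]
    return ["scripts/dist_train.sh", str(num_gpus), f"runner={runner}", f"+exp={exp}"]
```

The returned list can be passed to `subprocess.run(..., check=True)` from `${ROOT}`, or joined with spaces to reproduce the commands above.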

Video Generation

Our default log directory is ${ROOT}/magicdrive-t-log. Please prepare it in advance.

Run video generation with 12Hz annotations.

python tools/test.py resume_from_checkpoint=${RUN_LOG_DIR} task_id=${ANY} \
	runner.validation_times=4 runner.pipeline_param.init_noise=rand_all \
	++dataset.data.val.ann_file=${ROOT}/../data/nuscenes_mmdet3d-12Hz/nuscenes_interp_12Hz_infos_val.pkl
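
When generating videos for several checkpoints or annotation files, the command above can be assembled programmatically. The following is a minimal sketch: `generation_command` is a hypothetical helper that only reproduces the overrides shown above and does not add any options of its own.

```python
import shlex


def generation_command(run_log_dir, task_id, ann_file,
                       validation_times=4, init_noise="rand_all"):
    """Build the tools/test.py argument list with the overrides shown above."""
    return [
        "python", "tools/test.py",
        f"resume_from_checkpoint={run_log_dir}",
        f"task_id={task_id}",
        f"runner.validation_times={validation_times}",
        f"runner.pipeline_param.init_noise={init_noise}",
        # "++" adds the key or overrides it if it already exists in the config.
        f"++dataset.data.val.ann_file={ann_file}",
    ]


def as_shell(args):
    """Render an argument list as a single shell-quoted command line."""
    return shlex.join(args)
```

Passing the list directly to `subprocess.run` avoids shell-quoting problems with paths; `as_shell` is only for printing or logging the command.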

Cite Us

@inproceedings{gao2023magicdrive,
  title={{MagicDrive}: Street View Generation with Diverse 3D Geometry Control},
  author={Gao, Ruiyuan and Chen, Kai and Xie, Enze and Hong, Lanqing and Li, Zhenguo and Yeung, Dit-Yan and Xu, Qiang},
  booktitle = {International Conference on Learning Representations},
  year={2024}
}

Credit

We adopt the following open-source projects:
