Tianhao Wu · Chuanxia Zheng · Frank Guan · Andrea Vedaldi · Tat-Jen Cham
Paper | Project Page | Pretrain Weight | Demo
This code has been tested on Ubuntu 22.04 with torch 2.4.0 and CUDA 11.8. We sincerely thank TRELLIS for providing the environment setup, which we follow exactly in this work.
Create a new conda environment named amodal3r and install the dependencies:
```shell
. ./setup.sh --new-env --basic --xformers --flash-attn --diffoctreerast --spconv --mipgaussian --kaolin --nvdiffrast
```

The detailed usage of `setup.sh` can be found by running `. ./setup.sh --help`:
```
Usage: setup.sh [OPTIONS]
Options:
    -h, --help          Display this help message
    --new-env           Create a new conda environment
    --basic             Install basic dependencies
    --train             Install training dependencies
    --xformers          Install xformers
    --flash-attn        Install flash-attn
    --diffoctreerast    Install diffoctreerast
    --vox2seq           Install vox2seq
    --spconv            Install spconv
    --mipgaussian       Install mip-splatting
    --kaolin            Install kaolin
    --nvdiffrast        Install nvdiffrast
    --demo              Install all dependencies for demo
```

We have provided our pretrained weights of both the sparse structure module and the SLAT module on HuggingFace.
We use three datasets for training: ABO, 3D-FUTURE, and HSSD. To obtain the training data, please also refer to TRELLIS; thanks to them for the amazing work!
When the data is ready, combine the datasets and place them under ./dataset/abo_3dfuture_hssd. If you want to train on a single dataset, feel free to modify the dataloader. Training requires the rendered images, the sparse structure, and the SLAT representations.
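Before launching training, it can help to verify that every object folder in the combined dataset is complete. The sub-folder names below are purely illustrative assumptions (match them to whatever your TRELLIS preprocessing actually emits); a minimal sketch:

```python
from pathlib import Path

def check_dataset(root="./dataset/abo_3dfuture_hssd",
                  required=("renders", "ss_latents", "slat_latents")):
    """Report objects missing any of the required sub-folders.

    The sub-folder names are assumptions for illustration only;
    adjust them to the layout your preprocessing pipeline produces.
    """
    root = Path(root)
    missing = {}
    for obj in sorted(p for p in root.iterdir() if p.is_dir()):
        absent = [r for r in required if not (obj / r).exists()]
        if absent:
            missing[obj.name] = absent
    return missing
```

Objects listed in the returned dict can then be re-rendered or excluded before training.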
To train your own model, you can start from either our weights or the original TRELLIS weights. Please download the weights and put them under ./ckpts.
To train the sparse structure module with our designed mask-weighted cross-attention and occlusion-aware attention, please run:
```shell
. ./train_ss.sh
```

To train the SLAT module, please run:
```shell
. ./train_slat.sh
```

The output folder where the model will be saved can be changed by modifying the `--vis` parameter in the script.
We have prepared examples under the ./example folder. Both single and multiple images are supported as input. For inference, please run:
```shell
python ./inference.py
```

If you want to try your own data, you should prepare: 1) the original image, and 2) a mask image in which the background is white (255,255,255), the visible area is gray (188,188,188), and the occluded area is black (0,0,0).
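The tri-level mask described above can be assembled from two binary masks (e.g. one for the visible region and one for the occluded region). A minimal numpy sketch; the function name and the overlap rule (visible wins) are our own assumptions, not part of the repo:

```python
import numpy as np

# Grayscale codes expected for the mask image (from the README):
BACKGROUND, VISIBLE, OCCLUDED = 255, 188, 0

def build_amodal_mask(visible, occluded):
    """Combine two boolean HxW arrays into the tri-level mask.

    `visible`  - True where the object is directly seen.
    `occluded` - True where the object is hidden by an occluder.
    Pixels in neither set are treated as background.
    """
    mask = np.full(visible.shape, BACKGROUND, dtype=np.uint8)
    mask[occluded] = OCCLUDED
    mask[visible] = VISIBLE  # visible takes precedence if inputs overlap
    return mask
```

If the inference script expects an RGB image, the single-channel result can be stacked to three channels (e.g. with `np.stack([mask] * 3, axis=-1)`) before saving.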
You can use Segment Anything to obtain the corresponding mask; this is how we produced the in-the-wild examples in the paper and in our demo.
We render Toys4K and GSO in exactly the same way as the training data. To obtain the evaluation dataset, please modify the directory in 3d_mask_render.py and run:
```shell
python ./3d_mask_render.py
```

It will create a renders_mask folder with the 3D-consistent masks in it.
