This toolkit provides a comprehensive pipeline for preparing 3D datasets, including downloading, processing, voxelizing, and latent encoding for SC-VAE and Flow Model training.
This toolkit is built upon and extended from the data processing scripts of TRELLIS.2. We gratefully acknowledge the TRELLIS.2 team for open-sourcing their data preparation pipeline, which served as the foundation for this work. Our extensions include view-aligned voxelization and latent encoding for Pixal3D.
Initialize the environment and install necessary dependencies:
. ./data_toolkit/setup.sh
Before processing, load the dataset metadata.
python data_toolkit/build_metadata.py <SUBSET> --root <ROOT> [--source <SOURCE>]
Arguments:
- SUBSET: Target dataset subset. Options: ObjaverseXL, ABO, HSSD, TexVerse (training sets); SketchfabPicked, Toys4k (test sets).
- ROOT: Root directory to save the data.
- SOURCE: Data source (required if SUBSET is ObjaverseXL). Options: sketchfab, github.
Example:
Load metadata for ObjaverseXL (sketchfab) and save to datasets/ObjaverseXL_sketchfab:
python data_toolkit/build_metadata.py ObjaverseXL --source sketchfab --root datasets/ObjaverseXL_sketchfab
Download the 3D assets to the local storage.
python data_toolkit/download.py <SUBSET> --root <ROOT> [--rank <RANK> --world_size <WORLD_SIZE>]
Arguments:
- RANK / WORLD_SIZE: Parameters for multi-node distributed downloading.
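The following is a minimal sketch of how RANK and WORLD_SIZE could split a list of assets across workers. It assumes a contiguous, near-equal partition; the helper name `shard_bounds` and the asset count are hypothetical, and the toolkit's actual partitioning scheme may differ.

```python
def shard_bounds(num_items, rank, world_size):
    """Return the [start, end) slice of items assigned to one worker.

    Assumption: workers take contiguous, near-equal slices. This is an
    illustration only, not the toolkit's confirmed behavior.
    """
    if world_size <= 0 or not 0 <= rank < world_size:
        raise ValueError("require world_size > 0 and 0 <= rank < world_size")
    start = num_items * rank // world_size
    end = num_items * (rank + 1) // world_size
    return start, end


# Hypothetical asset count, chosen only for illustration.
num_assets = 800_000

# With a very large world size, a single worker gets only a handful of assets.
start, end = shard_bounds(num_assets, rank=0, world_size=160_000)
print(f"rank 0 of 160000 handles {end - start} assets")

# With four workers, the slices tile the full range without gaps or overlap.
for rank in range(4):
    print(rank, shard_bounds(num_assets, rank, world_size=4))
```

Under this assumption, each node runs the same command with its own rank, and together the slices cover every asset exactly once.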
Example:
To download the ObjaverseXL subset:
Note: The example below sets a large WORLD_SIZE (160,000) for demonstration purposes, meaning only a tiny fraction of the dataset will be downloaded by this single process.
python data_toolkit/download.py ObjaverseXL --root datasets/ObjaverseXL_sketchfab --world_size 160000
Attention: Some datasets may require an interactive Hugging Face login or manual steps. Please follow any on-screen instructions.
Update Metadata: After downloading, update the metadata registry:
python data_toolkit/build_metadata.py ObjaverseXL --root datasets/ObjaverseXL_sketchfab
If download records are missing but files already exist locally, use --from_file to scan and rebuild:
python data_toolkit/build_metadata.py ObjaverseXL --root datasets/ObjaverseXL_sketchfab --from_file
Standardize 3D assets by dumping mesh and PBR textures. Note: This process utilizes the CPU.
# Dump Meshes
python data_toolkit/dump_mesh.py <SUBSET> --root <ROOT> [--rank <RANK> --world_size <WORLD_SIZE>]
# Dump PBR Textures
python data_toolkit/dump_pbr.py <SUBSET> --root <ROOT> [--rank <RANK> --world_size <WORLD_SIZE>]
# Compute asset statistics
python data_toolkit/asset_stats.py --root <ROOT> [--rank <RANK> --world_size <WORLD_SIZE>]
Example:
python data_toolkit/dump_mesh.py ObjaverseXL --root datasets/ObjaverseXL_sketchfab
python data_toolkit/dump_pbr.py ObjaverseXL --root datasets/ObjaverseXL_sketchfab
python data_toolkit/asset_stats.py --root datasets/ObjaverseXL_sketchfab
Update Metadata:
python data_toolkit/build_metadata.py ObjaverseXL --root datasets/ObjaverseXL_sketchfab
Render multi-view images for each asset. The renders serve as image conditions for the generator, and their camera transforms are used for view-aligned processing in subsequent steps. Note: Blender and Pillow will be automatically installed on first run.
python data_toolkit/render_cond.py <SUBSET> --root <ROOT> [--num_views <NUM_VIEWS>] [--rank <RANK> --world_size <WORLD_SIZE>]
Arguments:
- NUM_VIEWS: Number of views to render per asset. Default is 2.
Example:
python data_toolkit/render_cond.py ObjaverseXL --root datasets/ObjaverseXL_sketchfab
Update Metadata:
python data_toolkit/build_metadata.py ObjaverseXL --root datasets/ObjaverseXL_sketchfab
Convert the processed meshes and textures into view-aligned O-Voxels format. Each asset is transformed according to camera views from Step 5, producing per-view voxel representations. Note: This process utilizes the CPU.
python data_toolkit/dual_grid_view.py <SUBSET> --root <ROOT> [--rank <RANK> --world_size <WORLD_SIZE>] [--resolution <RESOLUTION>] [--view_indices <VIEW_INDICES>]
python data_toolkit/voxelize_pbr_view.py <SUBSET> --root <ROOT> [--rank <RANK> --world_size <WORLD_SIZE>] [--resolution <RESOLUTION>] [--view_indices <VIEW_INDICES>]
Arguments:
- RESOLUTION: Target resolutions for O-Voxels, comma-separated (e.g., 256,512,1024). Default is 256.
- VIEW_INDICES: Specific view indices to process (e.g., 0,1,2 or 0-5). Default processes all available views.
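As an illustration of the two argument formats above, here is a minimal sketch of parsers for VIEW_INDICES and RESOLUTION strings. The function names are hypothetical, and the sketch assumes that a range such as 0-5 includes both endpoints; the toolkit's own parsing is not shown in this document and may differ.

```python
def parse_view_indices(spec):
    """Expand a spec such as "0,1,2" or "0-5" into a sorted list of unique ints.

    Assumption: ranges are inclusive on both ends ("0-5" means views 0..5).
    """
    indices = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            lo_text, hi_text = part.split("-", 1)
            lo, hi = int(lo_text), int(hi_text)
            if lo > hi:
                raise ValueError(f"invalid range: {part!r}")
            indices.update(range(lo, hi + 1))
        else:
            indices.add(int(part))
    return sorted(indices)


def parse_resolutions(spec):
    """Split a comma-separated resolution list such as "256,512,1024"."""
    return [int(item) for item in spec.split(",") if item.strip()]


print(parse_view_indices("0-1"))
print(parse_view_indices("0,1,2"))
print(parse_resolutions("256,512,1024"))
```

Returning a sorted, de-duplicated list keeps mixed specs such as `0-2,1,4` predictable.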
Example:
Convert ObjaverseXL to resolution 256 for views 0-1:
python data_toolkit/dual_grid_view.py ObjaverseXL --root datasets/ObjaverseXL_sketchfab --resolution 256 --view_indices 0-1
python data_toolkit/voxelize_pbr_view.py ObjaverseXL --root datasets/ObjaverseXL_sketchfab --resolution 256 --view_indices 0-1
Update Metadata:
python data_toolkit/build_metadata.py ObjaverseXL --root datasets/ObjaverseXL_sketchfab
Encode view-aligned sparse structures into latents to train the first-stage generator. Each step produces per-view latent files.
# 1. Encode Shape Latents (multi-view)
python data_toolkit/encode_shape_latent_view.py --root <ROOT> [--rank <RANK> --world_size <WORLD_SIZE>] [--resolution <RESOLUTION>] [--view_indices <VIEW_INDICES>]
# 2. Encode PBR Latents (view-aligned)
python data_toolkit/encode_pbr_latent_view.py --root <ROOT> [--rank <RANK> --world_size <WORLD_SIZE>] [--resolution <RESOLUTION>] [--view_indices <VIEW_INDICES>]
# 3. Update Metadata (Required before next step)
python data_toolkit/build_metadata.py <SUBSET> --root <ROOT>
# 4. Encode Sparse Structure (SS) Latents (multi-view)
python data_toolkit/encode_ss_latent_view.py --root <ROOT> --shape_latent_name <SHAPE_LATENT_NAME> [--rank <RANK> --world_size <WORLD_SIZE>] [--resolution <SS_RESOLUTION>] [--view_indices <VIEW_INDICES>]
Arguments:
- RESOLUTION: Input O-Voxel resolution. Default is 1024.
- SS_RESOLUTION: Resolution for sparse structures. Default is 64.
- SHAPE_LATENT_NAME: The specific version name of the shape latent (use the _view variant name).
- VIEW_INDICES: Specific view indices to process (e.g., 0,1,2 or 0-5).
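Because the SS latent step depends on a metadata update after the shape and PBR latents are encoded, the ordering can be made explicit in a small driver. The sketch below only builds and prints the commands (a dry run); the helper name `encoding_commands` is hypothetical, and it uses only the scripts and flags listed above.

```python
import shlex


def encoding_commands(subset, root, shape_latent_name,
                      resolution=1024, ss_resolution=64, view_indices="0-1"):
    """Build the documented encoding commands in dependency order.

    Hypothetical helper for illustration; it does not run anything.
    """
    common = ["--root", root, "--view_indices", view_indices]
    res = ["--resolution", str(resolution)]
    return [
        ["python", "data_toolkit/encode_shape_latent_view.py", *common, *res],
        ["python", "data_toolkit/encode_pbr_latent_view.py", *common, *res],
        # The metadata update must come before SS latent encoding.
        ["python", "data_toolkit/build_metadata.py", subset, "--root", root],
        ["python", "data_toolkit/encode_ss_latent_view.py", *common,
         "--shape_latent_name", shape_latent_name,
         "--resolution", str(ss_resolution)],
    ]


commands = encoding_commands(
    subset="ObjaverseXL",
    root="datasets/ObjaverseXL_sketchfab",
    shape_latent_name="shape_enc_next_dc_f16c32_fp16_1024_view",
)
for cmd in commands:
    print(shlex.join(cmd))
```

To execute the steps rather than print them, each list could be passed to `subprocess.run(cmd, check=True)`, which stops the sequence at the first failing step.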
Example:
python data_toolkit/encode_shape_latent_view.py --root datasets/ObjaverseXL_sketchfab --resolution 512 --view_indices 0-1
python data_toolkit/encode_pbr_latent_view.py --root datasets/ObjaverseXL_sketchfab --resolution 512 --view_indices 0-1
python data_toolkit/encode_shape_latent_view.py --root datasets/ObjaverseXL_sketchfab --resolution 1024 --view_indices 0-1
python data_toolkit/encode_pbr_latent_view.py --root datasets/ObjaverseXL_sketchfab --resolution 1024 --view_indices 0-1
# Update metadata
python data_toolkit/build_metadata.py ObjaverseXL --root datasets/ObjaverseXL_sketchfab
# Encode SS Latents (view-aligned)
python data_toolkit/encode_ss_latent_view.py --root datasets/ObjaverseXL_sketchfab --shape_latent_name shape_enc_next_dc_f16c32_fp16_1024_view --resolution 64 --view_indices 0-1
# Final Metadata Update
python data_toolkit/build_metadata.py ObjaverseXL --root datasets/ObjaverseXL_sketchfab
Decode latent files back to meshes, export GLB, and render a front-view image for visual inspection.
Shape Latent Visualization:
python data_toolkit/visualize_shape_latent.py \
--root datasets/ObjaverseXL_sketchfab \
--sha256 <SHA256_HASH> \
--resolution 1024 \
--view_idx 0
PBR Latent Visualization (shape + texture):
python data_toolkit/visualize_pbr_latent.py \
--root datasets/ObjaverseXL_sketchfab \
--sha256 <SHA256_HASH> \
--resolution 1024 \
--view_idx 0
Outputs are saved to <ROOT>/vis/<SHA256>/ (shape) or <ROOT>/vis_pbr/<SHA256>/ (PBR), including:
- Decoded GLB mesh (with PBR textures for PBR variant)
- Front-view rendered images (normal/depth for shape; shaded/base_color/normal etc. for PBR)
- Copied condition renders and camera transforms from Step 5
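For scripted inspection, the output locations described above can be resolved with a small helper. This is a minimal sketch: the function names are hypothetical, the directory layout follows the `<ROOT>/vis/<SHA256>/` and `<ROOT>/vis_pbr/<SHA256>/` pattern stated above, and no file names inside those directories are assumed.

```python
from pathlib import Path


def vis_output_dir(root, sha256, pbr=False):
    """Return the visualization output directory for one asset.

    Layout follows the pattern stated above: vis/ for shape latents,
    vis_pbr/ for PBR latents.
    """
    return Path(root) / ("vis_pbr" if pbr else "vis") / sha256


def list_outputs(root, sha256, pbr=False):
    """List file names in the output directory, or [] if it does not exist."""
    out_dir = vis_output_dir(root, sha256, pbr=pbr)
    if not out_dir.is_dir():
        return []
    return sorted(p.name for p in out_dir.iterdir())


# "<SHA256_HASH>" is a placeholder; substitute the hash of a real asset.
print(vis_output_dir("datasets/ObjaverseXL_sketchfab", "<SHA256_HASH>"))
print(vis_output_dir("datasets/ObjaverseXL_sketchfab", "<SHA256_HASH>", pbr=True))
```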