We provide a comprehensive toolkit for the DigiForests point cloud dataset, including a dataloader for machine learning research and preprocessing tools for data manipulation.
```shell
# Clone the repository
git clone https://github.com/PRBonn/digiforests.git
cd digiforests
# Install the package
pip install .
```

```python
from digiforests_dataloader import DigiForestsDataModule

# Initialize the datamodule. Note that data_dir should contain a raw/data_split.json
datamodule = DigiForestsDataModule(
    data_dir="/path/to/dataset",
    split="train",
)

# Use in a PyTorch Lightning Trainer
trainer.fit(model, datamodule=datamodule)
```

See `data/dataloader.py` for additional details.
**Important Note:** To use our dataloader, please ensure that the dataset matches the Dataset Configuration given below. In particular, the `raw` folder must contain a `data_split.json`. You can either use the one provided with this repository at `<digiforests_repository>/data/data_split.json`, or generate one following the instructions in the Split Configuration section.
## Dataset Configuration

The dataset requires a specific folder structure:

```
dataset/
├── raw/
│   ├── data_split.json
│   └── point_clouds/
└── processed/
```
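Before constructing the datamodule, it can be convenient to check a dataset folder against this layout. The following is a minimal sketch using only `pathlib`; `validate_dataset_dir` is a hypothetical helper, not part of the toolkit.

```python
from pathlib import Path


def validate_dataset_dir(data_dir: str) -> list[str]:
    """Return a list of problems with the expected DigiForests layout."""
    root = Path(data_dir)
    problems = []
    # raw/ must contain the split definition used by the dataloader
    if not (root / "raw" / "data_split.json").is_file():
        problems.append("missing raw/data_split.json")
    if not (root / "raw" / "point_clouds").is_dir():
        problems.append("missing raw/point_clouds/")
    # processed/ holds derived data
    if not (root / "processed").is_dir():
        problems.append("missing processed/")
    return problems
```

Calling this before training turns a confusing dataloader error into an explicit list of missing paths.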
## Split Configuration

To create a custom split of the dataset, use the provided script:

```shell
python scripts/data/split_dataset.py /path/to/dataset/raw [--output-fp /path/to/output.json]
```

This script splits the DigiForests dataset into train, validation, test, and prediction sets.
To modify the dataset splits, edit the `split` function in `scripts/data/split_dataset.py`:

```python
def split(...):
    train_exp_folders = [
        "2023-03/exp06-m3",
        "2023-03/exp07-m1",
        # Add more folders for training
    ]
    val_exp_folders = [
        "2023-03/exp11-c1",
        "2023-10/exp11-c1",
        # Add more folders for validation
    ]
    # Similarly, modify test_exp_folders and pred_exp_folders
```

After modifying the split configuration, run the script again to generate the updated `data_split.json`.
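For illustration, writing such a split file could look like the sketch below. `write_split_file` is a hypothetical helper and the four-key schema is an assumption; check the `data_split.json` shipped in `<digiforests_repository>/data/` for the exact format the dataloader expects.

```python
import json
from pathlib import Path


def write_split_file(output_fp, train, val, test, pred):
    """Write experiment-folder lists to a JSON split file.

    Assumed schema: one list of experiment folders per split name.
    The real data_split.json produced by split_dataset.py may differ.
    """
    split = {"train": train, "val": val, "test": test, "pred": pred}
    Path(output_fp).write_text(json.dumps(split, indent=2))
    return split
```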
To aggregate individual point clouds and labels:
```shell
python scripts/data/aggregate_clouds_and_labels.py /path/to/plot/folder /path/to/output/folder [--denoise] [--voxel-down-sample-size FLOAT]
```

A Minkowski variant of the datamodule is also available:

```python
from digiforests_dataloader import MinkowskiDigiForestsDataModule

datamodule = MinkowskiDigiForestsDataModule(
    data_dir="/path/to/dataset"
)
```

- Use `num_workers` for parallel data loading
- Consider using GPU for augmentations (see `batch_transform`)
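The `--voxel-down-sample-size` option suggests a voxel-grid downsample: keep one representative point per occupied voxel of the given edge length. A pure-Python sketch of the idea (the script itself likely relies on a point cloud library; this is not its implementation):

```python
def voxel_down_sample(points, voxel_size):
    """Keep the first point falling into each voxel of edge length voxel_size.

    points: iterable of (x, y, z) tuples; voxel_size: float > 0.
    """
    seen = {}
    for p in points:
        # Integer voxel index along each axis
        key = tuple(int(c // voxel_size) for c in p)
        # First point per voxel wins; a real implementation might
        # average all points within the voxel instead
        seen.setdefault(key, p)
    return list(seen.values())
```

Larger voxel sizes shrink the cloud more aggressively, trading geometric detail for memory and speed.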