A ROS-based semantic 3D mapping system that fuses real-time semantic segmentation with volumetric TSDF mapping. The system uses a diffusion-based RGB-D semantic segmentation model to label point clouds with semantic class information, then integrates them into a 3D semantic voxel map using a modified Voxblox TSDF integrator with Bayesian label updates.

```text
RealSense D455 (RGB + Depth)
        │
        ▼
[semantic_cloud]
  DiffusionMMS segmentation
  Depth-to-color alignment
  Semantic point cloud generation
        │
        ├── /semantic_pcl/semantic_pcl   (PointCloud2 with RGB-encoded labels)
        └── /semantic_pcl/camera_pose    (TransformStamped)
        │
        ▼
[semantic_voxblox]
  Bayesian semantic TSDF integration
  3D semantic mesh + ESDF generation
```

Odometry is provided by ROVIO (/rovio/odometry), fusing IMU and visual data.

| Package | Language | Description |
|---|---|---|
| `semantic_cloud` | Python | RGB-D semantic segmentation → labeled PointCloud2 |
| `semantic_voxblox` | C++ | Semantic TSDF volumetric mapping (derived from Voxblox / Kimera-Semantics) |
| `align_depth_to_color` | Python | Utility node: align depth images to the color camera frame |
| `octomap_generator` | C++ | OctoMap-based semantic mapping (legacy, currently disabled) |
| `semantic_slam` | n/a | Meta-package: launch files and configuration parameters |

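
The `align_depth_to_color` utility, like the depth-to-color alignment step in `semantic_cloud`, reprojects depth pixels into the color camera so that every RGB pixel, and therefore every segmentation label, gets a depth value. Below is a minimal sketch of the standard pinhole approach. It assumes a metric depth image, 3×3 intrinsics for both cameras, and a known depth-to-color extrinsic `R`, `t`. The function name and signature are hypothetical and are not taken from the package.

```python
import numpy as np


def align_depth_to_color(depth, K_depth, K_color, R, t, color_shape):
    """Reproject a metric depth image into the color camera's pixel grid.

    depth: (H, W) float array of depths in meters (0 = invalid).
    K_depth, K_color: 3x3 pinhole intrinsic matrices.
    R, t: rotation (3x3) and translation (3,) such that
          p_color = R @ p_depth + t (an assumed convention).
    color_shape: (H_c, W_c) of the color image.
    """
    h, w = depth.shape
    v, u = np.indices((h, w))
    z = depth.reshape(-1)
    valid = z > 0
    u, v, z = u.reshape(-1)[valid], v.reshape(-1)[valid], z[valid]

    # Back-project valid depth pixels to 3D points in the depth camera frame.
    x = (u - K_depth[0, 2]) * z / K_depth[0, 0]
    y = (v - K_depth[1, 2]) * z / K_depth[1, 1]
    pts = np.stack([x, y, z], axis=1)

    # Transform the points into the color camera frame; drop points behind it.
    pts_c = pts @ R.T + t
    zc = pts_c[:, 2]
    front = zc > 0
    pts_c, zc = pts_c[front], zc[front]

    # Project into color pixel coordinates and keep those inside the image.
    uc = np.round(K_color[0, 0] * pts_c[:, 0] / zc + K_color[0, 2]).astype(int)
    vc = np.round(K_color[1, 1] * pts_c[:, 1] / zc + K_color[1, 2]).astype(int)
    hc, wc = color_shape
    inside = (uc >= 0) & (uc < wc) & (vc >= 0) & (vc < hc)
    uc, vc, zc = uc[inside], vc[inside], zc[inside]

    # Z-buffer: when several points hit one pixel, keep the nearest depth.
    aligned = np.full((hc, wc), np.inf)
    np.minimum.at(aligned, (vc, uc), zc)
    aligned[np.isinf(aligned)] = 0.0
    return aligned
```

With identical intrinsics and an identity extrinsic, the output equals the input. In practice the intrinsics would come from the two `camera_info` topics and the extrinsic from TF. The `np.minimum.at` z-buffer handles occlusions: when the two cameras see the same pixel through different surfaces, the nearest one is kept.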
- ROS Melodic or Noetic
- ROS packages: `cv_bridge`, `tf`, `tf2_ros`, `message_filters`, `sensor_msgs`, `geometry_msgs`, `nav_msgs`, `pcl_ros`, `pcl_conversions`
- voxblox (ETH ASL): `voxblox`, `voxblox_ros`, `voxblox_msgs`, `voxblox_rviz_plugin`
- `catkin_simple`, `minkindr_conversions`, `gflags_catkin`, `glog_catkin`
- ROVIO (for odometry)
- ORB-SLAM3 + `hector_trajectory_server` (TUM RGB-D launch only)
- PyTorch with CUDA
- DiffusionMMS (git submodule at `semantic_cloud/include/diffusionMMS`)
- Python packages: `numpy`, `opencv-python`, `numba`, `omegaconf`

```bash
git clone --recurse-submodules <repo-url>
```

Or, if already cloned:

```bash
git submodule update --init --recursive
```

Install the Python dependencies:

```bash
pip install torch torchvision numpy opencv-python numba omegaconf
```

Install the DiffusionMMS package:

```bash
pip install -e semantic_cloud/include/diffusionMMS
```

Build the catkin workspace:

```bash
cd <catkin_ws>
catkin build semantic_voxblox semantic_cloud semantic_slam align_depth_to_color
```

Main `semantic_cloud` parameters:

| Parameter | Default | Description |
|---|---|---|
| `color_image_topic` | `/d455/color/image_raw` | RGB image topic |
| `depth_image_topic` | `/d455/depth/image_rect_raw` | Depth image topic |
| `color_cam_info_topic` | `/d455/color/camera_info` | Color camera info topic |
| `depth_cam_info_topic` | `/d455/depth/camera_info` | Depth camera info topic |
| `camera_pose_topic` | `/rovio/odometry` | Odometry source |
| `enable_semantic` | `True` | Enable DiffusionMMS segmentation |
| `process_sematic_freq` | 10 | Process every N-th frame |
| `depth_scale` | 0.001 | Depth image scale factor (mm → m) |
| `selected_semantic` | (38 NYUv2 classes) | Classes to include in the point cloud |

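
The sketch below shows one way `depth_scale`, `selected_semantic`, and `process_sematic_freq` could feed into semantic point cloud generation. It assumes the depth image is already aligned to the color frame and that the segmentation model produces an integer label image. `semantic_point_cloud` and `should_process` are hypothetical helpers, not functions from `semantic_cloud`.

```python
import numpy as np


def semantic_point_cloud(depth_raw, labels, K, depth_scale=0.001,
                         selected_classes=None):
    """Back-project an aligned depth image into labeled 3D points.

    depth_raw: (H, W) uint16 depth aligned to the color frame (raw units).
    labels: (H, W) int array of per-pixel class IDs.
    K: 3x3 color camera intrinsics.
    Returns (N, 3) points in meters and the (N,) class IDs of those points.
    """
    z = depth_raw.astype(np.float64) * depth_scale  # raw units -> meters
    v, u = np.indices(z.shape)
    keep = z > 0  # zero depth means "no measurement"
    if selected_classes is not None:
        # Analogue of selected_semantic: drop classes that are not wanted.
        keep &= np.isin(labels, list(selected_classes))
    z, u, v, lab = z[keep], u[keep], v[keep], labels[keep]
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    return np.stack([x, y, z], axis=1), lab


def should_process(frame_index, every_n=10):
    """Hypothetical frame-skipping rule mirroring process_sematic_freq."""
    return frame_index % every_n == 0
```

With `depth_scale` 0.001, a raw depth value of 1500 places the point 1.5 m in front of the camera. Skipping frames before segmentation matters most because the diffusion model, not the back-projection, dominates the per-frame cost.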

Main `semantic_voxblox` parameters:

| Parameter | Value | Description |
|---|---|---|
| `tsdf_voxel_size` | 0.03 | Voxel resolution in meters (3 cm) |
| `tsdf_voxels_per_side` | 16 | Voxels per block side (16 × 0.03 m = 0.48 m block edge) |
| `max_ray_length_m` | 5.0 | Maximum integration depth in meters |
| `method` | `fast` | TSDF integrator type (`fast` or `merged`) |
| `semantic_measurement_probability` | 0.8 | Bayesian update confidence per observation |

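
The sketch below illustrates what `semantic_measurement_probability` controls with a per-voxel Bayesian label update in log space. In this formulation, the observed class receives likelihood p = 0.8 and the remaining 0.2 is spread evenly over the other classes. This is one common scheme, assumed here rather than read from the `semantic_voxblox` source; the real integrator may weight or batch updates differently. The helper names are hypothetical.

```python
import numpy as np


def measurement_log_likelihood(observed, num_classes, p_meas=0.8):
    """Log-likelihood vector for one observation of class `observed`.

    The observed class gets probability p_meas; the remaining mass is
    spread uniformly over the other num_classes - 1 classes.
    """
    lik = np.full(num_classes, (1.0 - p_meas) / (num_classes - 1))
    lik[observed] = p_meas
    return np.log(lik)


def bayes_update(log_prior, observed, p_meas=0.8):
    """Fuse one label observation into a voxel's log-probability vector."""
    log_post = log_prior + measurement_log_likelihood(
        observed, log_prior.size, p_meas)
    # Normalize in log space so the probabilities sum to 1.
    log_post -= np.logaddexp.reduce(log_post)
    return log_post
```

Repeated observations of the same class sharpen its probability. A single conflicting observation lowers it by at most a bounded factor. A lower `semantic_measurement_probability` shrinks the likelihood ratio between the observed and unobserved classes, so the map is slower to commit to a label.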
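Set the segmentation model checkpoint and config in the launch file, or override them at runtime:

```xml
<arg name="semseg_model_ckpt" default="<path_to_checkpoint>.pth" />
<arg name="semseg_model_cfg" default="<path_to_config>.yaml" />
```

Launch the semantic mapping pipeline:

```bash
roslaunch semantic_slam rmf_semantic_voxblox.launch
```

Optionally override the model paths:

```bash
roslaunch semantic_slam rmf_semantic_voxblox.launch \
  semseg_model_ckpt:=/path/to/checkpoint.pth \
  semseg_model_cfg:=/path/to/config.yaml
```

Run on a TUM RGB-D dataset bag (requires ORB-SLAM3 and `hector_trajectory_server`):

```bash
roslaunch semantic_slam tum_rgbd.launch bag_file:=/path/to/dataset.bag
```

Run on a recorded demo bag:

```bash
roslaunch semantic_slam semantic_mapping.launch bag_file:=/path/to/demo.bag
```

Run the depth-to-color alignment utility:

```bash
roslaunch align_depth_to_color align_depth_to_color.launch
```

Published topics:

| Topic | Type | Publisher | Description |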
|---|---|---|---|
| `/semantic_pcl/semantic_pcl` | `sensor_msgs/PointCloud2` | `semantic_cloud` | Semantic point cloud (XYZ + RGB-encoded label colors) |
| `/semantic_pcl/semantic_image` | `sensor_msgs/Image` | `semantic_cloud` | Colorized segmentation visualization |
| `/semantic_pcl/camera_pose` | `geometry_msgs/TransformStamped` | `semantic_cloud` | Camera pose in world frame |
| `/tsdf_pointcloud` | `sensor_msgs/PointCloud2` | `semantic_voxblox` | TSDF map as point cloud |
| `/surface_mesh` | `visualization_msgs/MarkerArray` | `semantic_voxblox` | Semantic mesh for RViz |

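
`/semantic_pcl/semantic_pcl` carries label colors rather than raw class IDs. Assume the cloud follows the usual PCL convention: 8-bit R, G, and B are packed as `0x00RRGGBB` into a 32-bit word, and those bits are stored unchanged in a `float32` `rgb` field. Under that assumption, encoding and decoding look like the sketch below (hypothetical helper names):

```python
import struct


def pack_rgb(r, g, b):
    """Pack 8-bit RGB into the float32 bit layout of PCL's `rgb` field."""
    packed = (r << 16) | (g << 8) | b
    # Reinterpret the uint32 bits as a float32 (no numeric conversion).
    return struct.unpack("<f", struct.pack("<I", packed))[0]


def unpack_rgb(rgb_float):
    """Recover (r, g, b) from a packed float32 `rgb` value."""
    packed = struct.unpack("<I", struct.pack("<f", rgb_float))[0]
    return (packed >> 16) & 0xFF, (packed >> 8) & 0xFF, packed & 0xFF
```

The packed value is a bit reinterpretation, not a numeric conversion, so it must be decoded with the same layout. The decoded color can then be mapped back to a class through the label-to-color table.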

Subscribed topics (defaults):

| Topic | Type | Description |
|---|---|---|
| `/d455/color/image_raw` | `sensor_msgs/Image` | RGB image |
| `/d455/depth/image_rect_raw` | `sensor_msgs/Image` | 16-bit depth image |
| `/d455/color/camera_info` | `sensor_msgs/CameraInfo` | Color intrinsics |
| `/d455/depth/camera_info` | `sensor_msgs/CameraInfo` | Depth intrinsics |
| `/rovio/odometry` | `nav_msgs/Odometry` | Camera odometry from ROVIO |

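
Color and depth frames must be paired by capture time before they can be fused. `message_filters` is listed as a dependency, and its `ApproximateTimeSynchronizer` is the usual ROS tool for this job. The pure-Python sketch below conveys the idea but is a simplified stand-in, not that class's actual algorithm. Each color stamp is paired with the nearest depth stamp when the two differ by at most `slop` seconds. `pair_by_timestamp` is hypothetical.

```python
import bisect


def pair_by_timestamp(color_stamps, depth_stamps, slop=0.02):
    """Pair each color stamp with its nearest depth stamp within `slop` s.

    Both inputs are sorted lists of timestamps in seconds. Color frames
    with no depth frame close enough are dropped.
    """
    pairs = []
    for tc in color_stamps:
        i = bisect.bisect_left(depth_stamps, tc)
        # In a sorted list the nearest neighbor is at index i - 1 or i.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(depth_stamps)]
        if not candidates:
            continue
        j = min(candidates, key=lambda k: abs(depth_stamps[k] - tc))
        if abs(depth_stamps[j] - tc) <= slop:
            pairs.append((tc, depth_stamps[j]))
    return pairs
```

Usage: `pair_by_timestamp([0.000, 0.033], [0.001, 0.035], slop=0.01)` pairs both color frames with their nearest depth frames.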
The system uses the NYUv2 40-class label set:
wall, floor, cabinet, bed, chair, sofa, table, door, window, bookshelf, picture, counter, blinds, desk, shelves, curtain, dresser, pillow, mirror, floor mat, clothes, ceiling, books, refrigerator, television, paper, towel, shower curtain, box, whiteboard, person, night stand, toilet, sink, lamp, bathtub, bag, otherstructure, otherfurniture, otherprop
Label-to-color mappings are defined in semantic_slam/params/nyu.csv.
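
A lookup in both directions could be built as in the sketch below. It assumes, hypothetically, that `nyu.csv` lists one class per row in label-ID order with columns `name, r, g, b`. The actual file layout may differ, so check it before reusing this code. `load_color_map` is a hypothetical helper.

```python
import csv
import io


def load_color_map(csv_text):
    """Build label-ID <-> color lookups from CSV text.

    Hypothetical format: one row per class, in label-ID order, with
    columns name, r, g, b and no header row. Blank lines are skipped.
    """
    names, label_to_color, color_to_label = [], {}, {}
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue
        name, r, g, b = row[0].strip(), int(row[1]), int(row[2]), int(row[3])
        label_id = len(names)  # row order defines the label ID
        names.append(name)
        label_to_color[label_id] = (r, g, b)
        color_to_label[(r, g, b)] = label_id
    return names, label_to_color, color_to_label
```

Inverting the table only works if every class has a distinct color. That is a property of the palette to verify; the code does not enforce it.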
TF frame tree:

```text
world
└── map (= mocap)
    └── imu
        └── base_link
            └── camera0 (= d455_depth_optical_frame)
                └── d455_color_optical_frame
```
Static transforms are published by semantic_slam/launch/tf_static.launch.
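
A camera pose in the world frame, such as the one on `/semantic_pcl/camera_pose`, can be thought of as the composition of the parent-to-child transforms along this tree. The sketch below builds 4×4 homogeneous transforms from translations and ROS-style (x, y, z, w) quaternions and chains them. The helper names are hypothetical. In a running system this result would normally come from a `tf2_ros.Buffer` lookup rather than manual composition.

```python
import numpy as np


def quat_to_rot(qx, qy, qz, qw):
    """Rotation matrix from a unit quaternion in ROS (x, y, z, w) order."""
    return np.array([
        [1 - 2 * (qy * qy + qz * qz), 2 * (qx * qy - qz * qw), 2 * (qx * qz + qy * qw)],
        [2 * (qx * qy + qz * qw), 1 - 2 * (qx * qx + qz * qz), 2 * (qy * qz - qx * qw)],
        [2 * (qx * qz - qy * qw), 2 * (qy * qz + qx * qw), 1 - 2 * (qx * qx + qy * qy)],
    ])


def make_tf(translation, quaternion):
    """4x4 transform T_parent_child mapping child-frame points into the parent."""
    T = np.eye(4)
    T[:3, :3] = quat_to_rot(*quaternion)
    T[:3, 3] = translation
    return T


def chain(*transforms):
    """Compose transforms ordered from the root (e.g. world) toward the leaf."""
    T = np.eye(4)
    for step in transforms:
        T = T @ step
    return T
```

For this tree, the color camera pose would be `chain(T_world_map, T_map_imu, T_imu_base, T_base_cam0, T_cam0_color)`. Multiplying a homogeneous point in the leaf frame by the result expresses it in `world`.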
- semantic_voxblox: Derived from voxblox (ETH ASL, BSD) and Kimera-Semantics (MIT, Antoni Rosinol)
- DiffusionMMS: Diffusion-based multimodal semantic segmentation model (ntnu-arl/diffusionMMS)