
MSeg3D: Multi-modal 3D Semantic Segmentation for Autonomous Driving

About

LiDAR and camera are two modalities available for 3D semantic segmentation in autonomous driving. Popular LiDAR-only methods suffer severely from inferior segmentation on small and distant objects due to insufficient laser points, while robust multi-modal solutions remain under-explored. We investigate three crucial inherent difficulties: modality heterogeneity, limited intersection of sensor fields of view, and multi-modal data augmentation. We propose a multi-modal 3D semantic segmentation model (MSeg3D) with joint intra-modal feature extraction and inter-modal feature fusion to mitigate the modality heterogeneity. The multi-modal fusion in MSeg3D consists of geometry-based feature fusion (GF-Phase), cross-modal feature completion, and semantic-based feature fusion (SF-Phase) on all visible points. Multi-modal data augmentation is reinvigorated by applying asymmetric transformations to the LiDAR point cloud and the multi-camera images individually, which benefits model training with diversified augmentation transformations. MSeg3D achieves state-of-the-art results on the nuScenes, Waymo, and SemanticKITTI datasets. With malfunctioning multi-camera input and with multi-frame point cloud input, MSeg3D still shows robustness and improves over the LiDAR-only baseline. Our code is publicly available at https://github.com/jialeli1/lidarseg3d.
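The asymmetric augmentation idea above can be illustrated with a minimal sketch: LiDAR-only transforms (e.g. global rotation and scaling) and image-only transforms (e.g. horizontal flip and brightness jitter) are sampled independently per modality rather than being tied to a single shared transformation. This is not the paper's implementation; the function names, transform choices, and parameter ranges here are illustrative assumptions.

```python
import numpy as np

def augment_lidar(points, rng):
    """LiDAR-only transforms: random global rotation about z, then random scaling.

    points: (N, 3) array of xyz coordinates.
    """
    theta = rng.uniform(-np.pi / 4, np.pi / 4)
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s, 0.0],
                    [s,  c, 0.0],
                    [0.0, 0.0, 1.0]])
    scale = rng.uniform(0.95, 1.05)
    return (points @ rot.T) * scale

def augment_image(image, rng):
    """Image-only transforms: random horizontal flip and brightness jitter.

    image: (H, W, 3) float array with values in [0, 1].
    """
    if rng.random() < 0.5:
        image = image[:, ::-1, :]
    return np.clip(image * rng.uniform(0.9, 1.1), 0.0, 1.0)
```

Because each modality draws its own random parameters, the two branches see diversified, mutually inconsistent geometric views during training; the model's fusion stages are what reconcile the modalities, not a shared augmentation.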

Jiale Li, Hang Dai, Hao Han, Yong Ding • 2023

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Semantic segmentation | nuScenes (val) | mIoU (Segmentation) | 0.8 | 212 |
| Semantic segmentation | SemanticKITTI (val) | mIoU | 66.7 | 117 |
| Semantic segmentation | nuScenes (test) | mIoU | 81.1 | 75 |
| Semantic segmentation | Waymo Open Dataset (val) | mIoU | 69.6 | 63 |
| 3D Semantic Segmentation | SemanticKITTI (val) | mIoU | 66.7 | 54 |
| 3D Semantic Segmentation | nuScenes (test) | mIoU | 81.14 | 36 |
| 3D Semantic Segmentation | Waymo (test) | mIoU | 70.51 | 9 |

Other info

Code
