
MSeg3D: Multi-modal 3D Semantic Segmentation for Autonomous Driving

About

LiDAR and camera are two modalities available for 3D semantic segmentation in autonomous driving. Popular LiDAR-only methods suffer severely from inferior segmentation on small and distant objects due to insufficient laser points, while robust multi-modal solutions remain under-explored. We investigate three crucial inherent difficulties: modality heterogeneity, limited intersection of sensor fields of view, and multi-modal data augmentation. We propose a multi-modal 3D semantic segmentation model (MSeg3D) with joint intra-modal feature extraction and inter-modal feature fusion to mitigate the modality heterogeneity. The multi-modal fusion in MSeg3D consists of geometry-based feature fusion (GF-Phase), cross-modal feature completion, and semantic-based feature fusion (SF-Phase) on all visible points. Multi-modal data augmentation is reinvigorated by applying asymmetric transformations to the LiDAR point cloud and the multi-camera images individually, which benefits model training with diversified augmentation transformations. MSeg3D achieves state-of-the-art results on the nuScenes, Waymo, and SemanticKITTI datasets. With malfunctioning multi-camera input and with multi-frame point cloud input, MSeg3D still shows robustness and improves over the LiDAR-only baseline. Our code is publicly available at https://github.com/jialeli1/lidarseg3d.
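The asymmetric augmentation idea above can be illustrated with a minimal sketch: LiDAR-only transforms (e.g. global rotation and scaling) and image-only transforms (e.g. horizontal flip and brightness jitter) are sampled independently per modality rather than being tied to a single shared transformation. This is not the paper's implementation; the function names, transform choices, and parameter ranges here are illustrative assumptions.

```python
import numpy as np

def augment_lidar(points, rng):
    """LiDAR-only transforms: random global rotation about z, then random scaling.

    points: (N, 3) array of xyz coordinates.
    """
    theta = rng.uniform(-np.pi / 4, np.pi / 4)
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s, 0.0],
                    [s,  c, 0.0],
                    [0.0, 0.0, 1.0]])
    scale = rng.uniform(0.95, 1.05)
    return (points @ rot.T) * scale

def augment_image(image, rng):
    """Image-only transforms: random horizontal flip and brightness jitter.

    image: (H, W, 3) float array with values in [0, 1].
    """
    if rng.random() < 0.5:
        image = image[:, ::-1, :]
    return np.clip(image * rng.uniform(0.9, 1.1), 0.0, 1.0)
```

Because each modality draws its own random parameters, the two branches see diversified, mutually inconsistent geometric views during training; the model's fusion stages are what reconcile the modalities, not a shared augmentation.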

Jiale Li, Hang Dai, Hao Han, Yong Ding • 2023

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Semantic segmentation | nuScenes (val) | mIoU (Segmentation) | 0.8 | 212 |
| Semantic segmentation | SemanticKITTI (val) | mIoU | 66.7 | 117 |
| Semantic segmentation | nuScenes (test) | mIoU | 81.1 | 75 |
| Semantic segmentation | Waymo Open Dataset (val) | mIoU | 69.6 | 63 |
| 3D Semantic Segmentation | SemanticKITTI (val) | mIoU | 66.7 | 54 |
| 3D Semantic Segmentation | nuScenes (test) | mIoU | 81.14 | 36 |
| 3D Semantic Segmentation | Waymo (test) | mIoU | 70.51 | 9 |

Other info

Code
