MarS3D: A Plug-and-Play Motion-Aware Model for Semantic Segmentation on Multi-Scan 3D Point Clouds
About
3D semantic segmentation on multi-scan large-scale point clouds plays an important role in autonomous systems. Unlike single-scan semantic segmentation, this task requires distinguishing the motion state of each point (e.g., moving vs. static) in addition to its semantic category. However, methods designed for single-scan segmentation perform poorly on the multi-scan task because they lack an effective way to integrate temporal information. We propose MarS3D, a plug-and-play motion-aware module for semantic segmentation on multi-scan 3D point clouds. The module can be flexibly combined with single-scan models to give them multi-scan perception abilities. It comprises two key designs: a Cross-Frame Feature Embedding module for enriching representation learning and a Motion-Aware Feature Learning module for enhancing motion awareness. Extensive experiments show that MarS3D improves the performance of the baseline model by a large margin. The code is available at https://github.com/CVMI-Lab/MarS3D.
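To make the "plug-and-play" idea concrete, here is a minimal, hypothetical sketch in plain Python. It is not the MarS3D implementation: the names `Scan`, `embed_cross_frame`, `PlugAndPlayMultiScan`, `toy_backbone` and `toy_motion_head` are illustrative assumptions. Appending a time-offset channel stands in for the learned Cross-Frame Feature Embedding module, and a nearest-neighbour rule stands in for the learned Motion-Aware Feature Learning module. The sketch only shows the wiring: an unchanged single-scan model (assumed to accept one extra input channel) produces semantic classes, a separate motion branch flags moving points, and the two are combined into multi-scan labels such as "moving-car".

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float, float]
Features = List[float]


@dataclass
class Scan:
    """One LiDAR sweep; time_offset is 0 for the current scan, -1 for the previous, etc."""
    points: List[Point]
    features: List[Features]
    time_offset: int


def embed_cross_frame(scans: Sequence[Scan]) -> Tuple[List[Point], List[Features]]:
    """Hypothetical stand-in for cross-frame feature embedding: merge all scans
    into one cloud and append each point's time offset as an extra feature channel."""
    points: List[Point] = []
    features: List[Features] = []
    for scan in scans:
        for p, f in zip(scan.points, scan.features):
            points.append(p)
            features.append(list(f) + [float(scan.time_offset)])
    return points, features


# Any callable mapping (points, features) -> one class name per point.
SingleScanModel = Callable[[List[Point], List[Features]], List[str]]
# Any callable mapping (points, features) -> one moving/static flag per point.
MotionHead = Callable[[List[Point], List[Features]], List[bool]]


class PlugAndPlayMultiScan:
    """Wraps an unchanged single-scan model and adds a separate motion branch."""

    def __init__(self, backbone: SingleScanModel, motion_head: MotionHead) -> None:
        self.backbone = backbone
        self.motion_head = motion_head

    def __call__(self, scans: Sequence[Scan]) -> List[str]:
        points, features = embed_cross_frame(scans)
        semantic = self.backbone(points, features)
        moving = self.motion_head(points, features)
        # Multi-scan labels separate e.g. "car" from "moving-car".
        return [f"moving-{c}" if m else c for c, m in zip(semantic, moving)]


def toy_backbone(points: List[Point], features: List[Features]) -> List[str]:
    # Hypothetical single-scan "model": thresholds the first feature channel.
    return ["car" if f[0] > 0.5 else "road" for f in features]


def toy_motion_head(points: List[Point], features: List[Features], radius: float = 0.5) -> List[bool]:
    # Hypothetical heuristic, not a learned module: a point is "moving" if no point
    # from a different scan lies within `radius` of it. With a single scan, all are static.
    flags = []
    for p, f in zip(points, features):
        others = [q for q, g in zip(points, features) if g[-1] != f[-1]]
        near = any(math.dist(p, q) <= radius for q in others)
        flags.append(bool(others) and not near)
    return flags


prev = Scan(points=[(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)], features=[[0.1], [0.9]], time_offset=-1)
curr = Scan(points=[(0.0, 0.0, 0.0), (7.0, 0.0, 0.0)], features=[[0.1], [0.9]], time_offset=0)

model = PlugAndPlayMultiScan(toy_backbone, toy_motion_head)
print(model([prev, curr]))  # ['road', 'moving-car', 'road', 'moving-car']
```

The design choice this illustrates is that the backbone is passed in rather than modified, so any single-scan segmentation model with a compatible interface could be swapped in; in the actual method both added modules are learned, not rule-based.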
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Semantic segmentation | SemanticKITTI v1.0 (test) | mIoU | 61.7 | 71 |
| Semantic segmentation | NuScenes v1.0 (test) | mIoU | 72.8 | 44 |
| Semantic segmentation | SemanticKITTI multiple scans (test) | mIoU | 52.7 | 20 |
| Spatio-temporal Driving Scene Interpolation | Waymo Open Dataset | PSNR | 20.69 | 12 |
| Spatio-temporal Driving Scene Reconstruction | Waymo Open Dataset | PSNR | 21.81 | 12 |