LOSC: LiDAR Open-voc Segmentation Consolidator
About
We study the use of image-based Vision-Language Models (VLMs) for open-vocabulary segmentation of lidar scans in driving settings. Classically, image semantics can be back-projected onto 3D point clouds, yet the resulting point labels are noisy and sparse. We consolidate these labels to enforce both spatio-temporal consistency and robustness to image-level augmentations. We then train a 3D network on these refined labels. This simple method, called LOSC, outperforms the state of the art in zero-shot open-vocabulary semantic and panoptic segmentation on both nuScenes and SemanticKITTI by significant margins. Code is available at https://github.com/valeoai/LOSC.
Nermin Samet, Gilles Puy, Renaud Marlet • 2025
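The back-projection step mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the LOSC code base: the function name `backproject_labels`, its parameters, the pinhole camera model, and the absence of occlusion handling are all assumptions made for illustration. Each lidar point is moved into the camera frame, projected with the intrinsic matrix, and given the label of the pixel it lands on; points that fall behind the camera or outside the image keep an ignore label, which is one reason the resulting labels are sparse.

```python
import numpy as np


def backproject_labels(points, label_map, K, T_cam_from_lidar, ignore_label=-1):
    """Give each lidar point the label of the pixel it projects to.

    Hypothetical helper (illustration only, not the LOSC implementation).
    points:            (N, 3) xyz coordinates in the lidar frame
    label_map:         (H, W) integer per-pixel labels from an image model
    K:                 (3, 3) pinhole camera intrinsics
    T_cam_from_lidar:  (4, 4) rigid transform from lidar to camera frame
    """
    h, w = label_map.shape
    n = len(points)

    # Lidar frame -> camera frame, using homogeneous coordinates.
    pts_h = np.hstack([points, np.ones((n, 1))])
    pts_cam = (T_cam_from_lidar @ pts_h.T).T[:, :3]

    labels = np.full(n, ignore_label, dtype=np.int64)

    # Only points in front of the camera can be seen.
    in_front = pts_cam[:, 2] > 1e-6

    # Perspective projection to pixel coordinates (u = column, v = row).
    uvw = (K @ pts_cam[in_front].T).T
    u = np.floor(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.floor(uvw[:, 1] / uvw[:, 2]).astype(int)

    # Keep projections that land inside the image bounds.
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    idx = np.flatnonzero(in_front)[inside]
    labels[idx] = label_map[v[inside], u[inside]]
    return labels
```

In a multi-camera, multi-sweep setting such a function would be called once per image, leaving several possibly conflicting labels per point; reconciling them is the kind of problem the consolidation described above addresses.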
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Semantic segmentation | nuScenes (val) | mIoU (Segmentation) | 0.493 | 265 |
| Semantic segmentation | SemanticKITTI (val) | mIoU | 35.2 | 174 |
| Panoptic Segmentation | nuScenes (val) | PQ | 48.4 | 56 |
| LiDAR Panoptic Segmentation | SemanticKITTI (val) | PQ | 32.4 | 38 |
| Annotation-free closed-set semantic segmentation | nuScenes (val) | mIoU | 49.3 | 16 |
| Annotation-free closed-set semantic segmentation | SemanticKITTI (val) | mIoU | 35.2 | 6 |