LOSC: LiDAR Open-voc Segmentation Consolidator

About

We study the use of image-based Vision-Language Models (VLMs) for open-vocabulary segmentation of lidar scans in driving settings. Classically, image semantics can be back-projected onto 3D point clouds, yet the resulting point labels are noisy and sparse. We consolidate these labels to enforce both spatio-temporal consistency and robustness to image-level augmentations, and then train a 3D network on the refined labels. This simple method, called LOSC, outperforms the state of the art in zero-shot open-vocabulary semantic and panoptic segmentation on both nuScenes and SemanticKITTI by significant margins. Code is available at https://github.com/valeoai/LOSC.
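To make the label pipeline concrete, here is a minimal Python sketch of naive back-projection followed by two generic consolidation steps: a per-point majority vote over several label sets (for instance from augmented copies of an image or from several views) and a majority vote within voxels. This is not LOSC's consolidation procedure, which the abstract does not detail. All function names, the `IGNORE` value, the pinhole parameters `fx`, `fy`, `cx`, `cy`, and the voxel-based smoothing are illustrative assumptions. Points are assumed to be already expressed in camera coordinates (z pointing forward), so the lidar-to-camera extrinsic transform is omitted; applying the voxel vote to scans accumulated in a common world frame is one simple way to share labels across time.

```python
import math
from collections import Counter, defaultdict
from typing import List, Sequence, Tuple

# Hypothetical helpers for illustration only; not the LOSC implementation.
Point = Tuple[float, float, float]
LabelMap = List[List[int]]  # per-pixel class ids from a VLM, indexed [row][col]

IGNORE = -1  # assumed "no label" value (point not visible in the image)


def back_project(points: Sequence[Point], label_map: LabelMap,
                 fx: float, fy: float, cx: float, cy: float) -> List[int]:
    """Give each camera-frame point the label of the pixel it projects to
    under a pinhole model, or IGNORE if it falls outside the image."""
    height, width = len(label_map), len(label_map[0])
    labels = []
    for x, y, z in points:
        if z <= 0:  # behind the camera: cannot be seen
            labels.append(IGNORE)
            continue
        u = math.floor(fx * x / z + cx)  # pixel column
        v = math.floor(fy * y / z + cy)  # pixel row
        if 0 <= u < width and 0 <= v < height:
            labels.append(label_map[v][u])
        else:
            labels.append(IGNORE)
    return labels


def vote_across_views(per_view: Sequence[Sequence[int]]) -> List[int]:
    """Per-point majority vote over several label sets (one per view or
    augmentation), skipping IGNORE votes."""
    result = []
    for votes in zip(*per_view):
        valid = [label for label in votes if label != IGNORE]
        result.append(Counter(valid).most_common(1)[0][0] if valid else IGNORE)
    return result


def vote_within_voxels(points: Sequence[Point], labels: Sequence[int],
                       voxel_size: float) -> List[int]:
    """Spatial smoothing: every point, labeled or not, takes the majority
    label of the voxel it falls in (IGNORE if the voxel has no labels)."""
    votes = defaultdict(list)
    keys = []
    for (x, y, z), label in zip(points, labels):
        key = (math.floor(x / voxel_size), math.floor(y / voxel_size),
               math.floor(z / voxel_size))
        keys.append(key)
        if label != IGNORE:
            votes[key].append(label)
    majority = {k: Counter(v).most_common(1)[0][0] for k, v in votes.items()}
    return [majority.get(k, IGNORE) for k in keys]
```

Because the voxel vote also relabels points that received no projection, it densifies labels wherever at least one neighbor in the same voxel is labeled; ties are broken by the first label encountered, which is an arbitrary choice.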

Nermin Samet, Gilles Puy, Renaud Marlet (2025)

Related benchmarks

Task	Dataset	Metric	Result	
Semantic segmentation	nuScenes (val)	mIoU (%)	49.3	323
Semantic segmentation	SemanticKITTI (val)	mIoU (%)	35.2	212
Panoptic segmentation	nuScenes (val)	PQ (%)	48.4	56
LiDAR panoptic segmentation	SemanticKITTI (val)	PQ (%)	32.4	38
Annotation-free closed-set semantic segmentation	nuScenes (val)	mIoU (%)	49.3	16
Annotation-free closed-set semantic segmentation	SemanticKITTI (val)	mIoU (%)	35.2	6
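The table reports mIoU for semantic segmentation and PQ (panoptic quality) for panoptic segmentation. As a reminder of what these numbers measure, here is a minimal Python sketch of the usual definitions: mIoU averages TP / (TP + FP + FN) over classes, and PQ for one class sums the IoUs of matched segment pairs (a pair matches when IoU > 0.5) and divides by |TP| + 0.5|FP| + 0.5|FN|. The function names and input formats are assumptions for illustration. The official nuScenes and SemanticKITTI evaluation tools differ in details (for example, which classes are evaluated and how PQ is averaged over classes), so this sketch should not be expected to reproduce the reported scores exactly.

```python
from typing import Sequence

# Hypothetical metric helpers; not the official benchmark evaluation code.


def mean_iou(pred: Sequence[int], gt: Sequence[int], num_classes: int,
             ignore: int = -1) -> float:
    """mIoU in percent over per-point labels. Points whose ground truth is
    `ignore` are skipped; classes absent from both pred and gt are skipped."""
    tp = [0] * num_classes
    fp = [0] * num_classes
    fn = [0] * num_classes
    for p, g in zip(pred, gt):
        if g == ignore:
            continue
        if p == g:
            tp[g] += 1
        else:
            fn[g] += 1  # the true class was missed
            if 0 <= p < num_classes:
                fp[p] += 1  # the predicted class was wrongly assigned
    ious = [tp[c] / (tp[c] + fp[c] + fn[c])
            for c in range(num_classes) if tp[c] + fp[c] + fn[c] > 0]
    return 100.0 * sum(ious) / len(ious) if ious else 0.0


def panoptic_quality(pair_ious: Sequence[float], unmatched_pred: int,
                     unmatched_gt: int) -> float:
    """PQ in percent for one class. `pair_ious` holds the IoU of each candidate
    (prediction, ground truth) pair, each segment used at most once; pairs with
    IoU <= 0.5 are rejected and count as one FP plus one FN."""
    tp_ious = [iou for iou in pair_ious if iou > 0.5]
    rejected = len(pair_ious) - len(tp_ious)
    fp = unmatched_pred + rejected
    fn = unmatched_gt + rejected
    denom = len(tp_ious) + 0.5 * fp + 0.5 * fn
    return 100.0 * sum(tp_ious) / denom if denom > 0 else 0.0
```

For a full benchmark score, PQ is typically computed per class in this way and then averaged over classes.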

