
JOPP-3D: Joint Open Vocabulary Semantic Segmentation on Point Clouds and Panoramas

About

Semantic segmentation across visual modalities such as 3D point clouds and panoramic images remains a challenging task, primarily due to the scarcity of annotated data and the limited adaptability of fixed-label models. In this paper, we present JOPP-3D, an open-vocabulary semantic segmentation framework that jointly leverages panoramic and point cloud data to enable language-driven scene understanding. We convert RGB-D panoramic images into their corresponding tangential perspective images and 3D point clouds, then use these modalities to extract and align foundational vision-language features. This allows natural language querying to generate semantic masks on both input modalities. Experimental evaluation on the Stanford-2D-3D-s and ToF-360 datasets demonstrates the capability of JOPP-3D to produce coherent and semantically meaningful segmentations across panoramic and 3D domains. Our proposed method achieves a significant improvement over the state of the art in both open- and closed-vocabulary 2D and 3D semantic segmentation.
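The preprocessing step described above — converting an RGB-D panorama into tangential perspective views and a 3D point cloud — can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it uses nearest-neighbour sampling, assumes an equirectangular panorama with longitude spanning the image width and latitude the height, and all function names are hypothetical.

```python
import numpy as np

def tangential_view(pano, fov_deg, yaw_deg, pitch_deg, out_hw):
    """Sample a perspective (tangential) view from an equirectangular panorama.

    pano: (H, W, C) equirectangular image; fov_deg: horizontal field of view.
    Nearest-neighbour sampling keeps the sketch dependency-free.
    """
    H, W = pano.shape[:2]
    h, w = out_hw
    f = 0.5 * w / np.tan(0.5 * np.radians(fov_deg))  # focal length in pixels
    # Pixel grid -> camera rays (z forward, x right, y down)
    xs = np.arange(w) - (w - 1) / 2
    ys = np.arange(h) - (h - 1) / 2
    x, y = np.meshgrid(xs, ys)
    rays = np.stack([x, y, np.full_like(x, f)], axis=-1)
    rays /= np.linalg.norm(rays, axis=-1, keepdims=True)
    # Rotate rays by yaw (around y) then pitch (around x) to aim the tangent plane
    yaw, pitch = np.radians(yaw_deg), np.radians(pitch_deg)
    Ry = np.array([[np.cos(yaw), 0, np.sin(yaw)],
                   [0, 1, 0],
                   [-np.sin(yaw), 0, np.cos(yaw)]])
    Rx = np.array([[1, 0, 0],
                   [0, np.cos(pitch), -np.sin(pitch)],
                   [0, np.sin(pitch), np.cos(pitch)]])
    d = rays @ (Ry @ Rx).T
    # Ray direction -> spherical angles -> equirectangular pixel coordinates
    lon = np.arctan2(d[..., 0], d[..., 2])       # [-pi, pi]
    lat = np.arcsin(np.clip(d[..., 1], -1, 1))   # [-pi/2, pi/2]
    u = ((lon / np.pi + 1) / 2 * (W - 1)).round().astype(int)
    v = ((lat / (np.pi / 2) + 1) / 2 * (H - 1)).round().astype(int)
    return pano[v, u]

def pano_to_points(depth):
    """Back-project an equirectangular depth panorama into a 3D point cloud."""
    H, W = depth.shape
    lon = (np.arange(W) / (W - 1) * 2 - 1) * np.pi
    lat = (np.arange(H) / (H - 1) * 2 - 1) * (np.pi / 2)
    lon, lat = np.meshgrid(lon, lat)
    # Unit viewing direction per pixel, scaled by the measured depth
    dirs = np.stack([np.cos(lat) * np.sin(lon),
                     np.sin(lat),
                     np.cos(lat) * np.cos(lon)], axis=-1)
    return (dirs * depth[..., None]).reshape(-1, 3)
```

Sweeping `yaw_deg` and `pitch_deg` over a fixed set of tangent points produces the bank of perspective tiles on which 2D vision-language features can be extracted and then lifted onto the point cloud via the shared pixel-to-point correspondence.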

Sandeep Inuganti, Hideaki Kanayama, Kanta Shimizu, Mahdi Chamseddine, Soichiro Yokota, Didier Stricker, Jason Rambach • 2026
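The language-driven querying the abstract describes — generating a semantic mask from a natural-language prompt — reduces, in its simplest form, to comparing per-point (or per-pixel) vision-language features against a text embedding. A minimal sketch, assuming the features have already been aligned to a CLIP-style text embedding space; the function name and the fixed threshold are illustrative, not from the paper:

```python
import numpy as np

def open_vocab_mask(point_feats, text_feat, threshold=0.25):
    """Mark points whose vision-language feature matches a text query.

    point_feats: (N, D) per-point features assumed aligned to the text
    embedding space (e.g. distilled from a CLIP-style encoder).
    text_feat: (D,) embedding of the natural-language query.
    """
    p = point_feats / np.linalg.norm(point_feats, axis=1, keepdims=True)
    t = text_feat / np.linalg.norm(text_feat)
    sim = p @ t                  # cosine similarity per point
    return sim >= threshold      # boolean mask over the point cloud
```

For closed-vocabulary evaluation, the same similarities can instead be computed against one embedding per class label and resolved with an argmax, which is how open-vocabulary models are typically scored on fixed-label benchmarks such as those below.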

Related benchmarks

Task                                Dataset                               Result      Rank
Semantic segmentation               Stanford2D3D Panoramic 1.0 (Fold-1)   70.1 mIoU   53
Semantic segmentation               S3DIS (Area 2)                        80.9 mIoU   15
Panoramic Semantic Segmentation     ToF-360 (test)                        30.7 mIoU   3
Panoramic Semantic Segmentation     ToF-360                               30.7 mIoU   3
Point Cloud Semantic Segmentation   ToF-360                               30.9 mIoU   3
