
OVI-MAP: Open-Vocabulary Instance-Semantic Mapping

About

Incremental open-vocabulary 3D instance-semantic mapping is essential for autonomous agents operating in complex everyday environments. However, it remains challenging because it requires robust instance segmentation, real-time processing, and flexible open-set reasoning. Existing methods often rely on a closed-set assumption or on dense per-pixel language-feature fusion, which limits scalability and temporal consistency. We introduce OVI-MAP, which decouples instance reconstruction from semantic inference. A class-agnostic 3D instance map is constructed incrementally from RGB-D input, while semantic features are extracted with vision-language models from only a small set of automatically selected views. This design enables stable instance tracking and zero-shot semantic labeling throughout online exploration. Our system runs in real time and outperforms state-of-the-art open-vocabulary mapping baselines on standard benchmarks.
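
The decoupling of instance geometry from semantics can be illustrated with a minimal sketch. It assumes each instance already exists in a class-agnostic map and only needs semantic features attached. A few image embeddings per instance are kept, ranked by an assumed per-view quality score, then averaged and L2-normalized. Each instance is then labeled zero-shot by cosine similarity against text embeddings of candidate class names. The names `InstanceMap`, `add_view_feature`, `label_instances`, the score-based top-k view selection, and the toy 3-D vectors standing in for CLIP-style embeddings are all hypothetical. This is not OVI-MAP's actual implementation.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


def normalize(v: list[float]) -> list[float]:
    """Scale v to unit L2 norm (a zero vector is returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


@dataclass
class Instance:
    # Hypothetical: geometry is tracked elsewhere; only semantic views are stored here.
    instance_id: int
    views: list[tuple[float, list[float]]] = field(default_factory=list)  # (view score, embedding)


class InstanceMap:
    """Class-agnostic instance map with semantics attached afterwards (hypothetical API)."""

    def __init__(self, max_views: int = 3):
        self.max_views = max_views
        self.instances: dict[int, Instance] = {}

    def add_view_feature(self, instance_id: int, score: float, embedding: list[float]) -> None:
        """Keep only the top-`max_views` views per instance, ranked by an assumed quality score."""
        inst = self.instances.setdefault(instance_id, Instance(instance_id))
        inst.views.append((score, embedding))
        inst.views.sort(key=lambda sv: sv[0], reverse=True)
        del inst.views[self.max_views:]

    def instance_feature(self, instance_id: int) -> list[float]:
        """Average the normalized embeddings of the selected views, then renormalize."""
        views = self.instances[instance_id].views
        acc = [0.0] * len(views[0][1])
        for _, emb in views:
            for i, x in enumerate(normalize(emb)):
                acc[i] += x
        return normalize(acc)

    def label_instances(self, text_embeddings: dict[str, list[float]]) -> dict[int, str]:
        """Zero-shot labeling: choose the class name whose text embedding is most similar."""
        labels: dict[int, str] = {}
        for iid in self.instances:
            feat = self.instance_feature(iid)
            labels[iid] = max(text_embeddings, key=lambda name: cosine(feat, text_embeddings[name]))
        return labels


# Toy usage with made-up 3-D "embeddings".
m = InstanceMap(max_views=2)
m.add_view_feature(0, 0.9, [1.0, 0.1, 0.0])
m.add_view_feature(0, 0.8, [0.9, 0.0, 0.1])
m.add_view_feature(0, 0.1, [0.0, 0.0, 1.0])  # lowest-scoring view, discarded
m.add_view_feature(1, 0.7, [0.0, 1.0, 0.1])
labels = m.label_instances({"chair": [1.0, 0.0, 0.0], "table": [0.0, 1.0, 0.0], "lamp": [0.0, 0.0, 1.0]})
print(labels)
```

Because the class vocabulary only enters in `label_instances`, the query set can change at any time without rebuilding the instance map. This is the property the abstract attributes to decoupling reconstruction from semantic inference.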

Zilong Deng, Federico Tombari, Marc Pollefeys, Johanna Wald, Daniel Barath · 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| 3D Semantic Segmentation | ScanNet | mIoU | 17.5 | 51 |
| 3D Semantic Segmentation | Replica | 3D mIoU | 27 | 41 |
| 3D Instance Segmentation | Replica | AP25 | 34.5 | 24 |
| Instance Segmentation | ScanNet | mAP@0.5 | 24 | 20 |
| 3D Instance Segmentation | ScanNet | Instance mAP@0.5 | 15.7 | 15 |
