OVI-MAP: Open-Vocabulary Instance-Semantic Mapping

About

Incremental open-vocabulary 3D instance-semantic mapping is essential for autonomous agents operating in complex everyday environments. However, it remains challenging due to the need for robust instance segmentation, real-time processing, and flexible open-set reasoning. Existing methods often rely on the closed-set assumption or dense per-pixel language fusion, which limits scalability and temporal consistency. We introduce OVI-MAP, which decouples instance reconstruction from semantic inference. We build a class-agnostic 3D instance map incrementally from RGB-D input, while semantic features are extracted only from a small set of automatically selected views using vision-language models. This design enables stable instance tracking and zero-shot semantic labeling throughout online exploration. Our system operates in real time and outperforms state-of-the-art open-vocabulary mapping baselines on standard benchmarks.

Zilong Deng, Federico Tombari, Marc Pollefeys, Johanna Wald, Daniel Barath • 2026
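
The zero-shot labeling step mentioned in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes that each instance already has one feature vector per selected view, that these are averaged into a single instance feature, and that labels are assigned by cosine similarity against text embeddings. The function names (`cosine`, `mean_feature`, `label_instances`) and the toy 3-dimensional vectors are hypothetical; a real system would obtain the embeddings from a vision-language model, and view selection is not shown.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_feature(view_features):
    """Average the feature vectors from an instance's selected views (assumed fusion rule)."""
    n = len(view_features)
    return [sum(column) / n for column in zip(*view_features)]


def label_instances(instance_views, text_embeddings):
    """Give each instance the query label whose embedding is most similar.

    instance_views:  {instance_id: [feature vector per selected view]}
    text_embeddings: {label: embedding vector}
    """
    labels = {}
    for instance_id, views in instance_views.items():
        feature = mean_feature(views)
        labels[instance_id] = max(
            text_embeddings,
            key=lambda name: cosine(feature, text_embeddings[name]),
        )
    return labels


# Made-up toy vectors standing in for vision-language embeddings.
instance_views = {
    0: [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    1: [[0.0, 0.2, 0.9]],
}
text_embeddings = {
    "chair": [1.0, 0.0, 0.0],
    "table": [0.0, 1.0, 0.0],
    "lamp": [0.0, 0.0, 1.0],
}

labels = label_instances(instance_views, text_embeddings)
print(labels)
```

Because the instance map in this sketch carries no class information, the label set can be changed at query time by swapping `text_embeddings`, without touching the stored instance features.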

Related benchmarks

Task	Dataset	Metric	Value
3D Semantic Segmentation	Replica	3D mIoU	27	61
3D Semantic Segmentation	ScanNet	mIoU	17.5	57
3D Instance Segmentation	Replica	AP25	34.5	24
Instance Segmentation	ScanNet	mAP@0.5	24	20
3D Instance Segmentation	ScanNet	Instance mAP@0.5	15.7	15
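
For readers unfamiliar with the mIoU metric in the table, the following is a minimal sketch of how mean intersection-over-union is commonly computed from per-point labels. It is an illustration only: the function name `mean_iou` and the toy labels are hypothetical, and the exact evaluation protocol of each benchmark (class list, ignored labels, scaling to percent) may differ.

```python
def mean_iou(pred, gt, num_classes):
    """Mean IoU over classes that appear in the prediction or the ground truth."""
    ious = []
    for c in range(num_classes):
        # Points labeled c in both prediction and ground truth.
        intersection = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        # Points labeled c in either prediction or ground truth.
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        if union > 0:  # skip classes absent from both label sets
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0


# Made-up per-point labels for six points and three classes.
pred = [0, 0, 1, 1, 2, 2]
gt = [0, 1, 1, 1, 2, 0]

miou = mean_iou(pred, gt, num_classes=3)
print(miou)
```

In this toy case the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5.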
