IRIS-SLAM: Unified Geo-Instance Representations for Robust Semantic Localization and Mapping

About

Geometry foundation models have significantly advanced dense geometric SLAM, yet existing systems often lack deep semantic understanding and robust loop closure capabilities. Meanwhile, contemporary semantic mapping approaches are frequently hindered by decoupled architectures and fragile data association. We propose IRIS-SLAM, a novel RGB semantic SLAM system that leverages unified geometric-instance representations derived from an instance-extended foundation model. By extending a geometry foundation model to concurrently predict dense geometry and cross-view consistent instance embeddings, we enable a semantic-synergized association mechanism and instance-guided loop closure detection. Our approach effectively utilizes viewpoint-agnostic semantic anchors to bridge the gap between geometric reconstruction and open-vocabulary mapping. Experimental results demonstrate that IRIS-SLAM significantly outperforms state-of-the-art methods, particularly in map consistency and wide-baseline loop closure reliability.
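As a rough illustration of how cross-view consistent instance embeddings could support loop closure detection, the sketch below scores a pair of keyframes by counting instances whose embeddings are mutual nearest neighbours. It is a minimal sketch under stated assumptions, not the IRIS-SLAM implementation: the function names (`cosine`, `mutual_matches`, `loop_score`), the cosine-similarity criterion, the `min_sim` threshold, and the toy 3-D embeddings are all hypothetical and chosen only for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def mutual_matches(frame_a, frame_b, min_sim=0.8):
    """Return (i, j) index pairs that are mutual nearest neighbours with similarity >= min_sim.

    frame_a and frame_b are lists of per-instance embedding vectors
    (hypothetical stand-ins for the model's instance embeddings).
    """
    if not frame_a or not frame_b:
        return []
    sims = [[cosine(a, b) for b in frame_b] for a in frame_a]
    # Best match in frame_b for each instance of frame_a, and vice versa.
    best_b = [max(range(len(frame_b)), key=lambda j: row[j]) for row in sims]
    best_a = [max(range(len(frame_a)), key=lambda i: sims[i][j])
              for j in range(len(frame_b))]
    return [(i, j) for i, j in enumerate(best_b)
            if best_a[j] == i and sims[i][j] >= min_sim]

def loop_score(frame_a, frame_b, min_sim=0.8):
    """Fraction of instances, relative to the smaller frame, that are mutually matched."""
    if not frame_a or not frame_b:
        return 0.0
    matched = len(mutual_matches(frame_a, frame_b, min_sim))
    return matched / min(len(frame_a), len(frame_b))

# Toy embeddings (assumed values): two instances seen again from another viewpoint,
# plus an unrelated frame that shares no instances.
current = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
revisit = [[0.0, 0.9, 0.1], [0.95, 0.05, 0.0]]
unrelated = [[0.0, 0.0, 1.0]]

print(loop_score(current, revisit))    # 1.0: both instances are re-identified
print(loop_score(current, unrelated))  # 0.0: no instance passes the threshold
```

Because the score depends only on which instances are re-observed rather than on pixel-level appearance, a criterion of this kind stays meaningful under large viewpoint changes, which is the intuition behind using instances as viewpoint-agnostic anchors.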

Tingyang Xiao, Liu Liu, Wei Feng, Zhengyu Zou, Xiaolin Zhou, Wei Sui, Hao Li, Dingwen Zhang, Zhizhong Su • 2026

Related benchmarks

Task	Dataset	Result
3D Semantic Segmentation	ScanNet (test)	mIoU 39.93	117
3D Semantic Mapping	Replica	mAcc 40.63	34
Camera pose estimation	TUM RGB-D 36	Error (desk) 0.026	26
Cross-View Loop Closure Detection	ScanNet 0-30° viewpoint difference	Precision 84.4	4
Cross-View Loop Closure Detection	ScanNet 30-60° viewpoint difference	Precision 29.7	4
