IRIS-SLAM: Unified Geo-Instance Representations for Robust Semantic Localization and Mapping
About
Geometry foundation models have significantly advanced dense geometric SLAM, yet existing systems often lack deep semantic understanding and robust loop closure capabilities. Meanwhile, contemporary semantic mapping approaches are frequently hindered by decoupled architectures and fragile data association. We propose IRIS-SLAM, a novel RGB semantic SLAM system that leverages unified geometric-instance representations derived from an instance-extended foundation model. By extending a geometry foundation model to concurrently predict dense geometry and cross-view consistent instance embeddings, we enable a semantic-synergized association mechanism and instance-guided loop closure detection. Our approach effectively utilizes viewpoint-agnostic semantic anchors to bridge the gap between geometric reconstruction and open-vocabulary mapping. Experimental results demonstrate that IRIS-SLAM significantly outperforms state-of-the-art methods, particularly in map consistency and wide-baseline loop closure reliability.
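To make the idea of instance-guided loop closure more concrete, the following is a minimal sketch of how per-instance embeddings from two keyframes might be compared to flag a loop candidate. It is not the authors' implementation: the function names (`instance_overlap_score`, `is_loop_candidate`), the mutual-nearest-neighbor matching rule, and both thresholds are assumptions chosen for illustration, and the abstract does not specify how IRIS-SLAM scores candidates.

```python
import numpy as np


def _normalize(x):
    # L2-normalize each row so that dot products become cosine similarities.
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def instance_overlap_score(emb_a, emb_b, sim_threshold=0.8):
    """Fraction of instances matched between two frames (hypothetical score).

    emb_a, emb_b: arrays of shape (num_instances, dim), one embedding per
    detected instance. A pair counts as a match when the two instances are
    mutual nearest neighbors and their cosine similarity is at least
    sim_threshold (an assumed value, not taken from the paper).
    """
    if len(emb_a) == 0 or len(emb_b) == 0:
        return 0.0
    a = _normalize(np.asarray(emb_a, dtype=float))
    b = _normalize(np.asarray(emb_b, dtype=float))
    sim = a @ b.T                      # pairwise cosine similarities
    best_b = sim.argmax(axis=1)        # best match in B for each instance in A
    best_a = sim.argmax(axis=0)        # best match in A for each instance in B
    idx = np.arange(len(a))
    mutual = best_a[best_b] == idx     # keep only mutual nearest neighbors
    strong = sim[idx, best_b] >= sim_threshold
    matches = int(np.sum(mutual & strong))
    return matches / max(len(a), len(b))


def is_loop_candidate(emb_a, emb_b, sim_threshold=0.8, min_score=0.5):
    # min_score is an assumed cutoff on the fraction of matched instances.
    score = instance_overlap_score(emb_a, emb_b, sim_threshold)
    return score >= min_score
```

Because the comparison depends only on which instances are visible and not on their image positions, a score of this kind is one plausible way viewpoint-agnostic embeddings could support wide-baseline candidate retrieval. A real system would still need geometric verification before accepting the loop.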
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| 3D Semantic Segmentation | ScanNet (test) | mIoU | 39.93 | 105 |
| 3D Semantic Mapping | Replica | mAcc | 40.63 | 25 |
| Camera pose estimation | TUM RGB-D 36 | Error (360) | 0.082 | 9 |
| Cross-View Loop Closure Detection | ScanNet 0-30° viewpoint difference | Precision | 84.4 | 4 |
| Cross-View Loop Closure Detection | ScanNet 30-60° viewpoint difference | Precision | 29.7 | 4 |