INHerit-SG: Incremental Hierarchical Semantic Scene Graphs with RAG-Style Retrieval
About
Driven by recent advancements in foundation models, semantic scene graphs have emerged as a promising paradigm for high-level 3D environmental abstraction in robot navigation. However, existing frameworks struggle to handle complex embodied queries while constructing the semantic graph continuously. To address these limitations, we present INHerit-SG, an asynchronous dual-stream architecture that systematically structures the 3D environment into a RAG-ready knowledge base. Specifically, our framework integrates comprehensive node representations, an event-triggered asynchronous update scheme, and a structured retrieval mechanism. Geometric segmentation is decoupled from semantic reasoning to maintain mapping efficiency, and the semantic nodes store natural language summaries to support text-based retrieval. Furthermore, we propose an interpretable retrieval pipeline that couples the reasoning capabilities of multi-role LLMs with the topological structure of the scene graph, followed by a visual verification process to mitigate false positives. We evaluate INHerit-SG on a newly constructed benchmark for complex embodied semantic query retrieval, HM3DSem-SQR, and in real-world environments. Experiments demonstrate that our system achieves state-of-the-art performance on complex queries, especially those involving negations and chained spatial constraints. Project Page: https://fangyuktung.github.io/INHeritSG.github.io/
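To make the ideas in the abstract concrete, the following is a minimal sketch of a hierarchical scene graph whose nodes carry natural language summaries, with a queue standing in for the event-triggered semantic update and a structured query that supports a negated room constraint. It is not the authors' implementation: every name here (`SceneNode`, `SceneGraph`, `flush_semantics`, `retrieve`, the `level` values) is hypothetical, the update runs synchronously rather than in a separate stream, and a plain function replaces the LLM and visual verification stages.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class SceneNode:
    node_id: str
    level: str                      # hypothetical levels: "room" or "object"
    label: str
    summary: str = ""               # natural language summary for text retrieval
    parent: Optional[str] = None
    children: List[str] = field(default_factory=list)


class SceneGraph:
    def __init__(self) -> None:
        self.nodes: Dict[str, SceneNode] = {}
        self.pending: List[str] = []    # node ids awaiting a semantic update

    def add_node(self, node: SceneNode) -> None:
        """Geometric stream: register a node and queue a semantic-update event."""
        self.nodes[node.node_id] = node
        if node.parent is not None:
            self.nodes[node.parent].children.append(node.node_id)
        self.pending.append(node.node_id)

    def flush_semantics(self, describe: Callable[[SceneNode], str]) -> List[str]:
        """Semantic stream: summarize only the nodes that triggered an event."""
        updated = list(self.pending)
        for node_id in updated:
            node = self.nodes[node_id]
            node.summary = describe(node)
        self.pending.clear()
        return updated

    def room_of(self, node_id: str) -> Optional[SceneNode]:
        """Walk up the hierarchy until a room node is found."""
        node: Optional[SceneNode] = self.nodes[node_id]
        while node is not None and node.level != "room":
            node = self.nodes[node.parent] if node.parent is not None else None
        return node

    def retrieve(self, label: str, in_room: Optional[str] = None,
                 not_in_room: Optional[str] = None) -> List[str]:
        """Return object ids matching a label and optional room constraints."""
        hits = []
        for node in self.nodes.values():
            if node.level != "object" or node.label != label:
                continue
            room = self.room_of(node.node_id)
            room_label = room.label if room is not None else None
            if in_room is not None and room_label != in_room:
                continue
            if not_in_room is not None and room_label == not_in_room:
                continue
            hits.append(node.node_id)
        return hits


graph = SceneGraph()
graph.add_node(SceneNode("room_1", "room", "kitchen"))
graph.add_node(SceneNode("room_2", "room", "bedroom"))
graph.add_node(SceneNode("chair_1", "object", "chair", parent="room_1"))
graph.add_node(SceneNode("chair_2", "object", "chair", parent="room_2"))

# Stand-in for an LLM/VLM captioner: a deterministic template.
updated = graph.flush_semantics(lambda n: f"A {n.label} ({n.level}).")

# Negation query: "a chair that is not in the kitchen".
result = graph.retrieve("chair", not_in_room="kitchen")
```

In this sketch the negation is resolved against the graph topology (the object's parent room) rather than by text similarity alone, which is one plausible reading of why a structured retrieval step helps on negated and chained queries.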
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Spatial Question Response (Object Retrieval) | HM3DSem-SQR | Accuracy (1m, ABC) | 37.7 | 7 |
| Object Retrieval | OpenLex3D Replica | mAP | 6.22 | 5 |
| Object Retrieval | OpenLex3D HM3D | mAP | 4.5 | 5 |
| Robotic Object Retrieval | Real-world data | Accuracy (Simple) | 54.5 | 4 |
| Object Retrieval | HM3DSem-SQR Basic (Types A, B, C) | QLCR | 86.27 | 3 |
| Object Retrieval | HM3DSem-SQR Negation (Type D) | QLCR | 75.56 | 3 |
| Object Retrieval | HM3DSem-SQR Chained (Type E) | QLCR | 72.22 | 3 |
| Object Retrieval | HM3DSem-SQR Ambiguous (Type F) | QLCR | 77.78 | 3 |
| Object Retrieval | HM3DSem-SQR (Overall) | QLCR | 79.67 | 3 |
| Semantic Mapping | HM3DSem | Semantic Accuracy (Random) | 70.6 | 1 |