Embodied-RAG: General Non-parametric Embodied Memory for Retrieval and Generation
About
There is no limit to how much a robot might explore and learn, but all of that knowledge needs to be searchable and actionable. Within language research, retrieval-augmented generation (RAG) has become the workhorse of large-scale non-parametric knowledge; however, existing techniques do not directly transfer to the embodied domain, which is multimodal, where data is highly correlated, and where perception requires abstraction. To address these challenges, we introduce Embodied-RAG, a framework that enhances the foundation model of an embodied agent with a non-parametric memory system capable of autonomously constructing hierarchical knowledge for both navigation and language generation. Embodied-RAG handles a full range of spatial and semantic resolutions across diverse environments and query types, whether for a specific object or a holistic description of ambiance. At its core, Embodied-RAG's memory is structured as a semantic forest, storing language descriptions at varying levels of detail. This hierarchical organization allows the system to efficiently generate context-sensitive outputs across different robotic platforms. We demonstrate that Embodied-RAG effectively bridges RAG to the robotics domain, successfully handling over 250 explanation and navigation queries across kilometer-level environments, highlighting its promise as a general-purpose non-parametric system for embodied agents.
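To make the idea of a semantic forest concrete, here is a minimal Python sketch of a tree-structured memory that stores language descriptions at several levels of detail and answers a query by descending from coarse to fine. It is an illustration under stated assumptions, not the authors' implementation: the names `MemoryNode`, `overlap`, `retrieve`, and the example `forest` are hypothetical, and a simple word-overlap score stands in for whatever language-model-based relevance judgment a real system would use.

```python
import re
from dataclasses import dataclass, field


@dataclass
class MemoryNode:
    """One node of a hypothetical semantic forest."""
    description: str                 # language summary at this level of detail
    position: tuple[float, float]    # representative (x, y) location in metres
    children: list["MemoryNode"] = field(default_factory=list)


def overlap(query: str, text: str) -> int:
    """Toy relevance score: number of distinct words shared by query and text."""
    def words(s: str) -> set[str]:
        return set(re.findall(r"[a-z]+", s.lower()))
    return len(words(query) & words(text))


def retrieve(roots: list[MemoryNode], query: str) -> MemoryNode:
    """Pick the best-matching root, then the best child at each level down to a leaf."""
    node = max(roots, key=lambda n: overlap(query, n.description))
    while node.children:
        node = max(node.children, key=lambda n: overlap(query, n.description))
    return node


# Assumed example data: two trees, each with a coarse summary and finer leaves.
forest = [
    MemoryNode("kitchen with a kettle and a dining table", (0.0, 0.0), [
        MemoryNode("a red kettle on the counter", (1.0, 0.5)),
        MemoryNode("a wooden dining table with four chairs", (2.0, 1.5)),
    ]),
    MemoryNode("quiet lounge with a sofa and a bookshelf", (10.0, 0.0), [
        MemoryNode("a grey sofa facing the window", (10.5, 1.0)),
        MemoryNode("a bookshelf full of paperbacks", (11.0, 2.0)),
    ]),
]

goal = retrieve(forest, "where is the red kettle")
print(goal.description, goal.position)
```

In this sketch the retrieved node's `position` could serve as a navigation goal, and its `description` as context for generating an answer. Always descending to a leaf suits object-level queries; a holistic query about ambiance would more plausibly stop at a coarser node, which this toy version does not attempt.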
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Multimodal Retrieval | WholeHouse-MM (test) | Target Object Recall@5 | 15.1 | 9 |
| Spatial Question Response (Object Retrieval) | HM3DSem-SQR | Accuracy (1m, ABC) | 30.13 | 7 |
| Robotic Object Retrieval | Real-world data | Accuracy (Simple) | 27.3 | 4 |
| Object Retrieval | HM3DSem-SQR Negation (Type D) | QLCR | 60 | 3 |
| Object Retrieval | HM3DSem-SQR Ambiguous (Type F) | QLCR | 75 | 3 |
| Object Retrieval | HM3DSem-SQR Basic (Types A, B, C) | QLCR | 60.78 | 3 |
| Object Retrieval | HM3DSem-SQR Chained (Type E) | QLCR | 66.67 | 3 |
| Object Retrieval | HM3DSem-SQR (Overall) | QLCR | 62.18 | 3 |