Embodied-RAG: General Non-parametric Embodied Memory for Retrieval and Generation

About

There is no limit to how much a robot might explore and learn, but all of that knowledge needs to be searchable and actionable. Within language research, retrieval-augmented generation (RAG) has become the workhorse of large-scale non-parametric knowledge; however, existing techniques do not directly transfer to the embodied domain, where inputs are multimodal, data is highly correlated, and perception requires abstraction. To address these challenges, we introduce Embodied-RAG, a framework that enhances the foundational model of an embodied agent with a non-parametric memory system capable of autonomously constructing hierarchical knowledge for both navigation and language generation. Embodied-RAG handles a full range of spatial and semantic resolutions across diverse environments and query types, whether the query targets a specific object or asks for a holistic description of ambiance. At its core, Embodied-RAG's memory is structured as a semantic forest that stores language descriptions at varying levels of detail. This hierarchical organization allows the system to efficiently generate context-sensitive outputs across different robotic platforms. We demonstrate that Embodied-RAG effectively bridges RAG to the robotics domain, successfully handling over 250 explanation and navigation queries across kilometer-level environments, highlighting its promise as a general-purpose non-parametric system for embodied agents.
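As a rough illustration of the "semantic forest" idea in the abstract, the sketch below stores language descriptions in a tree and answers a query by descending from coarse to fine descriptions. It is a toy under stated assumptions, not the paper's implementation: the names `MemoryNode`, `overlap`, and `retrieve`, the 2D `position` field, and the word-overlap score are all hypothetical stand-ins chosen for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class MemoryNode:
    """One node of a toy semantic forest (hypothetical structure)."""
    description: str  # language summary at this level of detail
    position: Optional[Tuple[float, float]] = None  # assumed 2D pose, leaves only
    children: List["MemoryNode"] = field(default_factory=list)


def overlap(query: str, text: str) -> int:
    """Stand-in relevance score: number of shared lowercase words."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve(roots: List[MemoryNode], query: str) -> MemoryNode:
    """Pick the best-matching root, then descend one level at a time to a leaf."""
    node = max(roots, key=lambda n: overlap(query, n.description))
    while node.children:
        node = max(node.children, key=lambda n: overlap(query, n.description))
    return node


# Two small trees: coarse area summaries at the roots, object-level leaves below.
forest = [
    MemoryNode("kitchen area with appliances and a sink", children=[
        MemoryNode("a red kettle on the counter", position=(1.0, 2.0)),
        MemoryNode("a steel sink under the window", position=(1.5, 2.5)),
    ]),
    MemoryNode("quiet lounge with sofas and plants", children=[
        MemoryNode("a green sofa near a plant", position=(8.0, 3.0)),
    ]),
]

hit = retrieve(forest, "where is the sink in the kitchen")
print(hit.description, hit.position)
```

In this toy, the leaf position could serve as a navigation goal, while the descriptions along the path could serve as context for a generated answer. A real system would likely replace the word-overlap score with a learned or model-based relevance judgment.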

Quanting Xie, So Yeon Min, Pengliang Ji, Yue Yang, Tianyi Zhang, Kedi Xu, Aarav Bajaj, Ruslan Salakhutdinov, Matthew Johnson-Roberson, Yonatan Bisk • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Multimodal Retrieval | WholeHouse-MM (test) | Target Object Recall@5 | 15.1 | 9 |
| Spatial Question Response (Object Retrieval) | HM3DSem-SQR | Accuracy (1m, ABC) | 30.13 | 7 |
| Robotic Object Retrieval | Real-world data | Accuracy (Simple) | 27.3 | 4 |
| Object Retrieval | HM3DSem-SQR Negation (Type D) | QLCR | 60 | 3 |
| Object Retrieval | HM3DSem-SQR Ambiguous (Type F) | QLCR | 75 | 3 |
| Object Retrieval | HM3DSem-SQR Basic (Types A, B, C) | QLCR | 60.78 | 3 |
| Object Retrieval | HM3DSem-SQR Chained (Type E) | QLCR | 66.67 | 3 |
| Object Retrieval | HM3DSem SQR (Overall) | QLCR | 62.18 | 3 |
