
LagMemo: Language 3D Gaussian Splatting Memory for Multi-modal Open-vocabulary Multi-goal Visual Navigation

About

Navigating to a designated goal using visual information is a fundamental capability for intelligent robots. To address the practical demands of multi-modal, open-vocabulary goal queries and multi-goal visual navigation, we propose LagMemo, a navigation system that leverages a language 3D Gaussian Splatting memory. During a one-time exploration, LagMemo constructs a unified 3D language memory with robust spatial-semantic correlations. With incoming task goals, the system efficiently queries the memory, predicts candidate goal locations, and integrates a local perception-based verification mechanism to dynamically match and validate goals. For fair and rigorous evaluation, we curate GOAT-Core, a high-quality core split distilled from GOAT-Bench. Experimental results show that LagMemo's memory module enables effective multi-modal open-vocabulary localization, and significantly outperforms state-of-the-art methods in multi-goal visual navigation. Project page: https://weekgoodday.github.io/lagmemo
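The query-then-verify loop described above can be illustrated with a small sketch. This is not the authors' implementation: it assumes the memory can be flattened into (position, language feature) pairs, that goals of any modality are already embedded into the same feature space, and that a hypothetical `verify` callback stands in for the local perception-based check. All function and variable names here are made up for illustration.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]
# Assumption: each memory entry pairs a 3D position with a language feature vector.
MemoryEntry = Tuple[Point, Sequence[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def query_memory(memory: List[MemoryEntry],
                 goal_embedding: Sequence[float],
                 k: int = 3) -> List[Point]:
    """Return the positions of the k entries most similar to the goal embedding."""
    ranked = sorted(memory,
                    key=lambda entry: cosine(entry[1], goal_embedding),
                    reverse=True)
    return [pos for pos, _ in ranked[:k]]


def navigate_to_goal(memory: List[MemoryEntry],
                     goal_embedding: Sequence[float],
                     verify: Callable[[Point], bool],
                     k: int = 3) -> Optional[Point]:
    """Visit candidates in ranked order and stop at the first one that verifies.

    `verify` is a hypothetical stand-in for an on-site perception check.
    """
    for candidate in query_memory(memory, goal_embedding, k):
        if verify(candidate):
            return candidate
    return None


# Toy memory with 2D features; the values are invented for illustration.
toy_memory: List[MemoryEntry] = [
    ((0.0, 0.0, 0.0), [1.0, 0.0]),
    ((2.0, 1.0, 0.0), [0.6, 0.8]),
    ((5.0, 5.0, 0.0), [0.0, 1.0]),
]
toy_goal = [0.0, 1.0]
candidates = query_memory(toy_memory, toy_goal, k=2)
# Pretend the top candidate fails verification and the second one passes.
reached = navigate_to_goal(toy_memory, toy_goal,
                           verify=lambda p: p == (2.0, 1.0, 0.0), k=2)
```

In this toy run the memory ranks the entry at (5, 5, 0) first, but the verification step rejects it and the agent settles on (2, 1, 0), which shows why ranking alone is not treated as final.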

Haotian Zhou, Xiaole Wang, He Li, Zhuo Qi, Jinrun Yin, Haiyu Kong, Jianghuan Xu, Huijing Zhao • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| 3D Object Localization | GOAT-Core Scene 4ok | Average SR@5 | 86.7 | 6 |
| Visual Navigation | GOAT-Core (val) | SR | 56.3 | 6 |
| Visual Navigation | Full GOAT-Bench Seen (val) | SR | 36.8 | 6 |
| Visual Navigation | Full GOAT-Bench Synonyms (val) | Success Rate (SR) | 44.8 | 6 |
| Visual Navigation | Full GOAT-Bench Unseen (val) | SR | 37.9 | 6 |
| 3D Object Localization | GOAT-Core Scene 5cd | Average SR@5 | 65 | 6 |
| 3D Object Localization | GOAT-Core (Total) | Average SR@5 | 70.8 | 6 |
| 3D Object Localization | GOAT-Core Scene Nfv | Average SR@5 | 66.7 | 6 |
| 3D Object Localization | GOAT-Core Scene Tee | Average SR@5 | 65 | 6 |
| Memory Building and Query | Physical indoor environment, 200 m² scene | Build Time (s) | 4.20e+3 | 3 |
Showing 10 of 11 rows
