LagMemo: Language 3D Gaussian Splatting Memory for Multi-modal Open-vocabulary Multi-goal Visual Navigation
About
Navigating to a designated goal using visual information is a fundamental capability for intelligent robots. To address the practical demands of multi-modal, open-vocabulary goal queries and multi-goal visual navigation, we propose LagMemo, a navigation system that leverages a language 3D Gaussian Splatting memory. During a one-time exploration, LagMemo constructs a unified 3D language memory with robust spatial-semantic correlations. When a task goal arrives, the system efficiently queries the memory, predicts candidate goal locations, and integrates a local perception-based verification mechanism to dynamically match and validate goals. For fair and rigorous evaluation, we curate GOAT-Core, a high-quality core split distilled from GOAT-Bench. Experimental results show that LagMemo's memory module enables effective multi-modal open-vocabulary localization, and that LagMemo significantly outperforms state-of-the-art methods in multi-goal visual navigation. Project page: https://weekgoodday.github.io/lagmemo
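The query-then-verify loop described above can be illustrated with a minimal sketch. It is not LagMemo's implementation: it assumes the memory is reduced to one feature vector and one 3D position per point, that the goal query is already embedded in the same feature space, and that candidates are ranked by cosine similarity. The names `top_k_candidates`, `first_verified`, and the `verify` callback are hypothetical.

```python
import numpy as np


def top_k_candidates(features, positions, query, k=5):
    """Rank memory points by cosine similarity to the query; return the best k.

    features:  (N, D) array, one language feature per memory point (assumed layout)
    positions: (N, 3) array, the 3D position of each memory point
    query:     (D,) array, the goal embedding in the same feature space
    """
    unit_features = features / np.linalg.norm(features, axis=1, keepdims=True)
    unit_query = query / np.linalg.norm(query)
    scores = unit_features @ unit_query      # cosine similarity per point
    order = np.argsort(-scores)[:k]          # indices of the k highest scores
    return positions[order], scores[order]


def first_verified(candidates, verify):
    """Check candidates in ranked order; return the first one `verify` accepts.

    `verify` stands in for a local perception check at the candidate location.
    Returns None when no candidate is accepted.
    """
    for candidate in candidates:
        if verify(candidate):
            return candidate
    return None


# Toy memory: four points with 3-dimensional features.
features = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [0.9, 0.1, 0.0],
                     [0.0, 0.0, 1.0]])
positions = np.array([[0.0, 0.0, 0.0],
                      [1.0, 0.0, 0.0],
                      [2.0, 0.0, 0.0],
                      [3.0, 0.0, 0.0]])
query = np.array([1.0, 0.0, 0.0])

candidates, scores = top_k_candidates(features, positions, query, k=2)

# Hypothetical verifier that rejects the top-ranked point and accepts the second.
goal = first_verified(candidates, lambda p: p[0] > 1.0)
```

In this toy setup the best-scoring candidate fails verification, so the second-ranked location is selected. This is the situation in which keeping several candidates per query is useful.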
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| 3D Object Localization | GOAT-Core Scene 4ok | Average SR@5 | 86.7 | 6 |
| Visual Navigation | GOAT-Core (val) | SR | 56.3 | 6 |
| Visual Navigation | Full GOAT-Bench Seen (val) | SR | 36.8 | 6 |
| Visual Navigation | Full GOAT-Bench Synonyms (val) | Success Rate (SR) | 44.8 | 6 |
| Visual Navigation | Full GOAT-Bench Unseen (val) | SR | 37.9 | 6 |
| 3D Object Localization | GOAT-Core Scene 5cd | Average SR@5 | 65 | 6 |
| 3D Object Localization | GOAT-Core (Total) | Average SR@5 | 70.8 | 6 |
| 3D Object Localization | GOAT-Core Scene Nfv | Average SR@5 | 66.7 | 6 |
| 3D Object Localization | GOAT-Core Scene Tee | Average SR@5 | 65 | 6 |
| Memory Building and Query | Physical indoor environment 200 m2 scene | Build Time (s) | 4.20e+3 | 3 |