
STaR: Scalable Task-Conditioned Retrieval for Long-Horizon Multimodal Robot Memory

About

Mobile robots are often deployed over long durations in diverse open, dynamic scenes, including indoor settings such as warehouses and manufacturing facilities, and outdoor settings such as agricultural and roadway operations. A core challenge is to build a scalable long-horizon memory that supports an agentic workflow for planning, retrieval, and reasoning over open-ended instructions at variable granularity, while producing precise, actionable answers for navigation. We present STaR, an agentic reasoning framework that (i) constructs a task-agnostic, multimodal long-term memory that generalizes to unseen queries while preserving fine-grained environmental semantics (object attributes, spatial relations, and dynamic events), and (ii) introduces a Scalable Task-Conditioned Retrieval algorithm, based on the Information Bottleneck principle, that extracts from long-term memory a compact, non-redundant, information-rich set of candidate memories for contextual reasoning. We evaluate STaR on NaVQA (mixed indoor/outdoor campus scenes) and WH-VQA, a custom warehouse benchmark built with Isaac Sim that contains many visually similar objects and emphasizes contextual reasoning. Across the two datasets, STaR consistently outperforms strong baselines, achieving higher success rates and markedly lower spatial error. We further deploy STaR on a real Husky wheeled robot in both indoor and outdoor environments, demonstrating robust long-horizon reasoning, scalability, and practical utility. Project Website: https://trailab.github.io/STaR-website/
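For readers unfamiliar with the Information Bottleneck principle named above, its standard textbook objective is reproduced here for orientation. The abstract does not say how STaR instantiates it, so the mapping of symbols to robot memory is an assumption made only for illustration: X stands for the full long-term memory, Y for the information relevant to the task or query, T for the compact set of retrieved memories, and β for a trade-off weight.

```latex
% Standard Information Bottleneck Lagrangian (generic form, not STaR's exact formulation).
% Assumed reading: X = long-term memory, Y = task-relevant information,
% T = compressed representation (retrieved memory set), \beta > 0 = trade-off weight.
\min_{p(t \mid x)} \; I(X; T) \; - \; \beta \, I(T; Y)
```

Minimizing I(X;T) pushes the retrieved set to be compact and non-redundant, while the β I(T;Y) term rewards keeping what is informative for the task, which matches the "compact, non-redundant, information-rich" description in the abstract.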

Mingfeng Yuan, Hao Zhang, Mahan Mohammadi, Runhao Li, Jinjun Shan, Steven L. Waslander • 2026

Related benchmarks

Task                   Dataset                      Result  Rank
Multi-modal Reasoning  WH-VQA                       SR 64   3
Spatial                NaVQA Short memory horizon   SR 89   3
Spatial                NaVQA Medium memory horizon  SR 84   3
Spatial                NaVQA Long memory horizon    SR 77   3
Spatial Reasoning      WH-VQA                       SR 67   3
Temporal               NaVQA Medium memory horizon  SR 83   3
Textual                NaVQA Short memory horizon   SR 82   3
Textual                NaVQA Medium memory horizon  SR 84   3
Textual                NaVQA Long memory horizon    SR 80   3
Textual Reasoning      WH-VQA                       SR 63   3
Showing 10 of 12 rows
