A Deployable Embodied Vision-Language Navigation System with Hierarchical Cognition and Context-Aware Exploration

About

Bridging the gap between embodied intelligence and embedded deployment remains a key challenge in intelligent robotic systems, where perception, reasoning, and planning must operate under strict constraints on computation, memory, energy, and real-time execution. In vision-and-language navigation (VLN), existing approaches often face a trade-off between reasoning capability and deployment efficiency on real-world platforms. In this paper, we present a deployable embodied VLN system that achieves both high efficiency and strong high-level reasoning on real-world robots. The system is decomposed into a fast perception-action layer and a deep reasoning layer running asynchronously at different time scales, with a shared memory layer enabling efficient interaction between them. To support long-horizon reasoning, we incrementally construct a compact memory graph and progressively feed decomposed subgraphs into a vision-language model (VLM). Furthermore, we formulate exploration as a Weighted Traveling Repairman Problem (WTRP) by jointly considering reasoning outcomes and the spatial distribution of candidate regions. Extensive experiments in simulation and real-world environments demonstrate improved navigation success and efficiency over existing VLN approaches while maintaining real-time performance on resource-constrained hardware. Code and additional real-world experiments are available at https://github.com/xukuanHIT/HiCo-Nav.
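
To illustrate the exploration objective, the following is a minimal sketch of a Weighted Traveling Repairman Problem over a few candidate regions. It is not taken from the HiCo-Nav code: the names `weighted_latency`, `solve_wtrp`, and the `regions` layout are hypothetical, the weights merely stand in for reasoning outcomes, straight-line distance stands in for path cost, and exhaustive search is used only because the example is tiny. The standard WTRP objective minimizes the sum of each region's weight times the distance travelled before reaching it.

```python
from itertools import permutations
from math import dist


def weighted_latency(start, order, regions):
    """Return sum(weight_i * distance travelled until region i is reached)."""
    pos, travelled, cost = start, 0.0, 0.0
    for name in order:
        xy, weight = regions[name]
        travelled += dist(pos, xy)   # cumulative arrival distance
        cost += weight * travelled   # high-weight regions are costly to delay
        pos = xy
    return cost


def solve_wtrp(start, regions):
    """Exhaustive search over visiting orders (assumption: few candidates)."""
    return min(
        permutations(regions),
        key=lambda order: weighted_latency(start, order, regions),
    )


# Hypothetical candidates: name -> ((x, y), weight). The weights are
# illustrative placeholders for how promising the reasoning layer finds a region.
regions = {
    "near_low": ((1.0, 0.0), 0.1),
    "far_high": ((-3.0, 0.0), 0.9),
}

best = solve_wtrp((0.0, 0.0), regions)
print(best)  # ('far_high', 'near_low')
```

In this toy case a nearest-first policy would visit `near_low` first at a weighted latency of about 4.6, whereas the WTRP order reaches the high-weight region first at about 3.4.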

Kuan Xu, Ruimeng Liu, Yizhuo Yang, Denan Liang, Tongxing Jin, Shenghai Yuan, Chen Wang, Lihua Xie • 2026

Related benchmarks

Task | Dataset | Result (Success Rate, SR) | Rank
Object Goal Navigation | MP3D | 48.5 | 129
Object Navigation | HM3D | 61 | 110
Open-set ObjectGoal Navigation | HM3D-OVON unseen (val) | 52.4 | 49
Text Navigation | TextNav | 27.8 | 14
