Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

DreamNav: A Trajectory-Based Imaginative Framework for Zero-Shot Vision-and-Language Navigation

About

Vision-and-Language Navigation in Continuous Environments (VLN-CE), which links language instructions to perception and control in the real world, is a core capability of embodied robots. Recently, large-scale pretrained foundation models have been leveraged as shared priors for perception, reasoning, and action, enabling zero-shot VLN without task-specific training. However, existing zero-shot VLN methods depend on costly perception and passive scene understanding, collapsing control to point-level choices. As a result, they are expensive to deploy, misaligned in action semantics, and short-sighted in planning. To address these issues, we present DreamNav that focuses on the following three aspects: (1) for reducing sensory cost, our EgoView Corrector aligns viewpoints and stabilizes egocentric perception; (2) instead of point-level actions, our Trajectory Predictor favors global trajectory-level planning to better align with instruction semantics; and (3) to enable anticipatory and long-horizon planning, we propose an Imagination Predictor to endow the agent with proactive thinking capability. On VLN-CE and real-world tests, DreamNav sets a new zero-shot state-of-the-art (SOTA), outperforming the strongest egocentric baseline with extra information by up to 7.49\% and 18.15\% in terms of SR and SPL metrics. To our knowledge, this is the first zero-shot VLN method to unify trajectory-level planning and active imagination while using only egocentric inputs.

Yunheng Wang, Yuetong Fang, Taowen Wang, Yixiao Feng, Yawen Tan, Shuning Zhang, Peiran Liu, Yiding Ji, Renjing Xu• 2025

Related benchmarks

TaskDatasetResultRank
Vision-Language NavigationR2R-CE (val-unseen)
Success Rate (SR)32.8
677
Embodied NavigationR2R-CE
Navigation Error (NE)7.06
19
Vision-and-Language NavigationReal-world scenario Outdoor
Success Rate44
12
Vision-and-Language NavigationOffice Real-world scenario
Success Rate44
12
Vision-and-Language NavigationReal-world scenario Gym
Success Rate33
12
Vision-and-Language NavigationReal-world scenario Lobby
Success Rate22
12
Showing 6 of 6 rows

Other info

Follow for update