
NavMorph: A Self-Evolving World Model for Vision-and-Language Navigation in Continuous Environments

About

Vision-and-Language Navigation in Continuous Environments (VLN-CE) requires agents to execute sequential navigation actions in complex environments guided by natural language instructions. Current approaches often struggle with generalizing to novel environments and adapting to ongoing changes during navigation. Inspired by human cognition, we present NavMorph, a self-evolving world model framework that enhances environmental understanding and decision-making in VLN-CE tasks. NavMorph employs compact latent representations to model environmental dynamics, equipping agents with foresight for adaptive planning and policy refinement. By integrating a novel Contextual Evolution Memory, NavMorph leverages scene-contextual information to support effective navigation while maintaining online adaptability. Extensive experiments demonstrate that our method achieves notable performance improvements on popular VLN-CE benchmarks. Code is available at https://github.com/Feliciaxyao/NavMorph.
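The abstract describes a latent world model paired with an online memory of scene context. As an illustrative sketch only (the paper's actual architecture is not specified here; the class and function names, dimensions, and update rule below are all hypothetical), the interaction between an imagined latent rollout and a contextual memory could look like:

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM = 8  # toy latent size, not from the paper


class ContextualMemory:
    """Hypothetical scene-context memory: stores latent key/value pairs,
    retrieves by cosine similarity, and is updated online as the agent
    navigates (loosely mirroring the 'Contextual Evolution Memory' idea)."""

    def __init__(self, dim, capacity=64):
        self.keys = np.zeros((0, dim))
        self.values = np.zeros((0, dim))
        self.capacity = capacity

    def retrieve(self, query, k=3):
        # Softmax-weighted average of the k most similar stored values.
        if len(self.keys) == 0:
            return np.zeros_like(query)
        sims = self.keys @ query / (
            np.linalg.norm(self.keys, axis=1) * np.linalg.norm(query) + 1e-8
        )
        top = np.argsort(sims)[-k:]
        w = np.exp(sims[top])
        w /= w.sum()
        return w @ self.values[top]

    def write(self, key, value):
        # Append and drop the oldest entries beyond capacity.
        self.keys = np.vstack([self.keys, key])[-self.capacity:]
        self.values = np.vstack([self.values, value])[-self.capacity:]


def latent_step(state, action, W_s, W_a, context):
    # One step of an (untrained, randomly initialized) latent dynamics
    # model: the next latent state depends on the current state, the
    # candidate action, and the retrieved scene context.
    return np.tanh(W_s @ state + W_a @ action + context)


W_s = rng.normal(size=(LATENT_DIM, LATENT_DIM)) * 0.3
W_a = rng.normal(size=(LATENT_DIM, LATENT_DIM)) * 0.3
memory = ContextualMemory(LATENT_DIM)
state = rng.normal(size=LATENT_DIM)

# Imagined rollout: predict future latents for candidate actions,
# writing each visited state back to memory (online adaptation).
for t in range(5):
    action = rng.normal(size=LATENT_DIM)
    context = memory.retrieve(state)
    next_state = latent_step(state, action, W_s, W_a, context)
    memory.write(state, next_state)
    state = next_state

print(state.shape, len(memory.keys))
```

In a real system the transition weights would be learned from experience and the rollout would score action sequences for planning; this sketch only shows the data flow between the dynamics model and the memory.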

Xuan Yao, Junyu Gao, Changsheng Xu · 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Vision-Language Navigation | R2R-CE (val-unseen) | Success Rate (SR) | 47.9 | 433 |
| Vision-and-Language Navigation | R2R (val-unseen) | Success Rate (SR) | 47.9 | 344 |
| Vision-Language Navigation | RxR-CE (val-unseen) | Success Rate (SR) | 30.8 | 280 |
| Vision-and-Language Navigation | R2R-CE (val-unseen, continuous) | Success Rate (SR) | 47.9 | 35 |
| Vision-Language Navigation | RxR (val-unseen) | Navigation Error (NE) | 8.85 | 25 |
