
Lookahead Exploration with Neural Radiance Representation for Continuous Vision-Language Navigation

About

Vision-and-language navigation (VLN) requires an agent to navigate to a remote location in a 3D environment by following a natural language instruction. At each navigation step, the agent selects one of the possible candidate locations and then moves to it. For better navigation planning, the lookahead exploration strategy evaluates the agent's next action by anticipating the future environment at each candidate location. To this end, some existing works predict RGB images of future environments, but this strategy suffers from image distortion and high computational cost. To address these issues, we propose the pre-trained hierarchical neural radiance representation model (HNR), which produces multi-level semantic features for future environments; these features are more robust and efficient than pixel-wise RGB reconstruction. Furthermore, with the predicted future environmental representations, our lookahead VLN model is able to construct a navigable future path tree and select the optimal path via efficient parallel evaluation. Extensive experiments on the VLN-CE datasets confirm the effectiveness of our method.
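The path-tree idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the `PathNode` class, the mean-pooling of node features, and the dot-product scoring against an instruction vector are all assumptions made for illustration, and the hand-set feature vectors stand in for the semantic features that HNR would predict. The sketch only shows the general shape of the technique: enumerate root-to-leaf paths in a tree of candidate locations, stack one feature per path, and score all paths in a single batched operation.

```python
from dataclasses import dataclass, field
from typing import List

import numpy as np


@dataclass
class PathNode:
    """Hypothetical tree node: a candidate location and its predicted feature."""
    location: str
    feature: np.ndarray
    children: List["PathNode"] = field(default_factory=list)


def enumerate_paths(root: PathNode) -> List[List[PathNode]]:
    """Return every root-to-leaf path as a list of nodes."""
    if not root.children:
        return [[root]]
    return [[root] + tail
            for child in root.children
            for tail in enumerate_paths(child)]


def select_best_path(root: PathNode, instruction_feature: np.ndarray):
    """Score all paths at once and return the best path's locations and all scores.

    Assumption: a path is summarized by mean-pooling its node features and
    scored by a dot product with the instruction feature. A real model would
    use a learned scorer instead.
    """
    paths = enumerate_paths(root)
    # One row per path, so a single matrix-vector product scores every path.
    path_features = np.stack(
        [np.mean([node.feature for node in path], axis=0) for path in paths]
    )
    scores = path_features @ instruction_feature
    best = int(np.argmax(scores))
    return [node.location for node in paths[best]], scores


# Toy tree: two candidate locations, each with one lookahead location.
tree = PathNode("current", np.array([0.0, 0.0]), [
    PathNode("A", np.array([1.0, 0.0]), [PathNode("A1", np.array([1.0, 0.0]))]),
    PathNode("B", np.array([0.0, 1.0]), [PathNode("B1", np.array([0.0, 1.0]))]),
])
best_path, scores = select_best_path(tree, np.array([0.0, 1.0]))
print(best_path)  # ['current', 'B', 'B1']
```

Stacking the per-path features into one array is what makes the evaluation parallel in this sketch; on an accelerator, the same pattern would score every branch of the tree in one batched forward pass.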

Zihan Wang, Xiangyang Li, Jiahao Yang, Yeqi Liu, Junjie Hu, Ming Jiang, Shuqiang Jiang • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Vision-Language Navigation | R2R-CE (val-unseen) | Success Rate (SR) | 61 | 266 |
| Vision-Language Navigation | RxR-CE (val-unseen) | SR | 56.4 | 172 |
| Vision-and-Language Navigation | R2R-CE (test-unseen) | SR | 58 | 50 |
| Vision-and-Language Navigation | R2R-CE (val-seen) | SR | 69 | 49 |
| Vision-and-Language Navigation | RxR seen (val) | SR | 63.72 | 21 |
| Vision-and-Language Navigation | R2R-CE v1.0 (val unseen) | NE (Navigation Error) | 4.42 | 19 |
| Iterative Vision-and-Language Navigation | IR2R-CE (val seen) | TL | 10.8 | 15 |
| Vision-and-Language Navigation | RxR-CE seen (val) | NE | 4.85 | 13 |
| Vision-and-Language Navigation | IR2R-CE (val-unseen) | TL (Trajectory Length) | 9.3 | 9 |
| Fine-grained Vision-Language Navigation | VLNVerse (test) | TL | 7.98 | 9 |
Showing 10 of 18 rows
