
BEVBert: Multimodal Map Pre-training for Language-guided Navigation

About

Large-scale pre-training has shown promising results on the vision-and-language navigation (VLN) task. However, most existing pre-training methods employ discrete panoramas to learn visual-textual associations. This requires the model to implicitly correlate incomplete, duplicate observations within the panoramas, which may impair an agent's spatial understanding. Thus, we propose a new map-based pre-training paradigm that is spatially aware for use in VLN. Concretely, we build a local metric map to explicitly aggregate incomplete observations and remove duplicates, while modeling navigation dependency in a global topological map. This hybrid design balances VLN's demands for both short-term reasoning and long-term planning. Then, based on the hybrid map, we devise a pre-training framework to learn a multimodal map representation, which enhances spatially aware cross-modal reasoning and thereby facilitates the language-guided navigation goal. Extensive experiments demonstrate the effectiveness of the map-based pre-training route for VLN, and the proposed method achieves state-of-the-art results on four VLN benchmarks.
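To make the hybrid-map idea in the abstract concrete, here is a minimal sketch. It is not the authors' implementation: the grid cell size, the averaging rule for observations that land in the same cell, and the names `build_local_metric_map` and `TopoMap` are all assumptions chosen for illustration. It only shows how duplicate observations could be merged in a local metric grid while a separate graph records which viewpoints are connected.

```python
from collections import defaultdict


def build_local_metric_map(observations, cell_size=0.5):
    """Merge observations into a 2D grid (hypothetical helper).

    observations: iterable of (x, y, feature), where feature is a list of floats.
    Features that fall into the same grid cell are averaged, so duplicate
    views of one location collapse into a single map entry.
    """
    sums = {}
    counts = defaultdict(int)
    for x, y, feat in observations:
        cell = (int(x // cell_size), int(y // cell_size))
        if cell not in sums:
            sums[cell] = list(feat)
        else:
            sums[cell] = [a + b for a, b in zip(sums[cell], feat)]
        counts[cell] += 1
    return {cell: [v / counts[cell] for v in total] for cell, total in sums.items()}


class TopoMap:
    """Undirected graph over visited/observed viewpoints (hypothetical class)."""

    def __init__(self):
        self.edges = defaultdict(set)

    def add_edge(self, a, b):
        # Record that the agent can move between viewpoints a and b.
        self.edges[a].add(b)
        self.edges[b].add(a)

    def neighbors(self, node):
        return sorted(self.edges[node])


if __name__ == "__main__":
    obs = [(0.1, 0.1, [1.0, 0.0]), (0.2, 0.3, [3.0, 2.0]), (1.2, 0.1, [5.0, 5.0])]
    print(build_local_metric_map(obs))

    topo = TopoMap()
    topo.add_edge("start", "hall")
    topo.add_edge("hall", "kitchen")
    print(topo.neighbors("hall"))
```

In this sketch the grid covers short-range spatial detail and the graph covers long-range connectivity, which mirrors the short-term reasoning versus long-term planning split the abstract describes.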

Dong An, Yuankai Qi, Yangguang Li, Yan Huang, Liang Wang, Tieniu Tan, Jing Shao (2022)

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Vision-Language Navigation | R2R-CE (val-unseen) | Success Rate (SR): 59.1 | 266 |
| Vision-and-Language Navigation | R2R (val unseen) | Success Rate (SR): 75 | 260 |
| Vision-Language Navigation | RxR-CE (val-unseen) | SR: 64.4 | 172 |
| Vision-and-Language Navigation | REVERIE (val unseen) | SPL: 36.4 | 129 |
| Vision-Language Navigation | R2R Unseen (test) | SR: 73 | 116 |
| Vision-and-Language Navigation | R2R (val seen) | Success Rate (SR): 81 | 51 |
| Vision-and-Language Navigation | R2R-CE (test-unseen) | SR: 59 | 50 |
| Vision-and-Language Navigation | R2R-CE (val-seen) | SR: 70.9 | 49 |
| Vision-and-Language Navigation | REVERIE Unseen (test) | Success Rate (SR): 52.81 | 40 |
| Vision-and-Language Navigation | R2R (test) | SPL (Success weighted Path Length): 60 | 38 |
Showing 10 of 15 rows
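
The table reports SR and SPL without defining them. As a hedged illustration, the sketch below computes both under the commonly used definitions: SR is the fraction of successful episodes, and SPL weights each success by the ratio of the shortest-path length to the length of the path actually taken. The function names and the toy episode values are assumptions for this example and are not taken from the paper or this page; success flags are supplied by the caller, since the success criterion depends on the benchmark.

```python
def success_rate(successes):
    """Fraction of episodes that ended in success (each flag is 0 or 1)."""
    return sum(successes) / len(successes)


def spl(successes, shortest_lengths, path_lengths):
    """Success weighted by Path Length.

    For each episode i: S_i * l_i / max(p_i, l_i), averaged over all episodes,
    where l_i is the shortest-path length and p_i the agent's path length.
    """
    total = 0.0
    for s, l, p in zip(successes, shortest_lengths, path_lengths):
        denom = max(p, l)
        if s and denom > 0:
            total += l / denom
    return total / len(successes)


if __name__ == "__main__":
    # Toy episodes (made-up numbers): two successes, one failure.
    successes = [1, 0, 1]
    shortest = [10.0, 10.0, 10.0]
    taken = [10.0, 20.0, 20.0]
    print(f"SR  = {100 * success_rate(successes):.1f}")
    print(f"SPL = {100 * spl(successes, shortest, taken):.1f}")
```

Both functions return values in [0, 1]; leaderboards such as the one above typically show them multiplied by 100.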
