
HOP: History-and-Order Aware Pre-training for Vision-and-Language Navigation

About

Pre-training has been adopted in a few recent works for Vision-and-Language Navigation (VLN). However, previous pre-training methods for VLN either lack the ability to predict future actions or ignore the trajectory contexts, which are essential for a greedy navigation process. In this work, to promote the learning of spatio-temporal visual-textual correspondence as well as the agent's capability of decision making, we propose a novel history-and-order aware pre-training paradigm (HOP) with VLN-specific objectives that exploit past observations and support future action prediction. Specifically, in addition to the commonly used Masked Language Modeling (MLM) and Trajectory-Instruction Matching (TIM), we design two proxy tasks to model temporal order information: Trajectory Order Modeling (TOM) and Group Order Modeling (GOM). Moreover, our navigation action prediction is enhanced by introducing the task of Action Prediction with History (APH), which takes into account the agent's past visual observations. Extensive experimental results on four downstream VLN tasks (R2R, REVERIE, NDH, RxR) demonstrate the effectiveness of our proposed method compared against several state-of-the-art agents.

Yanyuan Qiao, Yuankai Qi, Yicong Hong, Zheng Yu, Peng Wang, Qi Wu • 2022
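
The abstract names two order-modeling proxy tasks (TOM and GOM) but does not describe how their training samples are built. As a rough illustration only, a common way to set up an order-prediction task is to shuffle a sequence and ask a model to recover each element's original position. The sketch below shows that generic idea; the function name `make_order_sample`, the label layout, and the use of plain strings as trajectory steps are all assumptions for illustration and are not taken from HOP's implementation.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def make_order_sample(
    steps: Sequence[T], rng: random.Random
) -> Tuple[List[T], List[int]]:
    """Build one generic order-prediction sample (hypothetical helper).

    Returns the shuffled steps plus, for each shuffled position j,
    the index that element had in the original sequence. A model
    trained on such samples would be asked to predict those indices.
    """
    order = list(range(len(steps)))
    rng.shuffle(order)  # in-place random permutation of the indices
    shuffled = [steps[i] for i in order]
    return shuffled, order


def restore(shuffled: Sequence[T], order: Sequence[int]) -> List[T]:
    """Invert the shuffle using the original-position labels."""
    restored: List[T] = [shuffled[0]] * len(shuffled) if shuffled else []
    for position, original_index in enumerate(order):
        restored[original_index] = shuffled[position]
    return restored


# Placeholder view identifiers standing in for trajectory observations.
views = ["view_0", "view_1", "view_2", "view_3"]
shuffled_views, labels = make_order_sample(views, random.Random(0))
```

Applying `restore(shuffled_views, labels)` returns the original sequence, which is the property an order-prediction target needs to have.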

Related benchmarks

Task                            Dataset                          Result                                   Rank
Vision-and-Language Navigation  R2R (val unseen)                 Success Rate (SR): 64                    260
Vision-and-Language Navigation  REVERIE (val unseen)             SPL: 26.11                               129
Vision-Language Navigation      R2R (test unseen)                SR: 64                                   122
Vision-Language Navigation      R2R (val seen)                   Success Rate (SR): 76                    120
Vision-Language Navigation      R2R Unseen (test)                SR: 64                                   116
Vision-and-Language Navigation  Room-to-Room (R2R) Unseen (val)  SR: 64                                   52
Vision-and-Language Navigation  R2R (val seen)                   --                                       51
Navigation                      REVERIE Unseen (test)            SR: 30.17                                43
Vision-and-Language Navigation  REVERIE Unseen (test)            Success Rate (SR): 30.17                 40
Vision-and-Language Navigation  R2R (test)                       SPL (Success weighted Path Length): 59   38
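
The benchmark rows report Success Rate (SR) and SPL (Success weighted Path Length). As a hedged illustration of what these two numbers measure, the sketch below computes both from per-episode records using the standard definitions: SR is the fraction of successful episodes, and SPL weights each success by the ratio of the shortest-path length to the longer of the agent's path and the shortest path. The `Episode` class and its field names are assumptions made for this example, and reporting both metrics as percentages is assumed from the magnitude of the scores listed above.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Episode:
    """One navigation episode (hypothetical record layout)."""
    success: bool                 # did the agent stop close enough to the goal?
    shortest_path_length: float   # length of the shortest start-to-goal path
    agent_path_length: float      # length of the path the agent actually took


def success_rate(episodes: Sequence[Episode]) -> float:
    """Percentage of episodes that ended in success."""
    if not episodes:
        raise ValueError("need at least one episode")
    return 100.0 * sum(e.success for e in episodes) / len(episodes)


def spl(episodes: Sequence[Episode]) -> float:
    """Success weighted by Path Length, as a percentage."""
    if not episodes:
        raise ValueError("need at least one episode")
    total = 0.0
    for e in episodes:
        if not e.success:
            continue  # failed episodes contribute zero
        longest = max(e.agent_path_length, e.shortest_path_length)
        # Guard against a degenerate zero-length episode.
        total += e.shortest_path_length / longest if longest > 0 else 1.0
    return 100.0 * total / len(episodes)


# Made-up episodes for illustration; these are not results from the paper.
demo = [
    Episode(True, 10.0, 10.0),   # optimal path
    Episode(True, 10.0, 20.0),   # success, but twice the optimal length
    Episode(False, 10.0, 5.0),   # stopped early, failed
    Episode(False, 10.0, 30.0),  # wandered, failed
]
demo_sr = success_rate(demo)
demo_spl = spl(demo)
```

Because SPL penalizes detours, it can never exceed SR on the same set of episodes, which is consistent with the R2R test rows above reporting an SPL of 59 alongside an SR of 64.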
