
Robust Navigation with Language Pretraining and Stochastic Sampling

About

Core to the vision-and-language navigation (VLN) challenge is building robust instruction representations and action decoding schemes that generalize well to previously unseen instructions and environments. In this paper, we report two simple but highly effective methods that address these challenges and lead to a new state-of-the-art performance. First, we adapt large-scale pretrained language models to learn text representations that generalize better to previously unseen instructions. Second, we propose a stochastic sampling scheme to reduce the considerable gap between the expert actions seen in training and the sampled actions taken at test time, so that the agent learns to correct its own mistakes during long sequential action decoding. Combining the two techniques, we achieve a new state of the art on the Room-to-Room benchmark with a 6% absolute gain over the previous best result (47% -> 53%) on the Success Rate weighted by Path Length metric.
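The stochastic sampling idea can be illustrated with a short sketch: during training, with some probability the agent follows an action drawn from its own predicted distribution instead of the expert's action, so it visits states produced by its own (possibly wrong) choices and learns to recover. The function and variable names below are hypothetical, not from the paper's code; this is a minimal sketch of the general technique, assuming a discrete action space and a softmax policy output.

```python
import random

def stochastic_sample_action(expert_action, action_probs, sample_prob, rng):
    """Mix expert supervision with the agent's own predictions.

    With probability `sample_prob`, draw the next action from the agent's
    predicted distribution `action_probs`; otherwise follow the expert
    action, as in standard teacher forcing. Sweeping `sample_prob` between
    0 and 1 trades off clean supervision against exposure to self-induced
    states.
    """
    if rng.random() < sample_prob:
        actions = list(range(len(action_probs)))
        return rng.choices(actions, weights=action_probs, k=1)[0]
    return expert_action

# Toy rollout: 6 navigation steps over 4 discrete actions.
rng = random.Random(0)
expert_trajectory = [0, 1, 1, 2, 3, 0]
agent_probs = [0.1, 0.2, 0.3, 0.4]  # stand-in for the policy's softmax output
taken = [stochastic_sample_action(a, agent_probs, 0.5, rng)
         for a in expert_trajectory]
```

With `sample_prob=0` this reduces to pure teacher forcing; with `sample_prob=1` the rollout is fully on-policy. The loss is still computed against the expert action at each step, so only the visited states change, not the supervision signal.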

Xiujun Li, Chunyuan Li, Qiaolin Xia, Yonatan Bisk, Asli Celikyilmaz, Jianfeng Gao, Noah Smith, Yejin Choi • 2019

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Vision-and-Language Navigation | R2R (val unseen) | Success Rate (SR) | 49 | 260 |
| Vision-Language Navigation | R2R (test unseen) | SR | 57 | 122 |
| Vision-Language Navigation | R2R (val seen) | Success Rate (SR) | 58 | 120 |
| Vision-Language Navigation | R2R Unseen (test) | SR | 53 | 116 |
| Vision-and-Language Navigation | Room-to-Room (R2R) Unseen (val) | SR | 59 | 52 |
| Vision-and-Language Navigation | R2R (test) | SPL (Success weighted by Path Length) | 45 | 38 |
| Vision-and-Language Navigation | Room-to-Room (R2R) Seen (val) | NE (Navigation Error) | 3.09 | 32 |
| Vision-and-Language Navigation | R2R (test unseen) | SR | 49 | 24 |
