
Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments

About

A robot that can carry out a natural-language instruction has been a dream since before the Jetsons cartoon series imagined a life of leisure mediated by a fleet of attentive robot helpers. It is a dream that remains stubbornly distant. However, recent advances in vision and language methods have made incredible progress in closely related areas. This is significant because a robot interpreting a natural-language navigation instruction on the basis of what it sees is carrying out a vision and language process that is similar to Visual Question Answering. Both tasks can be interpreted as visually grounded sequence-to-sequence translation problems, and many of the same methods are applicable. To enable and encourage the application of vision and language methods to the problem of interpreting visually-grounded navigation instructions, we present the Matterport3D Simulator -- a large-scale reinforcement learning environment based on real imagery. Using this simulator, which can in future support a range of embodied vision and language tasks, we provide the first benchmark dataset for visually-grounded natural language navigation in real buildings -- the Room-to-Room (R2R) dataset.

Peter Anderson, Qi Wu, Damien Teney, Jake Bruce, Mark Johnson, Niko Sünderhauf, Ian Reid, Stephen Gould, Anton van den Hengel • 2017

Related benchmarks

| Task                           | Dataset                         | Metric            | Result | Rank |
|--------------------------------|---------------------------------|-------------------|--------|------|
| Vision-and-Language Navigation | R2R (val unseen)                | Success Rate (SR) | 52     | 260  |
| Vision-and-Language Navigation | REVERIE (val unseen)            | SPL               | 2.84   | 129  |
| Vision-Language Navigation     | R2R (test unseen)               | SR                | 51     | 122  |
| Vision-Language Navigation     | R2R (val seen)                  | Success Rate (SR) | 62     | 120  |
| Vision-Language Navigation     | R2R Unseen (test)               | SR                | 69     | 116  |
| Vision-and-Language Navigation | R4R unseen (val)                | Success Rate (SR) | 25.7   | 52   |
| Vision-and-Language Navigation | Room-to-Room (R2R) Unseen (val) | SR                | 22     | 52   |
| Vision-and-Language Navigation | R2R (val seen)                  | Success Rate (SR) | 39     | 51   |
| Navigation                     | REVERIE Unseen (test)           | SR                | 6.88   | 43   |
| Vision-and-Language Navigation | REVERIE Unseen (test)           | Success Rate (SR) | 3.99   | 40   |

Showing 10 of 62 rows.

