
Language-Aligned Waypoint (LAW) Supervision for Vision-and-Language Navigation in Continuous Environments

About

In the Vision-and-Language Navigation (VLN) task, an embodied agent navigates a 3D environment by following natural language instructions. A challenge in this task is handling 'off the path' scenarios, in which the agent veers from the reference path. Prior work supervises the agent with actions based on the shortest path from the agent's location to the goal, but such goal-oriented supervision is often not aligned with the instruction. Furthermore, the evaluation metrics employed by prior work do not measure how much of a language instruction the agent is able to follow. In this work, we propose a simple and effective language-aligned supervision scheme, and a new metric that measures the number of sub-instructions the agent has completed during navigation.

Sonia Raychaudhuri, Saim Wani, Shivansh Patel, Unnat Jain, Angel X. Chang • 2021
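
The contrast the abstract draws between goal-oriented and language-aligned supervision can be illustrated with a small sketch. This is not the authors' implementation: the function names, the 2D coordinates, the "nearest waypoint, then the next one" rule, and the radius-based progress count are all assumptions made for illustration. The progress count in particular is only a hypothetical stand-in for the paper's sub-instruction metric, which the abstract does not define.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # assumed 2D positions, for illustration only


def dist(a: Point, b: Point) -> float:
    """Straight-line distance between two points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def goal_oriented_target(agent: Point, reference_path: List[Point]) -> Point:
    """Goal-oriented supervision: always steer toward the final goal,
    regardless of where the agent currently is."""
    return reference_path[-1]


def language_aligned_target(agent: Point, reference_path: List[Point]) -> Point:
    """Hypothetical language-aligned rule: find the reference waypoint
    nearest to the agent, then steer toward the waypoint that follows it,
    so an off-path agent is pulled back onto the described route."""
    nearest = min(range(len(reference_path)),
                  key=lambda i: dist(agent, reference_path[i]))
    next_index = min(nearest + 1, len(reference_path) - 1)
    return reference_path[next_index]


def waypoints_reached(trajectory: List[Point], waypoints: List[Point],
                      radius: float = 1.0) -> int:
    """Hypothetical progress proxy: count waypoints visited in order,
    where 'visited' means passing within `radius` of the waypoint."""
    reached = 0
    for position in trajectory:
        if reached < len(waypoints) and dist(position, waypoints[reached]) <= radius:
            reached += 1
    return reached


reference_path = [(0.0, 0.0), (0.0, 5.0), (5.0, 5.0), (5.0, 10.0)]
agent = (1.0, 4.5)  # slightly off the reference path, near the second waypoint

print(goal_oriented_target(agent, reference_path))
print(language_aligned_target(agent, reference_path))
```

Under these assumptions, the goal-oriented target ignores the intermediate turns the instruction describes, while the language-aligned target keeps the agent on the route those turns define.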

Related benchmarks

Task | Dataset | Result | Rank
Vision-Language Navigation | R2R-CE (val-unseen) | Success Rate (SR): 35 | 433
Vision-and-Language Navigation | R2R (val unseen) | Success Rate (SR): 35 | 344
Vision-Language Navigation | RxR-CE (val-unseen) | SR: 8 | 280
Vision-and-Language Navigation | R2R-CE (val-seen) | SR: 37 | 49
Vision-and-Language Navigation | R2R-CE unseen continuous (val) | SR: 35 | 35
Vision-Language Navigation | RxR (val-unseen) | Navigation Error (NE): 10.9 | 25
Vision-Language Navigation | VLN-CE R2R (val unseen) | Navigation Error (NE): 6.83 | 22
Vision-and-Language Navigation | VLN-CE 1.0 (val-unseen) | Navigation Error (NE): 6.83 | 20
Vision-and-Language Navigation | VLN-CE 1.0 (val-seen) | Navigation Error (NE): 6.35 | 20
Vision-and-Language Navigation | Room-to-Room (R2R) VLN-CE (val unseen) | Navigation Error (NE): 6.83 | 17
Showing 10 of 12 rows
