Topological Planning with Transformers for Vision-and-Language Navigation
About
Conventional approaches to vision-and-language navigation (VLN) are trained end-to-end but struggle to perform well in freely traversable environments. Inspired by the robotics community, we propose a modular approach to VLN using topological maps. Given a natural language instruction and topological map, our approach leverages attention mechanisms to predict a navigation plan in the map. The plan is then executed with low-level actions (e.g. forward, rotate) using a robust controller. Experiments show that our method outperforms previous end-to-end approaches, generates interpretable navigation plans, and exhibits intelligent behaviors such as backtracking.
Kevin Chen, Junshen K. Chen, Jo Chuang, Marynel Vázquez, Silvio Savarese • 2020
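To make the plan-then-execute split in the abstract concrete, here is a minimal sketch of how a plan over a topological map might be expanded into low-level actions. It is not the authors' planner or controller: the map, the node names, the action names (`rotate_left`, `rotate_right`, `forward`), and the step sizes are all hypothetical, and a naive geometric expansion stands in for the paper's attention-based planner and robust controller.

```python
import math

# Hypothetical topological map: node id -> (x, y) position in meters.
NODES = {"hall": (0.0, 0.0), "door": (1.0, 0.0), "kitchen": (1.0, 1.0)}
# Undirected edges between directly navigable nodes (assumed for illustration).
EDGES = {("hall", "door"), ("door", "kitchen")}

FORWARD_STEP = 0.25  # meters moved per "forward" action (assumed value)
TURN_STEP = 15       # degrees turned per rotate action (assumed value)


def is_valid_plan(plan):
    """Return True if every consecutive pair of nodes is an edge in the map."""
    return all((a, b) in EDGES or (b, a) in EDGES for a, b in zip(plan, plan[1:]))


def plan_to_actions(plan, heading_deg=0.0):
    """Expand a node sequence into discrete rotate/forward actions."""
    actions = []
    for a, b in zip(plan, plan[1:]):
        (ax, ay), (bx, by) = NODES[a], NODES[b]
        target = math.degrees(math.atan2(by - ay, bx - ax))
        # Signed smallest turn from the current heading, in [-180, 180).
        turn = (target - heading_deg + 180) % 360 - 180
        n_turns = round(abs(turn) / TURN_STEP)
        actions += ["rotate_left" if turn > 0 else "rotate_right"] * n_turns
        heading_deg = target
        n_forward = round(math.hypot(bx - ax, by - ay) / FORWARD_STEP)
        actions += ["forward"] * n_forward
    return actions


# A plan is just a node sequence, so it can be read and checked before execution.
plan = ["hall", "door", "kitchen"]
actions = plan_to_actions(plan)

# Backtracking shows up as a revisited node in the plan.
backtrack_actions = plan_to_actions(["hall", "door", "hall"])
```

Because the plan is an explicit path through the map, it can be inspected or validated independently of the controller that carries it out, which is the sense in which such plans are interpretable.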
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Vision-Language Navigation | R2R-CE (val-unseen) | Success Rate (SR) | 26.4 | 266 |
| Vision-and-Language Navigation | R2R-CE (val-seen) | Success Rate (SR) | 36 | 49 |
| Vision-and-Language Navigation | VLN-CE 1.0 (val-seen) | Navigation Error (NE) | 6.6 | 20 |
| Vision-and-Language Navigation | VLN-CE 1.0 (val-unseen) | Navigation Error (NE) | 7.9 | 20 |