| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Vision-Language Navigation | R2R-CE (val-unseen) | Success Rate (SR) | 81.1 | 677 |
| Vision-and-Language Navigation | R2R (val unseen) | Success Rate (SR) | 84 | 448 |
| Vision-Language Navigation | R2R (val seen) | Success Rate (SR) | 7,540 | 150 |
| Vision-Language Navigation | R2R (test unseen) | SR | 86 | 149 |
| Vision-Language Navigation | R2R Unseen (test) | SR | 86 | 144 |
| Vision-and-Language Navigation | R2R (val seen) | Success Rate (SR) | 83.74 | 68 |
| Vision-and-Language Navigation | R2R-CE (test-unseen) | SR | 66 | 63 |
| Vision-and-Language Navigation | R2R (test) | SPL (Success weighted by Path Length) | 77 | 51 |
| Vision-Language Navigation | R2R unseen v1.0 (val) | SR | 3,130 | 48 |
| Vision-Language Navigation | R2R 1 (test unseen) | Success Rate | 0.76 | 29 |
| Vision-Language Navigation | R2R-Goal (val unseen) | Success Rate (SR) | 36 | 22 |
| Embodied Navigation | R2R-CE | Navigation Error (NE) | 4.73 | 19 |
| Vision-Language Navigation | R2R VLN-PE (val unseen) | Navigation Error (NE) | 4.33 | 18 |
| Vision-Language Navigation | R2R VLN-PE (val seen) | Navigation Error (NE) | 4.1 | 17 |
| Vision-Language Navigation | R2R VLN Challenge Leaderboard (test) | PL (Path Length) | 1,257.38 | 16 |
| Goal-Conditioned Visual Navigation Instruction Generation | R2R-Goal (test) | BLEU-4 | 33 | 13 |
| Goal-Conditioned Visual Navigation Instruction Generation | R2R-Goal (val (Seen)) | BLEU-4 | 36 | 13 |
| Vision-and-Language Navigation | R2R (Val-U) | SPL | 66 | 13 |
| Vision-and-Language Navigation | R2R | Success Rate (SR) | 43.7 | 12 |
| Vision-and-Language Navigation | R2R Discrete (val-unseen) | Navigation Error (NE) | 2.09 | 12 |
| Instruction Following | R2R unseen (test) | Success Rate (SR) | 62.2 | 11 |
| Vision-Language Navigation | R2R 1 (val seen) | Navigation Error (NE) | 1.67 | 10 |
| Vision-Language Navigation | R2R Unseen House (val) | Navigation Error (NE) | 4.83 | 9 |
| Vision-and-Language Navigation | R2R generalization (unseen) | SR | 35.2 | 8 |
| Vision-Language Navigation | R2R-TopDown (val unseen) | Success Rate (SR) | 47 | 7 |