| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Vision-Language Navigation | R2R-CE (val-unseen) | Success Rate (SR) | 68 | 266 |
| Vision-and-Language Navigation | R2R (val unseen) | Success Rate (SR) | 81 | 260 |
| Vision-Language Navigation | R2R (test unseen) | Success Rate (SR) | 86 | 122 |
| Vision-Language Navigation | R2R (val seen) | Success Rate (SR) | 7,540 | 120 |
| Vision-Language Navigation | R2R Unseen (test) | Success Rate (SR) | 86 | 116 |
| Vision-and-Language Navigation | R2R (val seen) | Success Rate (SR) | 83.74 | 51 |
| Vision-and-Language Navigation | R2R-CE (test-unseen) | Success Rate (SR) | 66 | 50 |
| Vision-and-Language Navigation | R2R (test) | Success weighted by Path Length (SPL) | 76 | 38 |
| Vision-Language Navigation | R2R unseen v1.0 (val) | Success Rate (SR) | 3,130 | 24 |
| Embodied Navigation | R2R-CE | Navigation Error (NE) | 4.73 | 19 |
| Vision-Language Navigation | R2R 1 (test unseen) | Success Rate (SR) | 0.76 | 18 |
| Vision-Language Navigation | R2R VLN Challenge Leaderboard (test) | Path Length (PL) | 1,257.38 | 16 |
| Vision-and-Language Navigation | R2R Discrete (val-unseen) | Navigation Error (NE) | 2.09 | 12 |
| Instruction Following | R2R unseen (test) | Success Rate (SR) | 62.2 | 11 |
| Vision-Language Navigation | R2R Unseen House (val) | Navigation Error (NE) | 4.83 | 9 |
| Vision-and-Language Navigation | R2R generalization (unseen) | Success Rate (SR) | 35.2 | 8 |
| Vision-Language Navigation | R2R VLN-PE (val unseen) | Trajectory Length (TL) | 6.58 | 7 |
| Vision-Language Navigation | R2R VLN-PE (val seen) | Trajectory Length (TL) | 6.62 | 7 |
| Instruction Generation | R2R (test) | Success Rate (SR) | 76 | 7 |
| Human Wayfinding | R2R (val-unseen) | WC | 24.5 | 6 |
| Vision-and-Language Navigation | R2R unseen complementary (val) | Path Length (PL) | 7.8 | 6 |
| Room-to-Room Navigation | R2R 72 scenes | Navigation Error (NE) | 5.02 | 5 |
| Instruction Generation | R2R unseen (val) | BLEU-1 | 0.708 | 5 |
| Instruction Generation | R2R (val seen) | BLEU-1 | 0.728 | 5 |
| Vision-Language Navigation | R2R (seen) | Navigation Error (NE) | 3.84 | 4 |
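Most navigation metrics in the table (SR, SPL, NE, TL) are averages over evaluated episodes. The sketch below shows how they could be computed. It is a minimal illustration under the following assumptions:

- Success follows the common R2R convention of stopping within 3 m of the goal.
- All distances are precomputed geodesic distances in metres.
- SPL uses the usual definition, (1/N) Σ S_i · ℓ_i / max(p_i, ℓ_i), where ℓ_i is the shortest-path length and p_i is the executed path length.

The `Episode` type and the function names are hypothetical and are not taken from any official evaluation script.

```python
from dataclasses import dataclass
from typing import Sequence

# Assumption: R2R-style success threshold (agent must stop within 3 m of the goal).
SUCCESS_THRESHOLD_M = 3.0


@dataclass
class Episode:
    """One evaluated episode (hypothetical container; all distances in metres)."""
    final_dist_to_goal: float    # geodesic distance from the stop position to the goal
    path_length: float           # length of the trajectory the agent actually walked
    shortest_path_length: float  # geodesic shortest-path length from start to goal


def navigation_error(episodes: Sequence[Episode]) -> float:
    """NE: mean distance between where the agent stopped and the goal."""
    return sum(e.final_dist_to_goal for e in episodes) / len(episodes)


def trajectory_length(episodes: Sequence[Episode]) -> float:
    """TL: mean length of the executed trajectories."""
    return sum(e.path_length for e in episodes) / len(episodes)


def success_rate(episodes: Sequence[Episode], threshold: float = SUCCESS_THRESHOLD_M) -> float:
    """SR as a fraction in [0, 1]; multiply by 100 to get a percentage."""
    return sum(e.final_dist_to_goal < threshold for e in episodes) / len(episodes)


def spl(episodes: Sequence[Episode], threshold: float = SUCCESS_THRESHOLD_M) -> float:
    """SPL: each successful episode contributes shortest / max(actual, shortest); failures contribute 0."""
    total = 0.0
    for e in episodes:
        if e.final_dist_to_goal < threshold:
            total += e.shortest_path_length / max(e.path_length, e.shortest_path_length)
    return total / len(episodes)


if __name__ == "__main__":
    eps = [
        Episode(final_dist_to_goal=1.0, path_length=12.0, shortest_path_length=10.0),  # success
        Episode(final_dist_to_goal=5.0, path_length=8.0, shortest_path_length=9.0),    # failure
    ]
    # prints: NE=3.00 TL=10.00 SR=0.50 SPL=0.417
    print(f"NE={navigation_error(eps):.2f} TL={trajectory_length(eps):.2f} "
          f"SR={success_rate(eps):.2f} SPL={spl(eps):.3f}")
```

These functions return fractions. The table mixes fractional results (for example, 0.76) with percentage-style results (for example, 68), so values should be put on the same scale before entries are compared.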