| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Meme Abuse Detection | GOAT-Bench 1.0 (test) | Harmfulness Accuracy | 75.1 | 32 |
| Multi-Modal Lifelong Navigation | GOAT-Bench unseen (val) | SR | 62.7 | 22 |
| Lifelong Visual Navigation | GOAT-Bench 1/10-scale subset (val-unseen) | Success Rate | 72.8 | 13 |
| Multi-modal Lifelong Navigation | GOAT-Bench Seen (val) | SPL | 49 | 13 |
| Goal-conditioned navigation | GOAT-Bench | SR | 73.7 | 12 |
| Harmful Meme Detection | GOAT-Bench In-Domain | Racism F1 | 88.5 | 11 |
| Embodied Navigation | GOAT-Bench unseen (val) | Success Rate (SR) | 71.4 | 10 |
| Lifelong Multimodal Object Navigation | GOAT-Bench unseen (val) | s-SR | 0.465 | 10 |
| Subtask Navigation | GOAT-Bench unseen (val) | SR | 62.7 | 9 |
| Harmful Meme Detection | GOAT-Bench (Out-Of-Domain) | Racism F1 | 87.1 | 7 |
| Visual Navigation | Full GOAT-Bench Unseen (val) | SR | 54.3 | 6 |
| Visual Navigation | Full GOAT-Bench Synonyms (val) | Success Rate (SR) | 58.4 | 6 |
| Visual Navigation | Full GOAT-Bench Seen (val) | SR | 56.7 | 6 |
| Multi-modal Lifelong Navigation | GOAT-Bench Seen-Synonyms (val) | SR | 66.8 | 6 |
| Lifelong Navigation | GOAT-Bench evaluation 278 subtasks | SR | 72.41 | 4 |
| Lifelong Navigation | GOAT-Bench full 2,780 subtasks (val) | Success Rate (SR) | 64.14 | 4 |
| Object Navigation | GOAT-Bench unseen (val) | SR (%) | 45 | 4 |