| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Meme Abuse Detection | GOAT-Bench 1.0 (test) | Harmfulness Accuracy 75.1 | 32 |
| Multi-Modal Lifelong Navigation | GOAT-Bench unseen (val) | SR 62.7 | 22 |
| Lifelong Visual Navigation | GOAT-Bench 1/10-scale subset (val-unseen) | Success Rate 72.8 | 13 |
| Harmful Meme Detection | GOAT-Bench In-Domain | Racism F1 88.5 | 11 |
| Harmful Meme Detection | GOAT-Bench (Out-Of-Domain) | Racism F1 87.1 | 7 |
| Multi-modal Lifelong Navigation | GOAT-Bench Seen-Synonyms (val) | SR 66.8 | 6 |
| Multi-modal Lifelong Navigation | GOAT-Bench Seen (val) | SR 65.5 | 6 |