| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Visual Question Answering | InfoSeek (test) | Accuracy 57.9 | 60 |
| Visual Question Answering | InfoSeek | Accuracy 55.93 | 38 |
| Visual Question Answering | InfoSeek (Full) | Accuracy 22.9 | 35 |
| Knowledge-Intensive Visual Question Answering | InfoSeek (val) | Accuracy (All) 44.1 | 30 |
| Visual Question Answering | InfoSeek (val) | Unseen-Q Accuracy 48.3 | 28 |
| (Image, Text)-to-Text Retrieval | InfoSeek | Recall@5 70.3 | 20 |
| Visual Question Answering | InfoSeek | Overall Score 44.2 | 15 |
| Entity Retrieval | InfoSeek (val) | R@1 64 | 9 |
| (Image, Text)-to-Multimodal Retrieval | InfoSeek | Recall@5 48.9 | 8 |
| Image-text-to-text retrieval | InfoSeek M-BEIR (test) | Recall@5 59.1 | 8 |
| Multimodal Retrieval | InfoSeek-8 | Recall@5 52.38 | 6 |
| Multimodal Retrieval | InfoSeek 6 | R@5 42.08 | 6 |
| Visual Question Answering | InfoSeek | EM 24.55 | 6 |
| Visual Question Answering | Infoseek human (Unseen Question) | Accuracy 33.6 | 6 |
| Visual Question Answering | Infoseek human (Unseen Entity) | Accuracy 31.4 | 6 |
| Knowledge-based Visual Question Answering | INFOSEEK (Overall) | Accuracy 33.2 | 5 |
| Retrieval | Infoseek (test) | P@5 78.3 | 5 |
| End-to-end Question Answering | InfoSeek 25-sample perturbed subset | Rotation 52 | 4 |
| Knowledge-based Visual Question Answering | INFOSEEK (Unseen Entity) | Accuracy 32.6 | 4 |
| Knowledge-based Visual Question Answering | INFOSEEK Unseen Question | Accuracy 33.8 | 4 |
| Knowledge-based Visual Question Answering | Infoseek M2KR | Accuracy 30.65 | 3 |
| Knowledge-based Visual Question Answering | Infoseek (test) | Accuracy 30.7 | 2 |