| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Visual Question Answering | InfoSeek (test) | Accuracy | 57.9 | 81 |
| Visual Question Answering | InfoSeek | Accuracy | 69 | 64 |
| Knowledge-Intensive Visual Question Answering | InfoSeek (val) | Accuracy (All) | 44.1 | 50 |
| Visual Question Answering | InfoSeek (val) | Overall Accuracy | 47.2 | 38 |
| Visual Question Answering | InfoSeek (Full) | Accuracy | 22.9 | 35 |
| Visual Question Answering | InfoSeek | Overall Score | 44.2 | 30 |
| Multi-hop information-seeking | InfoSeek-Eval ID | Success Rate (SR) | 82 | 24 |
| Visual Question Answering | InfoSeek | F1 Recall | 43.6 | 22 |
| (Image, Text)-to-Text Retrieval | InfoSeek | Recall@5 | 70.3 | 20 |
| Knowledge-based Visual Question Answering | INFOSEEK (Unseen Entity) | Accuracy | 51 | 19 |
| Knowledge-based Visual Question Answering | INFOSEEK Unseen Question | Accuracy | 46.5 | 19 |
| Knowledge-based VQA | InfoSeek | Unseen-Q Performance | 42.49 | 18 |
| Information Seeking Question Answering | InfoSeek | Accuracy | 73.9 | 17 |
| Knowledge-Based Visual Question Answering | InfoSeek All | Accuracy | 49.9 | 16 |
| Retrieval | InfoSeek | Recall@1 | 59.6 | 12 |
| Re-ranking | InfoSeek | R@1 | 66.5 | 11 |
| Retrieval | InfoSeek standard (val) | Recall@1 | 67 | 10 |
| Entity Retrieval | InfoSeek (val) | R@1 | 64 | 9 |
| Visual Question Answering | InfoSeek | Accuracy | 34.6 | 8 |
| (Image, Text)-to-Multimodal Retrieval | InfoSeek | Recall@5 | 48.9 | 8 |
| Image-text-to-text retrieval | InfoSeek M-BEIR (test) | Recall@5 | 59.1 | 8 |
| Knowledge-based Visual Question Answering | Infoseek M2KR | Accuracy | 50.5 | 7 |
| Knowledge Retrieval | InfoSeek (val) | Recall@20 | 86.4 | 6 |
| Open-domain visual recognition | INFOSEEK Overall | Top-1 Accuracy | 62.7 | 6 |
| Multimodal Retrieval | InfoSeek-8 | Recall@5 | 52.38 | 6 |