| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Visual Question Answering | OKVQA | Top-1 Accuracy | 75.29 | 283 |
| Visual Question Answering | OKVQA (val) | VQA Score | 66.1 | 101 |
| Knowledge-based Visual Question Answering | OKVQA | Accuracy | 0.661 | 52 |
| Knowledge-based Visual Question Answering | OKVQA (val) | Accuracy | 66.7 | 27 |
| Knowledge-based Visual Retrieval | OKVQA Google Search (test) | PR@5 | 84.66 | 16 |
| Cognition and Reasoning | OKVQA | Score | 61.92 | 16 |
| Visual Question Answering | OKVQA (test) | Accuracy | 65.7 | 11 |
| Visual Question Answering | OKVQA (I) (test) | VQA Accuracy | 57.8 | 11 |
| Visual Question Answering | OKVQA N=200 | Score D | 61.7 | 11 |
| Visual Question Answering | OKVQA (N=100) | CD Score | 56.6 | 11 |
| Visual Question Answering | OKVQA N=40 | CD | 48.8 | 11 |
| Object Hallucination Probing | OKVQA POPE Popular | Accuracy | 85 | 11 |
| Visual Question Answering | OKVQA | AUROC | 0.788 | 9 |
| Knowledge-based Question Answering | OKVQA | Score | 64.56 | 9 |
| Visual Question Answering | OKVQA (val-lite) | Accuracy | 48.68 | 6 |
| Knowledge-based Visual Retrieval | OKVQA WK11M (test) | MRR@5 | 51.15 | 6 |
| Knowledge-based Visual Question Answering | OKVQA M2KR | VQA Score | 0.661 | 6 |
| Object Hallucination Probing | OKVQA POPE Adversarial | POPE Score (Zh) | 79.97 | 6 |
| Object Hallucination Probing | OKVQA POPE Random | Accuracy (Zh) | 86.03 | 6 |
| Retrieval | OKVQA (test) | PR@5 | 90.9 | 5 |
| Knowledge-based Visual Question Answering | OKVQA (test) | Accuracy | 64.3 | 2 |