| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Visual Question Answering | GQA | Accuracy 81.9 | 963 |
| Visual Question Answering | GQA | Accuracy 71.6 | 374 |
| Visual Question Answering | GQA (test-dev) | Accuracy 72.1 | 178 |
| Visual Question Answering | GQA (test) | Accuracy 89.3 | 119 |
| Visual Question Answering | GQA (test-std) | Accuracy 65.65 | 62 |
| Visual Question Answering | GQA | Mean Accuracy 64.4 | 49 |
| Visual Question Answering | GQA | Score 67.2 | 47 |
| Visual Question Answering | GQA | Accuracy 75.77 | 36 |
| Object Hallucination Probing | GQA POPE Popular | Accuracy 84.83 | 33 |
| Visual Question Answering | GQA balanced (test-dev) | Accuracy 77.4 | 32 |
| Visual Question Answering | GQA v1.2 (test) | GQA Score 61.9 | 28 |
| Visual Question Answering | GQA | ECE 6.09 | 27 |
| Visual Question Answering | GQA | Clean Accuracy 60.3 | 27 |
| Visual Question Answering | GQA | Accuracy 61.9 | 26 |
| Object Hallucination Probing | GQA POPE Random | Accuracy (GQA POPE) 89.93 | 26 |
| Object Hallucination Probing | GQA Adversarial | Accuracy 81.76 | 24 |
| Visual Question Answering | GQA | Accuracy 65.4 | 22 |
| Visual Question Answering | GQA (val) | Accuracy 77 | 22 |
| Scene Graph Generation | GQA-200 (test) | R@50 26.1 | 20 |
| Visual Reasoning | GQA (test-dev) | Accuracy 62 | 19 |
| Compositional Question Answering | GQA | Exact Accuracy 73.4 | 17 |
| Visual Question Answering | GQA 22 | Accuracy 65.3 | 17 |
| Visual Question Answering | GQA | Accuracy 62.9 | 16 |
| Visual Question Answering | GQA v1.0 (test) | Accuracy 61.9 | 16 |
| Cognition and Reasoning | GQA | Score 0.6226 | 16 |