| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Visual Question Answering | GQA | Accuracy | 81.9 | 1,249 |
| Visual Question Answering | GQA | Accuracy | 74.9 | 505 |
| Visual Question Answering | GQA | Mean Accuracy | 65.8 | 196 |
| Visual Question Answering | GQA | Score | 71.3 | 193 |
| Visual Question Answering | GQA (test) | Accuracy | 89.3 | 188 |
| Visual Question Answering | GQA (test-dev) | Accuracy | 72.1 | 184 |
| Visual Reasoning | GQA | Accuracy | 64.54 | 93 |
| Visual Question Answering | GQA | GQA Score | 64.83 | 85 |
| Visual Question Answering | GQA (test-std) | Accuracy | 65.65 | 68 |
| Object Hallucination Probing | GQA POPE Popular | Accuracy | 86.07 | 49 |
| Object Hallucination Probing | GQA POPE Random | Accuracy (GQA POPE) | 89.93 | 42 |
| Object Hallucination Probing | GQA Adversarial | Accuracy | 82.73 | 40 |
| Visual Question Answering | GQA | GQA Score | 63.4 | 37 |
| Visual Question Answering | GQA | Accuracy | 75.77 | 36 |
| Multi-modal Vision-Language Understanding | GQA | Accuracy | 63.4 | 36 |
| Multi-turn Visual Question Answering | MT-GQA | Acc1 | 65.45 | 33 |
| Visual Question Answering | GQA balanced (test-dev) | Accuracy | 77.4 | 32 |
| Visual Question Answering | GQA (val) | Accuracy | 83.39 | 32 |
| Visual Question Answering | GQA v1.0 (test) | Accuracy | 63.3 | 31 |
| Refusal Rate Evaluation | GQA | Refusal Rate | 77 | 30 |
| Visual Question Answering | GQA | Accuracy | 65.4 | 30 |
| Visual Question Answering | GQA | Accuracy | 63.9 | 29 |
| Object Hallucination Evaluation | GQA (Random) | Accuracy | 89.5 | 28 |
| Visual Question Answering | GQA v1.2 (test) | GQA Score | 61.9 | 28 |
| Visual Question Answering | GQA | ECE | 6.09 | 27 |