| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Visual Question Answering | GQA | Accuracy 83.2 | 1,425 |
| Visual Question Answering | GQA | Accuracy 74.9 | 524 |
| Visual Question Answering | GQA (test-dev) | Accuracy 73.9 | 236 |
| Visual Question Answering | GQA (test) | Accuracy 89.3 | 197 |
| Visual Question Answering | GQA | Mean Accuracy 65.8 | 196 |
| Visual Question Answering | GQA | Score 71.3 | 193 |
| Performance Estimation | GQA | MAE 0 | 184 |
| Visual Question Answering | GQA | Accuracy 77.5 | 155 |
| Visual Question Answering | GQA | GQA Score 64.83 | 139 |
| Visual Reasoning | GQA | Accuracy 64.54 | 93 |
| Visual Question Answering | GQA (test-std) | Accuracy 65.65 | 74 |
| Visual Question Answering | GQA | GQA Score 63.4 | 53 |
| Multi-modal Vision-Language Understanding | GQA | Accuracy 64.2 | 51 |
| Object Hallucination Probing | GQA POPE Popular | Accuracy 86.07 | 49 |
| Object Hallucination Probing | GQA POPE Random | Accuracy (GQA POPE) 89.93 | 42 |
| Object Hallucination Probing | GQA Adversarial | Accuracy 82.73 | 40 |
| Visual Question Answering | GQA | Accuracy 75.77 | 36 |
| Multi-turn Visual Question Answering | MT-GQA | Acc1 65.45 | 33 |
| Visual Question Answering | GQA balanced (test-dev) | Accuracy 77.4 | 32 |
| Visual Question Answering | GQA (val) | Accuracy 83.39 | 32 |
| Visual Question Answering | GQA | Accuracy 61.97 | 31 |
| Visual Question Answering | GQA v1.0 (test) | Accuracy 63.3 | 31 |
| Refusal Rate Evaluation | GQA | Refusal Rate 77 | 30 |
| Visual Question Answering | GQA | Accuracy 65.4 | 30 |
| Visual Question Answering | GQA | Accuracy 63.9 | 29 |