| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Visual Question Answering | A-OKVQA | Acc 92.68 | 228 |
| Visual Question Answering | A-OKVQA (test) | Accuracy 90.56 | 103 |
| Visual Question Answering | A-OKVQA (val) | Accuracy 79.5 | 92 |
| Object Hallucination Evaluation | A-OKVQA POPE (Popular) | Accuracy 90.3 | 76 |
| Object Hallucination Evaluation | A-OKVQA POPE (Random) | Accuracy 92.1 | 60 |
| Multi-choice Visual Question Answering | A-OKVQA | Accuracy 82.71 | 49 |
| VLM Editing | A-OKVQA 2022 (test) | Accuracy 100 | 48 |
| Object Hallucination Assessment | A-OKVQA POPE (Adversarial) | Accuracy 0.8363 | 42 |
| Visual Reasoning | A-OKVQA | ECE 5.4 | 32 |
| Object Hallucination | A-OKVQA POPE (test) | Accuracy (Random) 90.13 | 29 |
| Visual Question Answering (Multi-choice) | A-OKVQA (test) | Accuracy 87.2 | 28 |
| Object Hallucination Probing | A-OKVQA (Adversarial split) | Accuracy 79.1 | 27 |
| Direct Answer Visual Question Answering | A-OKVQA (test) | Accuracy 69 | 22 |
| Object Hallucination Evaluation | A-OKVQA POPE | Random Accuracy 92.37 | 21 |
| Direct-answer Visual Question Answering | A-OKVQA | Accuracy 68.7 | 18 |
| Visual Question Answering | A-OKVQA POPE Evaluation (Adversarial) | Accuracy 82 | 16 |
| Visual Question Answering | A-OKVQA POPE (Popular) | Accuracy 89.77 | 16 |
| Visual Question Answering | A-OKVQA POPE Evaluation (Random) | Accuracy 90.03 | 16 |
| Hallucination Evaluation | A-OKVQA | Accuracy (Random) 93.76 | 15 |
| Visual Question Answering | A-OKVQA Open-Ended | Accuracy 72.14 | 15 |
| Visual Question Answering | A-OKVQA v1.0 (test) | Accuracy 53.36 | 14 |
| Object Hallucination Probing | A-OKVQA (Random split) | Accuracy 90.83 | 12 |
| Direct-Answer | A-OKVQA 1.0 (test) | Accuracy 68 | 12 |
| Polling-based Object Probing Evaluation (POPE) | A-OKVQA POPE (Adversarial) | Accuracy 81.94 | 12 |
| Polling-based Object Probing Evaluation (POPE) | A-OKVQA POPE Popular | Accuracy 0.8813 | 12 |