| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Visual Question Answering | A-OKVQA | Acc: 92.68 | 175 |
| Visual Question Answering | A-OKVQA (test) | Accuracy: 85.7 | 79 |
| Visual Question Answering | A-OKVQA (val) | Accuracy: 0.81 | 56 |
| Multi-choice Visual Question Answering | A-OKVQA | Accuracy: 82.71 | 49 |
| VLM Editing | A-OKVQA 2022 (test) | Accuracy: 100 | 48 |
| Object Hallucination Evaluation | A-OKVQA POPE (Popular) | Accuracy: 87.71 | 36 |
| Object Hallucination Evaluation | A-OKVQA POPE (Random) | Accuracy: 89.5 | 36 |
| Object Hallucination Probing | A-OKVQA (Adversarial split) | Accuracy: 79.1 | 27 |
| Visual Question Answering (Multi-choice) | A-OKVQA (test) | Accuracy: 80.2 | 19 |
| Object Hallucination Assessment | A-OKVQA POPE (Adversarial) | Accuracy: 0.8126 | 18 |
| Direct-answer Visual Question Answering | A-OKVQA | Accuracy: 68.7 | 18 |
| Visual Question Answering | A-OKVQA Open-Ended | Accuracy: 72.14 | 15 |
| Visual Question Answering | A-OKVQA v1.0 (test) | Accuracy: 53.36 | 14 |
| Object Hallucination Probing | A-OKVQA (Random split) | Accuracy: 90.83 | 12 |
| Direct-Answer | A-OKVQA 1.0 (test) | Accuracy: 68 | 12 |
| Polling-based Object Probing Evaluation (POPE) | A-OKVQA POPE (Adversarial) | Accuracy: 81.94 | 12 |
| Polling-based Object Probing Evaluation (POPE) | A-OKVQA POPE Popular | Accuracy: 0.8813 | 12 |
| Polling-based Object Probing Evaluation (POPE) | A-OKVQA POPE Random | Accuracy: 89.6 | 12 |
| Speech-Visual Question Answering | A-OKVQA Speech-converted | Accuracy: 0.2001 | 12 |
| Direct-Answer | A-OKVQA 1.0 (val) | Accuracy: 0.683 | 11 |
| Multiple-Choice | A-OKVQA 1.0 (test) | Accuracy: 86.7 | 9 |
| Multiple-Choice | A-OKVQA 1.0 (val) | Accuracy: 87.7 | 9 |
| Object Hallucination | A-OKVQA POPE (test) | Accuracy (Random): 90.13 | 8 |
| Knowledge-based Visual Question Answering | A-OKVQA (val) | Accuracy: 68.9 | 8 |
| Direct Answer Visual Question Answering | A-OKVQA (test) | Accuracy: 59.6 | 7 |