| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Visual Question Answering | A-OKVQA | Acc 92.68 | 202 |
| Visual Question Answering | A-OKVQA (val) | Accuracy 0.879 | 88 |
| Visual Question Answering | A-OKVQA (test) | Accuracy 89.17 | 88 |
| Object Hallucination Evaluation | A-OKVQA POPE (Popular) | Accuracy 90.3 | 52 |
| Multi-choice Visual Question Answering | A-OKVQA | Accuracy 82.71 | 49 |
| VLM Editing | A-OKVQA 2022 (test) | Accuracy 100 | 48 |
| Object Hallucination Evaluation | A-OKVQA POPE (Random) | Accuracy 89.5 | 36 |
| Object Hallucination | A-OKVQA POPE (test) | Accuracy (Random) 90.13 | 29 |
| Visual Question Answering (Multi-choice) | A-OKVQA (test) | Accuracy 87.2 | 28 |
| Object Hallucination Probing | A-OKVQA (Adversarial split) | Accuracy 79.1 | 27 |
| Direct Answer Visual Question Answering | A-OKVQA (test) | Accuracy 69 | 22 |
| Object Hallucination Evaluation | A-OKVQA POPE | Random Accuracy 92.37 | 21 |
| Object Hallucination Assessment | A-OKVQA POPE (Adversarial) | Accuracy 0.8126 | 18 |
| Direct-answer Visual Question Answering | A-OKVQA | Accuracy 68.7 | 18 |
| Visual Question Answering | A-OKVQA POPE Evaluation (Adversarial) | Accuracy 82 | 16 |
| Visual Question Answering | A-OKVQA POPE (Popular) | Accuracy 89.77 | 16 |
| Visual Question Answering | A-OKVQA POPE Evaluation (Random) | Accuracy 90.03 | 16 |
| Hallucination Evaluation | A-OKVQA | Accuracy (Random) 93.76 | 15 |
| Visual Question Answering | A-OKVQA Open-Ended | Accuracy 72.14 | 15 |
| Visual Question Answering | A-OKVQA v1.0 (test) | Accuracy 53.36 | 14 |
| Object Hallucination Probing | A-OKVQA (Random split) | Accuracy 90.83 | 12 |
| Direct-Answer | A-OKVQA 1.0 (test) | Accuracy 68 | 12 |
| Polling-based Object Probing Evaluation (POPE) | A-OKVQA POPE (Adversarial) | Accuracy 81.94 | 12 |
| Polling-based Object Probing Evaluation (POPE) | A-OKVQA POPE Popular | Accuracy 0.8813 | 12 |
| Polling-based Object Probing Evaluation (POPE) | A-OKVQA POPE Random | Accuracy 89.6 | 12 |