| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Object Hallucination Evaluation | POPE | Accuracy | 94.42 | 935 |
| Object Hallucination | POPE (Random) | F1 Score | 93.02 | 200 |
| Object Hallucination | POPE Adversarial | Accuracy | 90 | 196 |
| Object Hallucination | POPE Popular | F1 Score | 91.4 | 188 |
| Hallucination Evaluation | POPE | Accuracy | 94.42 | 132 |
| Object Hallucination Evaluation | POPE Adversarial offline | F1 Score | 68.96 | 84 |
| Object Hallucination Evaluation | POPE Popular offline | F1 Score | 84.43 | 84 |
| Object Hallucination Evaluation | POPE Random offline | F1 Score | 73.6 | 84 |
| Visual Question Answering | POPE | Accuracy | 88.5 | 71 |
| Transfer Attack | POPE (test) | CAE | 0.2477 | 69 |
| Object Hallucination Evaluation | POPE (test) | Accuracy | 90.6 | 44 |
| Multimodal Understanding | POPE | POPE Score | 0.885 | 41 |
| Visual Hallucination Evaluation | POPE MS-COCO Adversarial sampling (val) | Accuracy | 85.48 | 39 |
| Hallucination Evaluation | POPE Adversarial v1.0 (test) | Accuracy | 88.96 | 31 |
| Hallucination Evaluation | POPE Popular v1.0 (test) | Accuracy | 90.34 | 31 |
| Hallucination Evaluation | POPE Random v1.0 (test) | Accuracy | 91.17 | 31 |
| Object Hallucination Evaluation | POPE GQA Popular | Accuracy | 86.8 | 30 |
| Hallucination Detection | POPE official (val) | A-ROC | 96.98 | 30 |
| Transfer Attack | POPE | CAE | 27.48 | 30 |
| Object Hallucination | POPE Adversarial v1.0 | Accuracy | 84.4 | 24 |
| Object Hallucination | POPE Popular v1.0 | Accuracy | 88.03 | 24 |
| Object Hallucination | POPE v1.0 (Random) | Accuracy | 90.07 | 24 |
| VQA Hallucination Detection | POPE Average of Random, Popular, and Adversarial 2023 | Accuracy | 89.4 | 24 |
| Object Hallucination | POPE average across COCO, A-OKVQA, GQA | Accuracy | 85.7 | 22 |
| Object Hallucination Evaluation | POPE MSCOCO (val) | F1 Score | 88.1 | 21 |
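Most of the scores above are Accuracy or F1 over POPE's binary ("Does the object appear in the image?" yes/no) questions. As a minimal sketch of how such scores are derived, assuming yes/no label and prediction lists (the sample data here is illustrative, not from any benchmark above):

```python
def pope_metrics(labels, preds):
    """Compute accuracy and F1 for POPE-style yes/no answers.

    "yes" is treated as the positive class, as in object-presence probing.
    """
    tp = sum(1 for l, p in zip(labels, preds) if l == "yes" and p == "yes")
    fp = sum(1 for l, p in zip(labels, preds) if l == "no" and p == "yes")
    fn = sum(1 for l, p in zip(labels, preds) if l == "yes" and p == "no")
    accuracy = sum(1 for l, p in zip(labels, preds) if l == p) / len(labels)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, f1

# Toy example: 4 questions, one "yes" missed by the model.
acc, f1 = pope_metrics(["yes", "yes", "no", "no"],
                       ["yes", "no", "no", "no"])
```

Leaderboard entries typically report these as percentages (e.g. 94.42 rather than 0.9442), averaged over the Random, Popular, and Adversarial negative-sampling splits or reported per split as above.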