| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Object Hallucination Evaluation | POPE | Accuracy 94.42 | 1,455 |
| Object Hallucination | POPE Adversarial | Accuracy 90 | 288 |
| Object Hallucination | POPE (Random) | F1 Score 93.02 | 285 |
| Object Hallucination | POPE Popular | F1 Score 91.4 | 273 |
| Hallucination Evaluation | POPE | Accuracy 94.42 | 153 |
| Visual Question Answering | POPE | Accuracy 89.6 | 102 |
| Multimodal Understanding | POPE | POPE Score 0.893 | 90 |
| Object Hallucination Evaluation | POPE Adversarial offline | F1 Score 68.96 | 84 |
| Object Hallucination Evaluation | POPE Popular offline | F1 Score 84.43 | 84 |
| Object Hallucination Evaluation | POPE Random offline | F1 Score 73.6 | 84 |
| Object Hallucination Evaluation | POPE (test) | Accuracy 90.6 | 79 |
| Object Hallucination Evaluation | POPE A-OKVQA | Accuracy 89.23 | 75 |
| Transfer Attack | POPE (test) | CAE 0.2477 | 69 |
| Object Hallucination Evaluation | POPE Adversarial | Accuracy 85.89 | 55 |
| Object Hallucination Evaluation | POPE MSCOCO | Accuracy 92.58 | 55 |
| Object Hallucination Evaluation | POPE Random, Popular, Adversarial v1.0 | Random Score 94.27 | 51 |
| Image Captioning | POPE Adversarial | CIDEr 121.4 | 50 |
| Object Hallucination Evaluation | POPE GQA Popular | Accuracy 89.4 | 46 |
| Visual Hallucination Evaluation | POPE MS-COCO Adversarial sampling (val) | Accuracy 85.48 | 39 |
| Hallucination Detection | POPE official (val) | A-PR 99.13 | 34 |
| Hallucination Evaluation | POPE Adversarial v1.0 (test) | Accuracy 88.96 | 31 |
| Hallucination Evaluation | POPE Popular v1.0 (test) | Accuracy 90.34 | 31 |
| Hallucination Evaluation | POPE Random v1.0 (test) | Accuracy 91.17 | 31 |
| Transfer Attack | POPE | CAE 27.48 | 30 |
| Object Hallucination Evaluation | POPE GQA (test) | Average Accuracy 84.72 | 29 |