| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Object Hallucination Evaluation | POPE | Accuracy | 94.42 | 2,019 |
| Object Hallucination | POPE Popular | F1 Score | 93.01 | 372 |
| Object Hallucination | POPE Adversarial | Accuracy | 90 | 353 |
| Object Hallucination | POPE (Random) | F1 Score | 93.02 | 324 |
| Hallucination Evaluation | POPE | Accuracy | 94.42 | 217 |
| Object Hallucination Evaluation | POPE Adversarial | Accuracy | 89.33 | 159 |
| Object Hallucination Evaluation | POPE Random | Accuracy | 94 | 152 |
| Multimodal Understanding | POPE | POPE Score | 0.906 | 112 |
| Visual Question Answering | POPE | Accuracy | 89.6 | 110 |
| Object Hallucination Evaluation | POPE (test) | Accuracy | 90.6 | 107 |
| Object Hallucination Evaluation | POPE (popular) | Accuracy | 92 | 96 |
| Object Hallucination Evaluation | POPE Adversarial offline | F1 Score | 68.96 | 84 |
| Object Hallucination Evaluation | POPE Popular offline | F1 Score | 84.43 | 84 |
| Object Hallucination Evaluation | POPE Random offline | F1 Score | 73.6 | 84 |
| Object Hallucination Evaluation | POPE A-OKVQA | Accuracy | 89.23 | 75 |
| Object Hallucination Evaluation | POPE GQA Popular | Accuracy | 89.4 | 70 |
| Transfer Attack | POPE (test) | CAE | 0.2477 | 69 |
| Object Hallucination Evaluation | POPE MSCOCO | F1 Score | 93.97 | 60 |
| Object Probing | POPE Average | Accuracy | 87.84 | 52 |
| Object Hallucination | POPE | Accuracy | 90.51 | 51 |
| Object Hallucination Evaluation | POPE Random, Popular, Adversarial v1.0 | Random Score | 94.27 | 51 |
| Image Captioning | POPE Adversarial | CIDEr | 121.4 | 50 |
| Visual Question Answering for object probing | POPE Aggregated random, popular, and adversarial | Accuracy (POPE Aggregated) | 86.53 | 47 |
| Object Hallucination | POPE Adversarial v1.0 | Accuracy | 89.26 | 45 |
| Discriminative Object Hallucination | POPE MSCOCO Adversarial | Accuracy | 87.33 | 43 |