
POPE

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Object Hallucination Evaluation | POPE | Accuracy 94.42 | 935 |
| Object Hallucination | POPE (Random) | F1 Score 93.02 | 200 |
| Object Hallucination | POPE Adversarial | Accuracy 90 | 196 |
| Object Hallucination | POPE Popular | F1 Score 91.4 | 188 |
| Hallucination Evaluation | POPE | Accuracy 94.42 | 132 |
| Object Hallucination Evaluation | POPE Adversarial offline | F1 Score 68.96 | 84 |
| Object Hallucination Evaluation | POPE Popular offline | F1 Score 84.43 | 84 |
| Object Hallucination Evaluation | POPE Random offline | F1 Score 73.6 | 84 |
| Visual Question Answering | POPE | Accuracy 88.5 | 71 |
| Transfer Attack | POPE (test) | CAE 0.2477 | 69 |
| Object Hallucination Evaluation | POPE (test) | Accuracy 90.6 | 44 |
| Multimodal Understanding | POPE | POPE Score 0.885 | 41 |
| Visual Hallucination Evaluation | POPE MS-COCO Adversarial sampling (val) | Accuracy 85.48 | 39 |
| Hallucination Evaluation | POPE Adversarial v1.0 (test) | Accuracy 88.96 | 31 |
| Hallucination Evaluation | POPE Popular v1.0 (test) | Accuracy 90.34 | 31 |
| Hallucination Evaluation | POPE Random v1.0 (test) | Accuracy 91.17 | 31 |
| Object Hallucination Evaluation | POPE GQA Popular | Accuracy 86.8 | 30 |
| Hallucination Detection | POPE official (val) | A-ROC 96.98 | 30 |
| Transfer Attack | POPE | CAE 27.48 | 30 |
| Object Hallucination | POPE Adversarial v1.0 | Accuracy 84.4 | 24 |
| Object Hallucination | POPE Popular v1.0 | Accuracy 88.03 | 24 |
| Object Hallucination | POPE v1.0 (Random) | Accuracy 90.07 | 24 |
| VQA Hallucination Detection | POPE Average of Random, Popular, and Adversarial 2023 | Accuracy 89.4 | 24 |
| Object Hallucination | POPE average across COCO, A-OKVQA, GQA | ACC 85.7 | 22 |
| Object Hallucination Evaluation | POPE MSCOCO (val) | F1 Score 88.1 | 21 |
Showing 25 of 65 rows