POPE

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Object Hallucination Evaluation | POPE | Accuracy | 94.42 | 1,455 |
| Object Hallucination | POPE Adversarial | Accuracy | 90 | 288 |
| Object Hallucination | POPE (Random) | F1 Score | 93.02 | 285 |
| Object Hallucination | POPE Popular | F1 Score | 91.4 | 273 |
| Hallucination Evaluation | POPE | Accuracy | 94.42 | 153 |
| Visual Question Answering | POPE | Accuracy | 89.6 | 102 |
| Multimodal Understanding | POPE | POPE Score | 0.893 | 90 |
| Object Hallucination Evaluation | POPE Adversarial offline | F1 Score | 68.96 | 84 |
| Object Hallucination Evaluation | POPE Popular offline | F1 Score | 84.43 | 84 |
| Object Hallucination Evaluation | POPE Random offline | F1 Score | 73.6 | 84 |
| Object Hallucination Evaluation | POPE (test) | Accuracy | 90.6 | 79 |
| Object Hallucination Evaluation | POPE A-OKVQA | Accuracy | 89.23 | 75 |
| Transfer Attack | POPE (test) | CAE | 0.2477 | 69 |
| Object Hallucination Evaluation | POPE Adversarial | Accuracy | 85.89 | 55 |
| Object Hallucination Evaluation | POPE MSCOCO | Accuracy | 92.58 | 55 |
| Object Hallucination Evaluation | POPE Random, Popular, Adversarial v1.0 | Random Score | 94.27 | 51 |
| Image Captioning | POPE Adversarial | CIDEr | 121.4 | 50 |
| Object Hallucination Evaluation | POPE GQA Popular | Accuracy | 89.4 | 46 |
| Visual Hallucination Evaluation | POPE MS-COCO Adversarial sampling (val) | Accuracy | 85.48 | 39 |
| Hallucination Detection | POPE official (val) | A-PR | 99.13 | 34 |
| Hallucination Evaluation | POPE Adversarial v1.0 (test) | Accuracy | 88.96 | 31 |
| Hallucination Evaluation | POPE Popular v1.0 (test) | Accuracy | 90.34 | 31 |
| Hallucination Evaluation | POPE Random v1.0 (test) | Accuracy | 91.17 | 31 |
| Transfer Attack | POPE | CAE | 27.48 | 30 |
| Object Hallucination Evaluation | POPE GQA (test) | Average Accuracy | 84.72 | 29 |
Showing 25 of 90 rows
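Most results listed above are reported as Accuracy or F1 Score. POPE asks yes/no questions about whether a given object appears in an image, so these metrics can be read as ordinary binary-classification scores. The sketch below assumes that "yes" is the positive class and that scores are reported as percentages; the function name `pope_metrics` and the extra `yes_ratio` field are illustrative only, not an official evaluation script.

```python
from typing import Dict, Sequence


def pope_metrics(predictions: Sequence[str], labels: Sequence[str]) -> Dict[str, float]:
    """Hypothetical helper: binary yes/no metrics with "yes" as the positive class.

    All values are returned as percentages (assumption: matches how the
    Accuracy and F1 Score columns are reported).
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("need at least one example")

    # Assumption: any answer other than "yes" (case-insensitive) counts as "no".
    preds = [p.strip().lower() == "yes" for p in predictions]
    golds = [g.strip().lower() == "yes" for g in labels]

    # Confusion-matrix counts.
    tp = sum(1 for p, g in zip(preds, golds) if p and g)
    fp = sum(1 for p, g in zip(preds, golds) if p and not g)
    fn = sum(1 for p, g in zip(preds, golds) if not p and g)
    tn = sum(1 for p, g in zip(preds, golds) if not p and not g)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    return {
        "accuracy": 100 * (tp + tn) / len(golds),
        "precision": 100 * precision,
        "recall": 100 * recall,
        "f1": 100 * f1,
        # Share of "yes" answers; a diagnostic for a model biased toward "yes".
        "yes_ratio": 100 * sum(preds) / len(preds),
    }


if __name__ == "__main__":
    scores = pope_metrics(["yes", "no", "yes", "yes"], ["yes", "no", "no", "yes"])
    print(scores)
```

In the usage example, one false "yes" (a hallucinated object) gives 75% accuracy, 100% recall, and an F1 of 80%. This shows why a model that over-answers "yes" can keep a high recall while its precision and accuracy drop.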