POPE

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Object Hallucination Evaluation | POPE | Accuracy 94.42 | 2,019 |
| Object Hallucination | POPE Popular | F1 Score 93.01 | 372 |
| Object Hallucination | POPE Adversarial | Accuracy 90 | 353 |
| Object Hallucination | POPE (Random) | F1 Score 93.02 | 324 |
| Hallucination Evaluation | POPE | Accuracy 94.42 | 217 |
| Object Hallucination Evaluation | POPE Adversarial | Accuracy 89.33 | 159 |
| Object Hallucination Evaluation | POPE Random | Accuracy 94 | 152 |
| Multimodal Understanding | POPE | POPE Score 0.906 | 112 |
| Visual Question Answering | POPE | Accuracy 89.6 | 110 |
| Object Hallucination Evaluation | POPE (test) | Accuracy 90.6 | 107 |
| Object Hallucination Evaluation | POPE (popular) | Accuracy 92 | 96 |
| Object Hallucination Evaluation | POPE Adversarial offline | F1 Score 68.96 | 84 |
| Object Hallucination Evaluation | POPE Popular offline | F1 Score 84.43 | 84 |
| Object Hallucination Evaluation | POPE Random offline | F1 Score 73.6 | 84 |
| Object Hallucination Evaluation | POPE A-OKVQA | Accuracy 89.23 | 75 |
| Object Hallucination Evaluation | POPE GQA Popular | Accuracy 89.4 | 70 |
| Transfer Attack | POPE (test) | CAE 0.2477 | 69 |
| Object Hallucination Evaluation | POPE MSCOCO | F1 Score 93.97 | 60 |
| Object Probing | POPE Average | Accuracy 87.84 | 52 |
| Object Hallucination | POPE | Accuracy 90.51 | 51 |
| Object Hallucination Evaluation | POPE Random, Popular, Adversarial v1.0 | Random Score 94.27 | 51 |
| Image Captioning | POPE Adversarial | CIDEr 121.4 | 50 |
| Visual Question Answering for object probing | POPE Aggregated random, popular, and adversarial | Accuracy (POPE Aggregated) 86.53 | 47 |
| Object Hallucination | POPE Adversarial v1.0 | Accuracy 89.26 | 45 |
| Discriminative Object Hallucination | POPE MSCOCO Adversarial | Accuracy 87.33 | 43 |
Showing 25 of 129 rows