A-OKVQA

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Visual Question Answering | A-OKVQA | Acc 92.68 | 202 |
| Visual Question Answering | A-OKVQA (val) | Accuracy 0.879 | 88 |
| Visual Question Answering | A-OKVQA (test) | Accuracy 89.17 | 88 |
| Object Hallucination Evaluation | A-OKVQA POPE (Popular) | Accuracy 90.3 | 52 |
| Multi-choice Visual Question Answering | A-OKVQA | Accuracy 82.71 | 49 |
| VLM Editing | A-OKVQA 2022 (test) | Accuracy 100 | 48 |
| Object Hallucination Evaluation | A-OKVQA POPE (Random) | Accuracy 89.5 | 36 |
| Object Hallucination | A-OKVQA POPE (test) | Accuracy (Random) 90.13 | 29 |
| Visual Question Answering (Multi-choice) | A-OKVQA (test) | Accuracy 87.2 | 28 |
| Object Hallucination Probing | A-OKVQA (Adversarial split) | Accuracy 79.1 | 27 |
| Direct Answer Visual Question Answering | A-OKVQA (test) | Accuracy 69 | 22 |
| Object Hallucination Evaluation | A-OKVQA POPE | Random Accuracy 92.37 | 21 |
| Object Hallucination Assessment | A-OKVQA POPE (Adversarial) | Accuracy 0.8126 | 18 |
| Direct-answer Visual Question Answering | A-OKVQA | Accuracy 68.7 | 18 |
| Visual Question Answering | A-OKVQA POPE Evaluation (Adversarial) | Accuracy 82 | 16 |
| Visual Question Answering | A-OKVQA POPE (Popular) | Accuracy 89.77 | 16 |
| Visual Question Answering | A-OKVQA POPE Evaluation (Random) | Accuracy 90.03 | 16 |
| Hallucination Evaluation | A-OKVQA | Accuracy (Random) 93.76 | 15 |
| Visual Question Answering | A-OKVQA Open-Ended | Accuracy 72.14 | 15 |
| Visual Question Answering | A-OKVQA v1.0 (test) | Accuracy 53.36 | 14 |
| Object Hallucination Probing | A-OKVQA (Random split) | Accuracy 90.83 | 12 |
| Direct-Answer | A-OKVQA 1.0 (test) | Accuracy 68 | 12 |
| Polling-based Object Probing Evaluation (POPE) | A-OKVQA POPE (Adversarial) | Accuracy 81.94 | 12 |
| Polling-based Object Probing Evaluation (POPE) | A-OKVQA POPE Popular | Accuracy 0.8813 | 12 |
| Polling-based Object Probing Evaluation (POPE) | A-OKVQA POPE Random | Accuracy 89.6 | 12 |
Showing 25 of 40 rows