HallusionBench

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Hallucination Evaluation | HallusionBench | Average Score: 93.1 | 108 |
| Multimodal Reasoning | HallusionBench | Accuracy: 0.7293 | 42 |
| Hallucination Assessment | HallusionBench | Answer Accuracy (aAcc): 71.6 | 39 |
| Visual Hallucination Evaluation | HallusionBench | Accuracy: 76.6 | 37 |
| Hallucination and Visual Reasoning Evaluation | HallusionBench | Score: 59.2 | 37 |
| Hallucination Robustness | HallusionBench | Score: 57.8 | 32 |
| Multimodal Hallucination Evaluation | HallusionBench | Hallucination Score: 70.7 | 22 |
| Hallucination | HallusionBench | Pass@1: 74 | 16 |
| Visual Perception | HallusionBench | Accuracy: 71.08 | 15 |
| Visual Reasoning | HallusionBench | Accuracy: 68.19 | 15 |
| Perception | HallusionBench | Score: 59.5 | 15 |
| Hallucination Evaluation | HallusionBench 2024 | Score: 52.2 | 13 |
| Visual Illusion and Hallucination Evaluation | HallusionBench (HallB) | HallB Score: 41.7 | 13 |
| Hallucination Evaluation | HallusionBench GPT4-assisted (All) | Accuracy (All): 49.94 | 11 |
| Discriminative Hallucination Detection | HallusionBench | Accuracy: 73 | 10 |
| Visual Hallucination Evaluation | HallusionBench visual questions | Accuracy: 65.8 | 10 |
| General VQA | HallusionBench | Accuracy: 73.48 | 9 |
| General VQA | HallusionBench avg | Score: 67 | 7 |
| Vision-Language Reasoning | HallusionBench (test) | Simple Accuracy: 53.31 | 7 |
| General visual question answering | HallusionBench | Pass@1: 63.7 | 7 |
| Hallucination control | HallusionBench | General Score: 60.5 | 6 |
| Multimodal Hallucination Assessment | HallusionBench | Accuracy: 70 | 5 |
| Hallucination Analysis | HallusionBench | fACC: 18.7 | 4 |
| Hallucination Evaluation | HallusionBench (test) | Question Pair Accuracy: 17.8 | 4 |
| Visual Question Answering | HallusionBench HBI (all) | Score: 45.21 | 4 |

Showing 25 of 27 rows