
HallBench

Benchmarks

Task Name                                  Dataset Name    SOTA Result                Trend
Hallucination Evaluation                   HallBench       Accuracy 73.6              31
Vision-Language Hallucination Evaluation   HallBench       Accuracy 64.2              15
Multimodal Hallucination Evaluation        HallBench       Score 65.2                 9
Hallucination Evaluation                   HallBench avg   Hallucination Score 58.1   7