
MMHal-Bench

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Hallucination Evaluation | MMHal-Bench | MMHal Score: 4.7 | 306 |
| Multimodal Hallucination Evaluation | MMHal-Bench | Average Score: 4.84 | 129 |
| Image+Text-to-Text Hallucination Evaluation | MMHal-Bench | BERT Score: 79 | 18 |
| Generative Hallucination Mitigation | MMHal-Bench | Overall Score: 3.49 | 13 |
| Multi-modal Hallucination Evaluation | MMHal-Bench v1.0 (test) | Overall Score: 2.14 | 12 |
| Hallucination Evaluation | MMHal-Bench-V | Hallucination Score: 2.57 | 9 |