
MMHal

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Hallucination Evaluation | MMHal | Score: 4.2 | 37 |
| VQA Hallucination | MMHal | Score: 3.87 | 21 |
| Pointwise Scoring | MMHal pointwise | Kendall's Tau: 0.949 | 9 |
| Hallucination Evaluation | MMHal v1.0 (test) | Score: 2.23 | 6 |