MHumanEval

Benchmarks

Task Name                            Dataset Name  Metric                     SOTA Result  Trend
Hallucination Assessment             MHumanEval    Response Rate              72.6         20
Code Generation                      mHumanEval    Pass@1                     0.94         13
Object Hallucination Evaluation      MHumanEval    Hallucination Rate (%)     56           12
Multi-type Hallucination Evaluation  MHumanEval    Object Hallucination Rate  21.9         9