MHumanEval

Benchmarks

Task Name                           | Dataset Name | SOTA Result                     | Trend
Hallucination Assessment            | MHumanEval   | Response Rate: 72.6             | 20
Code Generation                     | mHumanEval   | Pass@1: 0.94                    | 13
Multi-type Hallucination Evaluation | MHumanEval   | Object Hallucination Rate: 21.9 | 9