MHumanEval

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Hallucination Assessment | MHumanEval | Response Rate: 72.6 | 20 |
| Code Generation | mHumanEval | Pass@1: 0.94 | 13 |
| Multi-type Hallucination Evaluation | MHumanEval | Object Hallucination Rate: 21.9 | 9 |