
HumanEval and MBPP

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Code Generation | HumanEval and MBPP | Overall Average Score: 85.6 | 37
Code Generation | HumanEval and MBPP EvalPlus | HumanEval+ Pass@k: 70.1 | 29
Code-writing | HumanEval & MBPP EvalPlus (test) | HumanEval Pass Rate: 39.02 | 4