
HumanEval and MBPP

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Code Generation | HumanEval and MBPP | HumanEval Score: 95.1 | 59 |
| Code Generation | HumanEval and MBPP EvalPlus | HumanEval+ Pass@k: 70.1 | 29 |
| Code Generation | HumanEval+ and MBPP+ | Score: 73.7 | 4 |
| Code-writing | HumanEval & MBPP EvalPlus (test) | HumanEval Pass Rate: 39.02 | 4 |