HumanEval, MBPP, and BigCodeBench

Benchmarks

Task Name	Dataset Name	SOTA Result	Trend
Code Generation	HumanEval+, MBPP+, and BigCodeBench Aggregate	Average Score: 70.72	12
