HumanEvalFix

Benchmarks

Task Name	Dataset Name	Metric	SOTA Result	Trend
Code Repair	HumanEvalFix (test)	Success Rate (Python)	59.1		19
Bug-fixing	HumanEvalFix held-out 25% (eval)	Utility Difference (Δπ)	5.2		3
