| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Code Generation | CodeEval-Pro BigCodeBench-Lite-Pro and HumanEval-Pro (Held-out) | Average Accuracy: 79.2 | 18 |
| Code Generation | CodeEval-Pro BigCodeBench-Lite-Pro and HumanEval-Pro (2nd Pass) | Average Accuracy: 64.6 | 18 |
| Code Generation | CodeEval-Pro BigCodeBench-Lite-Pro and HumanEval-Pro (1st Pass) | Average Accuracy: 66.7 | 18 |