| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Code Generation | HumanEval | Pass@1 7,927 | 1,043 |
| Code Generation | HumanEval (test) | Pass@1 100 | 612 |
| Code Generation | HumanEval+ | Pass@1 100 | 393 |
| Code Generation | HumanEval | Accuracy 97.56 | 217 |
| Code Generation | HumanEval | Pass@1 94.1 | 171 |
| Coding | HumanEval | Pass@1 98.17 | 168 |
| Coding | HumanEval+ | Pass@1 95.12 | 164 |
| Code Generation | HumanEval | Speedup Factor 8.22 | 147 |
| Code Generation | HumanEval | pass@1 93.1 | 145 |
| Code Generation | HumanEval 1.0 (test) | Pass@1 85.4 | 145 |
| Code Generation | HumanEval | HumanEval Score 95.22 | 128 |
| Code | HumanEval | HumanEval Accuracy 96.34 | 118 |
| Code Generation | HumanEval | Accuracy 98.27 | 115 |
| Code Generation | HumanEval-ET | Pass@1 89.6 | 108 |
| Inference Efficiency | HumanEval | Speedup Factor 5.33 | 90 |
| Code Generation | HumanEval | Accuracy (%) 63.8 | 77 |
| Code Generation | HumanEval+ | Pass Rate 95.1 | 75 |
| Code Generation | HumanEval 0-shot | Accuracy 57.93 | 69 |
| Code Generation | HumanEval | Acc 98.27 | 65 |
| Function-level Code Generation | HumanEval+ augmented (test) | Pass@1 90 | 65 |
| Code Reasoning | HumanEval | HumanEval Score 95.73 | 62 |
| Code Generation | HumanEval+ | Pass@1 86 | 61 |
| Code Generation | HumanEval | Tokens/s 287.77 | 61 |
| Coding | HumanEval | Accuracy 95.62 | 60 |
| Code Generation | HumanEval | Score 91.56 | 55 |