| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Code Generation | HumanEval | Pass@1 | 7,927 | 1,036 |
| Code Generation | HumanEval (test) | Pass@1 | 100 | 506 |
| Code Generation | HumanEval+ | Pass@1 | 100 | 383 |
| Code Generation | HumanEval | Pass@1 | 94.1 | 171 |
| Code Generation | HumanEval 1.0 (test) | Pass@1 | 85.4 | 145 |
| Coding | HumanEval | Pass@1 | 98.17 | 103 |
| Code Generation | HumanEval | Accuracy | 97.56 | 99 |
| Code Generation | HumanEval | HumanEval Score | 94.51 | 93 |
| Code Generation | HumanEval-ET | Pass@1 | 89.6 | 92 |
| Coding | HumanEval+ | Pass@1 | 95.12 | 83 |
| Code | HumanEval | HumanEval Accuracy | 95.1 | 79 |
| Code Generation | HumanEval | Accuracy (%) | 63.8 | 77 |
| Code Generation | HumanEval | Acc | 98.27 | 65 |
| Code Generation | HumanEval | Tokens/s | 287.77 | 61 |
| Function-level Code Generation | HumanEval+ augmented (test) | Pass@1 | 90 | 57 |
| Code Generation | HumanEval+ | Pass Rate | 95.1 | 56 |
| Code Generation | HumanEval | Tau | 10.72 | 55 |
| Inference Efficiency | HumanEval | Speedup Factor | 5.15 | 54 |
| Code Generation | HumanEval Multilingual (test) | Average Score | 76.5 | 52 |
| Code Generation | HumanEval | Accuracy | 81.7 | 51 |
| Code Generation | HumanEval | HumanEval Score | 93 | 50 |
| Code Generation | HumanEval | Average Tau (τ) | 1.94 | 45 |
| Code Generation | HumanEval @WizardCoder (test) | Pass@1 | 71.95 | 45 |
| Code generation | HumanEval | Success Rate (SR) | 6.06 | 43 |
| Code Debugging | HumanEval | Accuracy | 96.3 | 42 |
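
Most rows above report Pass@1, i.e. the estimated probability that a single sampled completion passes all of a problem's unit tests. As a minimal sketch (not tied to any specific row in the table), the standard unbiased pass@k estimator from the original HumanEval paper can be computed like this; the numbers in the example are illustrative, not taken from the leaderboard:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator from the HumanEval paper.

    n: total completions sampled per problem
    c: completions that pass all unit tests
    k: evaluation budget (k=1 gives Pass@1)

    Returns the estimated probability that at least one of k
    randomly drawn completions is correct.
    """
    if n - c < k:
        # Fewer failures than the budget: some draw must succeed.
        return 1.0
    # 1 - P(all k drawn completions are failures)
    return 1.0 - comb(n - c, k) / comb(n, k)

# Illustrative values: 200 samples, 140 passing, budget k=1.
print(pass_at_k(200, 140, 1))  # 0.7 (for k=1 this reduces to c/n)
```

For k=1 the estimator reduces to the raw pass fraction c/n; for larger k it corrects the bias of simply checking whether any of the first k samples passed.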