| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Code Generation | MBPP (test) | Pass@1 95.1 | 298 |
| Code Generation | MBPP+ | Pass@1 84.39 | 216 |
| Code Generation | MBPP | Pass@1 89.1 | 193 |
| Code Generation | MBPP | Pass@1 91.8 | 159 |
| Code Generation | MBPP | Accuracy 79.8 | 159 |
| Code Generation | MBPP | Accuracy (%) 92.2 | 146 |
| Coding | MBPP | Accuracy 98.4 | 116 |
| Code Generation | MBPP+ | Accuracy 75.9 | 104 |
| Code Generation | MBPP-ET | Pass@1 91.8 | 91 |
| Code Generation | MBPP | Accuracy 96.6 | 90 |
| Code Generating | MBPP | Pass@1 83.1 | 88 |
| Code Generation | MBPP Plus (test) | Accuracy 83.6 | 87 |
| Code Generation | MBPP | Accuracy 90.5 | 74 |
| Code Generation | MBPP | Pass@1 Accuracy 94.2 | 59 |
| Function-level Code Generation | MBPP+ augmented (test) | Pass@1 79.6 | 56 |
| Code Generation | MBPP | Tau Correlation 9.94 | 55 |
| Coding | MBPP+ | Pass@1 97.88 | 52 |
| Code Generation | MBPP Sanitized | Accuracy 85.7 | 51 |
| Code | MBPP | Pass@1 77.9 | 49 |
| Code Generation | MBPP+ | Score 94.2 | 43 |
| Code generation | MBPP | Pass@1 80.4 | 41 |
| Code Generation | MBPP | Score 58 | 38 |
| Code Generation | MBPP | Accuracy 68.8 | 36 |
| Code Generation | MBPP | MBPP Score 66.17 | 35 |
| Code Reasoning | MBPP | MBPP Execution Accuracy 84.7 | 33 |