| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Code Generation | MBPP (test) | Pass@1 95.1 | 276 |
| Code Generation | MBPP | Pass@1 87.6 | 175 |
| Code Generation | MBPP | Accuracy (%) 92.2 | 146 |
| Code Generation | MBPP+ | Pass@1 83.6 | 122 |
| Code Generation | MBPP | Accuracy 79.8 | 120 |
| Code Generation | MBPP | Pass@1 91.8 | 113 |
| Code Generation | MBPP | Accuracy 96.6 | 90 |
| Code Generating | MBPP | Pass@1 83.1 | 88 |
| Code Generation | MBPP Plus (test) | Accuracy 83.6 | 87 |
| Code Generation | MBPP-ET | Pass@1 91.8 | 75 |
| Code Generation | MBPP+ | Accuracy 75.9 | 75 |
| Code Generation | MBPP Sanitized | Accuracy 85.7 | 51 |
| Function-level Code Generation | MBPP+ augmented (test) | Pass@1 79.6 | 45 |
| Code | MBPP | Pass@1 77.9 | 43 |
| Code Generation | MBPP+ | Score 94.2 | 43 |
| Code Generation | MBPP | Score 58 | 38 |
| Coding | MBPP+ | Pass@1 86.21 | 37 |
| Code Generation | MBPP | MBPP Score 66.17 | 35 |
| Code Reasoning | MBPP | MBPP Execution Accuracy 84.7 | 33 |
| Code Completion | MBPP+ | Pass@1 65.6 | 33 |
| Code Generation | MBPP v1 (test) | Pass@1 68.9 | 33 |
| Code Generation | MBPP | Accuracy 58 | 32 |
| Code Verification | MBPP+ | Pass@1 76.93 | 32 |
| Python Coding | MBPP standard (test) | Pass@1 Accuracy 85.25 | 32 |
| Coding | MBPP | Accuracy 98.4 | 31 |