| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Code Output Prediction | CRUXEval-O | Pass@1 49.8 | 47 |
| Code Input Prediction | CRUXEval-I | Pass@1 50 | 47 |
| Code Reasoning | CRUXEval | Input-CoT Accuracy 73.8 | 27 |
| Code Reasoning | CRUXEval | Accuracy 68.6 | 21 |
| Code Reasoning | CruxEval Output | Score 51 | 12 |
| Code Reasoning | CRUXEval-O | Accuracy 83.5 | 12 |
| Coding | CRUXEval-O | Score 87.5 | 10 |
| Coding | CRUXEval | Pass@1 55.9 | 6 |
| Code Reasoning | CRUXEval I | Accuracy 74 | 4 |
| Code | CruxEval o | Exact Match 35.3 | 4 |
| Code | CruxEval-i | Exact Match 36.2 | 4 |
| Code Reasoning (Output Prediction) | CRUXEval-O 1-shot | Accuracy 84.01 | 3 |
| Code Reasoning (Input Prediction) | CRUXEval-I 1-shot | Accuracy 79.75 | 3 |