| Task Name | Dataset Name | SOTA Result | Trend | |
|---|---|---|---|---|
| Code Generation | CodeContests | Pass@1: 89.09 | 68 | |
| Code Generation | CodeContests (test) | Pass@11,200 | 68 | |
| Code Generation | CodeContests | Accuracy: 21.2 | 30 | |
| Code Generation | CodeContests | Avg@8: 39.33 | 26 | |
| Code Generation | CodeContests official (val) | Pass@4: 43.6 | 24 | |
| Code Generation | CodeContests | Signal: 38 | 21 | |
| Code Generation | CodeContests | Pass@1: 55.2 | 21 | |
| Code Generation | CodeContests+ | LCBv6 Score: 39.1 | 15 | |
| Code Generation | CodeContests | Accuracy (CC): 26.7 | 15 | |
| Efficiency Test Generation | CodeContests C++ | ASR (Fast): 62.22 | 8 | |
| Efficiency Test Generation | CodeContests Java | Acceptance Success Rate (Fast): 63.08 | 8 | |
| Efficiency Test Generation | CodeContests Python | ASR (Fast): 60 | 8 | |
| Efficiency-oriented test case generation | CodeContests | ASR (Mean): 75.82 | 8 | |
| Code Generation | CodeContests (evaluation set) | Pass@1: 19.7 | 8 | |
| Competitive Programming | CodeContests (val) | Pass@1: 68.86 | 6 | |
| Program Synthesis | CodeContests (test) | Pass@1: 0.2045 | 6 | |
| Coding Reasoning | CodeContests | Pass Rate: 65.8 | 5 | |
| Multi-agent Selection (Pairwise Resolution) | CodeContests (test) | Pairwise Resolution: 89.4 | 3 | |
| Code Generation | CodeContests transfer | Mean F1 Score: 38.24 | 3 | |
| Competition-Level Code Generation | CodeContests (val) | 10@1k: 21 | 3 | |