| Task Name | Dataset Name | SOTA Result | Trend | |
|---|---|---|---|---|
| Code Generation | LiveCodeBench | Pass@1: 90.7 | 89 | |
| Code Generation | LiveCodeBench | Pass@1: 95.8 | 86 | |
| Code Generation | LiveCodeBench | Average Score: 168 | 68 | |
| Code Reasoning | LiveCodeBench | Accuracy: 87.4 | 46 | |
| Code Generation | LiveCodeBench (test) | Pass@1 Overall: 53.6 | 38 | |
| Code Generation | LiveCodeBench | Pass@1: 1,784 | 37 | |
| Code Generation | LiveCodeBench | Accuracy: 57.29 | 32 | |
| Code Verification | LiveCodeBench | Pass@1: 39.31 | 32 | |
| Reasoning | LiveCodeBench | LiveCodeBench Score: 54.25 | 27 | |
| Code Generation | LiveCodeBench v3 | Score: 90.2 | 26 | |
| Code Generation | LiveCodeBench | Speedup: 3.52 | 24 | |
| Code Generation | LiveCodeBench Jan-Apr 2025 | Accuracy (pass@1): 47.25 | 24 | |
| Code Generation | LiveCodeBench Medium | Accuracy: 96.79 | 23 | |
| Coding | LiveCodeBench | Task Accuracy: 79 | 23 | |
| Code Generation | LiveCodeBench v6 | Accuracy: 83.93 | 23 | |
| Competitive Programming | LiveCodeBench v5 | Score: 82.8 | 22 | |
| Code Generation | LiveCodeBench Hard | Pass@1: 63.9 | 21 | |
| LLM Inference | LiveCodeBench | Speedup: 2.81 | 21 | |
| Competitive Programming | LiveCodeBench 2408 - 2505 v6 | Score: 80.2 | 19 | |
| Code Reasoning | LiveCodeBench 1.0 (test) | Accuracy: 47.2 | 18 | |
| Code Generation | LiveCodeBench pass@1 v5 | pass@1: 30.7 | 18 | |
| Code Generation | LiveCodeBench lite v6 (test) | Accuracy: 56.57 | 18 | |
| Code Generation | LiveCodeBench lite v5 (test) | Accuracy: 64.07 | 18 | |
| Coding | LiveCodeBench v5 | Accuracy: 75.9 | 18 | |
| Code Generation | LiveCodeBench (LCB) | % Avg@4: 32.4 | 17 | |