| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Code Generation | LiveCodeBench | Pass@1: 90.7 | 89 |
| Code Generation | LiveCodeBench | Pass@1: 95.8 | 86 |
| Code Generation | LiveCodeBench | Average Score: 168 | 68 |
| Code Reasoning | LiveCodeBench | Accuracy: 87.4 | 62 |
| Code Generation | LiveCodeBench | Accuracy: 67 | 60 |
| Code Generation | LiveCodeBench v6 | Accuracy: 100 | 58 |
| Code Generation | LiveCodeBench | Pass@1: 1,784 | 51 |
| Code Generation | LiveCodeBench | Pass@1: 88.1 | 48 |
| Code Generation | LiveCodeBench v6 | Score: 91.7 | 41 |
| Code Generation | LiveCodeBench (test) | Pass@1 Overall: 53.6 | 38 |
| Code | LiveCodeBench V5-6 | Accuracy: 50.8 | 33 |
| Code | LiveCodeBench V1-4 | Accuracy: 47.1 | 33 |
| Competitive Programming | LiveCodeBench Pro 25Q2 | Easy Score: 94.8 | 33 |
| Competitive Programming | LiveCodeBench Pro 25Q1 | Easy Score: 96.6 | 33 |
| Code Verification | LiveCodeBench | Pass@1: 39.31 | 32 |
| Code Generation | LiveCodeBench | Accuracy: 79.5 | 30 |
| Coding | LiveCodeBench v5 | Accuracy: 77.6 | 29 |
| Reasoning | LiveCodeBench | LiveCodeBench Score: 54.25 | 27 |
| Code Generation | LiveCodeBench v3 | Score: 90.2 | 26 |
| Code Generation | LiveCodeBench | Speedup: 3.52 | 24 |
| Code Generation | LiveCodeBench Jan-Apr 2025 | Accuracy (pass@1): 47.25 | 24 |
| Code Generation | LiveCodeBench Medium | Accuracy: 96.79 | 23 |
| Coding | LiveCodeBench | Task Accuracy: 79 | 23 |
| Competitive Programming | LiveCodeBench v5 | Score: 82.8 | 22 |
| Code Generation | LiveCodeBench (LCB) | FUNC Score: 73.1 | 21 |