| Task Name | Dataset Name | SOTA Result | Trend | |
|---|---|---|---|---|
| Code Reasoning | LiveCodeBench | Accuracy 87.4 | 90 | |
| Code Generation | LiveCodeBench | Pass@1 90.7 | 89 | |
| Code Generation | LiveCodeBench | Pass@1 95.8 | 86 | |
| Code Generation | LiveCodeBench | Accuracy 88.6 | 84 | |
| Code Generation | LiveCodeBench | Pass@1 88.1 | 76 | |
| Code Generation | LiveCodeBench v6 | Accuracy 100 | 75 | |
| Code Generation | LiveCodeBench | Average Score 168 | 68 | |
| Speculative Decoding | LiveCodeBench | Speedup Factor 7.16 | 66 | |
| Code Generation | LiveCodeBench | Accuracy 73.2 | 64 | |
| Predicting Code Correctness | LiveCodeBench Python | ECE 0.015 | 60 | |
| Code Correctness Prediction | LiveCodeBench Python | AUROC 86.7 | 60 | |
| Code Correctness Prediction | LiveCodeBench Python | Brier Score 0.067 | 60 | |
| Code Generation | LiveCodeBench | Pass@1 1,784 | 51 | |
| Programming | LiveCodeBench V3 V4 (test) | Accuracy 61.4 | 42 | |
| Code Generation | LiveCodeBench (test) | Pass@1 Overall 53.6 | 42 | |
| Code Generation | LiveCodeBench v6 | Score 91.7 | 41 | |
| Coding | LiveCodeBench | Accuracy 70 | 38 | |
| Code | LiveCodeBench V5-6 | Accuracy 50.8 | 33 | |
| Code | LiveCodeBench V1-4 | Accuracy 47.1 | 33 | |
| Competitive Programming | LiveCodeBench Pro 25Q2 | Easy Score 94.8 | 33 | |
| Competitive Programming | LiveCodeBench Pro 25Q1 | Easy Score 96.6 | 33 | |
| Code Verification | LiveCodeBench | Pass@1 39.31 | 32 | |
| Code Generation | LiveCodeBench v6 (2025-02 to 2025-05) | Accuracy 74.1 | 31 | |
| Coding | LiveCodeBench v6 | Score (%) 75.1 | 31 | |
| Code Generation | LiveCodeBench v5 | Pass@1 61.5 | 30 | |