| Dataset Name | SOTA Method | Metric | Trend | Updated | |
|---|---|---|---|---|---|
| LiveCodeBench | | Accuracy: 87.4 | 46 | 4d ago | |
| HumanE | Denser | Accuracy: 84.9 | 35 | 4d ago | |
| HumanEval | DeepSeek-R1-Distill-Qwen-14B (Reasoning) | HumanEval Score: 95.73 | 35 | 2d ago | |
| MBPP | | MBPP Execution Accuracy: 84.7 | 33 | 4d ago | |
| CRUXEval | Qwen3-8B | Input-CoT Accuracy: 73.8 | 27 | 4d ago | |
| MBPP | Denser | Accuracy: 67.3 | 23 | 4d ago | |
| CRUXEval | Qwen2.5-Math-72B-Instruct | Accuracy: 68.6 | 21 | 4d ago | |
| LiveCodeBench 1.0 (test) | A3PO | Accuracy: 47.2 | 18 | 4d ago | |
| CRUX | RMoA | Accuracy: 87.37 | 16 | 4d ago | |
| CruxEval Output | DataFlow-Code-10K | Score: 51 | 12 | 4d ago | |
| CRUXEval-O | Kimi-K2 Base | Accuracy: 83.5 | 12 | 4d ago | |
| LCB | SCF-RKL | pass@1: 62.46 | 8 | 4d ago | |
| CodeForces | LAD | Rating: 1,533.64 | 6 | 4d ago | |
| HumanEval+ | LAD | Average Score @16: 82.29 | 6 | 4d ago | |
| LiveCodeBench | LAD | Avg@16: 33.51 | 6 | 4d ago | |
| CRUX-O | HSA-UL | Accuracy: 40.75 | 6 | 4d ago | |
| MBPP base and extended (out-of-distribution) | InftyThink+ | Accuracy: 55.83 | 5 | 4d ago | |
| HumanEval base and extended (out-of-distribution) | InftyThink+ | Accuracy: 0.677 | 5 | 4d ago | |
| CRUXEval I | Kimi-K2 Base | Accuracy: 74 | 4 | 4d ago | |
| BigCodeBench | Qwen2.5-Coder-ScaleQuest | BigCodeBench Score: 40 | 3 | 4d ago | |