| Dataset Name | SOTA Method | Metric | Value | Entries | Last Updated |
|---|---|---|---|---|---|
| HumanEval | | HumanEval Accuracy | 96.34 | 118 | 19h ago |
| MBPP | FLARE-9B | Pass@1 | 91.05 | 73 | 19h ago |
| HumanEval+ | SUN | Accuracy | 79.9 | 43 | 7d ago |
| LiveCodeBench V5-6 | Reasoning Memory | Accuracy | 50.8 | 33 | 2mo ago |
| LiveCodeBench V1-4 | Reasoning Memory | Accuracy | 47.1 | 33 | 2mo ago |
| LiveCode | Lowest Centroid | Accuracy | 83.6 | 30 | 1mo ago |
| CRUX | CoReward | Accuracy@5 | 55.08 | 27 | 1mo ago |
| LiC Code | DeepSeek-R1 | Concat Score | 97.1 | 21 | 1mo ago |
| LCB v6 | | Score | 87.7 | 20 | 19h ago |
| HumanEval (test) | LightMoE | HumanEval Success Rate | 58.1 | 17 | 14d ago |
| MBPP | AIPO | Average Performance | 67.19 | 16 | 6d ago |
| LCB | BASTION | Speedup | 8.37 | 12 | 4d ago |
| MBPP | BASTION | Speedup | 7.68 | 12 | 4d ago |
| LiveCodeBench v5 | Qwen3-14B + NGM | Score | 29.94 | 10 | 14d ago |
| MBPP 1,000-example (test) | Qwen3-VL-2B-Instruct | Perplexity | 9.0212 | 10 | 3mo ago |
| LiveCodeBench v6 | | Score | 50.86 | 9 | 19h ago |
| LiveCodeBench v6 | | Pass@1 | 67.3 | 9 | 19h ago |
| MBPP | | Score | 93.77 | 9 | 19h ago |
| CRUX-I | DSR | Score | 39.75 | 9 | 7d ago |
| LiveCodeBench | DSR | Score | 13.88 | 9 | 7d ago |
| BigCodeBench | Cosine-decay | Score | 40 | 9 | 7d ago |
| HumanEval pass@1 | | Pass@1 | 67.07 | 9 | 3mo ago |
| HumanEval | PromptCOS | True WS Score | 1 | 8 | 8d ago |
| SWE Verified Agentless | | pass@1 | 57.6 | 8 | 3mo ago |
| LCB Pro Med 25Q2 | Nemotron-Cascade 14B-Thinking | pass@1 | 10.5 | 7 | 3mo ago |