| Dataset Name | SOTA Method | Metric | Trend | Updated | |
|---|---|---|---|---|---|
| HumanEval | | HumanEval Accuracy 93.4 | 50 | 4d ago | |
| MBPP | PrivCode | Pass@1 77.9 | 43 | 4d ago | |
| HumanEval+ | DARL | Accuracy 73.2 | 22 | 4d ago | |
| MBPP 1,000-example (test) | Qwen3-VL-2B-Instruct | Perplexity 9.0212 | 10 | 4d ago | |
| SWE Verified Agentless | | pass@1 57.6 | 8 | 4d ago | |
| LCB Pro Med 25Q2 | Nemotron-Cascade 14B-Thinking | pass@1 10.5 | 7 | 4d ago | |
| LCB Pro Easy 25Q2 | Nemotron-Cascade 14B-Thinking | Pass@1 68.9 | 7 | 4d ago | |
| LCB 08/24-02/25 v5 | Nemotron-Cascade 14B-Thinking | pass@1 77.5 | 7 | 4d ago | |
| CRUX | | Accuracy 66.4 | 6 | 4d ago | |
| APT-Bench | Qwen3 | Accuracy 41.9 | 6 | 4d ago | |
| MBPP+ | AdaRAS | Accuracy 60.58 | 6 | 4d ago | |
| LCB v6 | Nemotron-Nano-v2 | Score 60 | 6 | 4d ago | |
| EnConda-Bench | Youtu-LLM 2B | Accuracy 0.215 | 4 | 4d ago | |
| CruxEval-o | Engram-40B | Exact Match 35.3 | 4 | 4d ago | |
| CruxEval-i | Engram-40B | Exact Match 36.2 | 4 | 4d ago | |
| LiveCodeBench EN | Qwen3-8B | Score 63.39 | 2 | 4d ago | |