| Dataset Name | SOTA Method | Metric | Trend | Updated | |
|---|---|---|---|---|---|
| MBPP | | Accuracy 98.4 | 116 | 2d ago | |
| HumanEval | CompassMax-V3-Thinking | Pass@1 98.17 | 103 | 16d ago | |
| HumanEval+ | | Pass@1 95.12 | 83 | 29d ago | |
| MBPP+ | | Pass@1 97.88 | 52 | 29d ago | |
| Coding Tasks (test) | SALE | Pass@1 98.3 | 42 | 1mo ago | |
| HumanEval | Ministral-3-R | HumanEval Mean Score 0.9695 | 32 | 1mo ago | |
| HumanEval, MBPP | ZIP | HumanEval Score 20.73 | 30 | 4d ago | |
| MBPP | SwiR | Pass@1 Accuracy 95.33 | 30 | 17d ago | |
| LiveCodeBench v5 | Qwen3-235B-A22B-R-TAP | Accuracy 77.6 | 29 | 1mo ago | |
| Coding Suite EvalPlus & LiveCodeBench | | Eval+ Score 86.7 | 26 | 1mo ago | |
| LiveCodeBench | | Task Accuracy 79 | 23 | 1mo ago | |
| HumanEval | ICaRus | Accuracy 86.6 | 22 | 1mo ago | |
| LiveCode | REAP | LiveCode Score 41.2 | 22 | 1mo ago | |
| Eval+ | | Eval+ Score 81.4 | 22 | 1mo ago | |
| HumanEval (test) | SPG w/ EUBO | Test Accuracy 41.5 | 21 | 2d ago | |
| Coverage (test) | GPT4o | Precision 94.57 | 21 | 1mo ago | |
| LiveCodeBench v6 | | Pass@1 89 | 20 | 29d ago | |
| MultiPL-E | LLaDA2.0-flash | Score 74.87 | 20 | 1mo ago | |
| LiveCodeBench | LED | Pass@1 69.11 | 19 | 1mo ago | |
| HumanEval | SpecBundle | Throughput 3,070 | 18 | 29d ago | |
| LiveCodeBench | SpecBundle | Throughput 3,413 | 18 | 29d ago | |
| Coding (val) | GRPO | Pass@16 100 | 16 | 1mo ago | |
| LiveCodeBench | | Accuracy 90.7 | 16 | 1mo ago | |
| Codex-Eval | GPT-4 | Pass@10 94.1 | 16 | 1mo ago | |
| LiveCodeBench | DeepSeek-R1-0528 | Pass@1 78.86 | 15 | 25d ago | |