| Dataset Name | SOTA Method | Metric | Value | Trend | Last Updated |
|---|---|---|---|---|---|
| HumanEval | CompassMax-V3-Thinking | Pass@1 | 98.17 | 52 | 4d ago |
| Coding Tasks (test) | SALE | Pass@1 | 98.3 | 42 | 4d ago |
| MBPP+ | | Pass@1 | 86.21 | 37 | 4d ago |
| MBPP | | Accuracy | 98.4 | 31 | 4d ago |
| HumanEval+ | | Pass@1 | 95.12 | 31 | 4d ago |
| HumanEval | Ministral-3-R | HumanEval Mean Score | 0.9695 | 28 | 4d ago |
| LiveCodeBench | | Task Accuracy | 79 | 23 | 4d ago |
| Coverage (test) | GPT4o | Precision | 94.57 | 21 | 4d ago |
| MultiPL-E | LLaDA2.0-flash | Score | 74.87 | 20 | 4d ago |
| LiveCodeBench v5 | | Accuracy | 75.9 | 18 | 4d ago |
| LiveCodeBench | | Accuracy | 90.7 | 16 | 4d ago |
| Codex-Eval | GPT-4 | Pass@10 | 94.1 | 16 | 4d ago |
| LiveBench | RAM+ | Accuracy | 40.23 | 15 | 4d ago |
| LiveCodeBench | LED | Pass@1 | 69.11 | 15 | 4d ago |
| Code Benchmarks Aggregate | LLAMA 2 | Score | 37.5 | 12 | 4d ago |
| Terminal-Bench 2.0 | | Score | 59.3 | 11 | 4d ago |
| MBPP | CapFlow | Solve Rate | 83.28 | 11 | 4d ago |
| HumanEval | CapFlow | Solve Rate | 0.9618 | 11 | 4d ago |
| LiveCodeBench v6 | | Pass@1 | 86.34 | 11 | 4d ago |
| MBPP | Ministral-3-R | Score | 94.16 | 11 | 4d ago |
| Code Evaluation Suite HumanEval & MBPP | Qwen2.5-7B-Instruct | HumanEval Score | 81.7 | 10 | 4d ago |
| BIRD-SQL | | Score | 47.75 | 10 | 4d ago |
| BigCodeBench Full | LLaDA2.0-flash | Score | 41.58 | 10 | 4d ago |
| CRUXEval-O | LLaDA2.1-flash | Score | 87.5 | 10 | 4d ago |
| LiveCodeBench | Ling-flash-2.0 | Score | 52.48 | 10 | 4d ago |