| Dataset Name | SOTA Method | Metric | Score | Trend | Updated |
|---|---|---|---|---|---|
| HumanEval | MATRIX-Gen-SFT | Pass@1 | 7,927 | 1,036 | 11d ago |
| HumanEval (test) | | Pass@1 | 100 | 506 | 4d ago |
| HumanEval+ | | Pass@1 | 100 | 383 | 8d ago |
| MBPP (test) | AgentConductor | Pass@1 | 95.1 | 298 | 3d ago |
| Code | M2CL | Accuracy | 98.7 | 242 | 1mo ago |
| MBPP+ | | Pass@1 | 84.39 | 216 | 22d ago |
| MBPP | | Pass@1 | 89.1 | 193 | 1mo ago |
| HumanEval | ExPairT-LLM | Pass@1 | 94.1 | 171 | 26d ago |
| MBPP | AgentCoder | Pass@1 | 91.8 | 159 | 15d ago |
| MBPP | GPT-4o | Accuracy | 79.8 | 159 | 1mo ago |
| MBPP | MegaAgent | Accuracy (%) | 92.2 | 146 | 1mo ago |
| HumanEval 1.0 (test) | | Pass@1 | 85.4 | 145 | 1mo ago |
| MBPP+ | Planning | Accuracy | 75.9 | 104 | 18d ago |
| HumanEval | Agent Q-Mix | Accuracy | 97.56 | 99 | 5d ago |
| HumanEval | FinTool-Qwen3-14B | HumanEval Score | 94.51 | 93 | 2d ago |
| HumanEval+ (test) | ACES-O | Pass@1 | 84.76 | 93 | 11d ago |
| HumanEval-ET | ThinkCoder | Pass@1 | 89.6 | 92 | 1mo ago |
| MBPP-ET | AgentCoder | Pass@1 | 91.8 | 91 | 1mo ago |
| MBPP | EG-CFG | Accuracy | 96.6 | 90 | 1mo ago |
| LiveCodeBench | | Pass@1 | 90.7 | 89 | 1mo ago |
| MBPP Plus (test) | Team-of-Thoughts | Accuracy | 83.6 | 87 | 1mo ago |
| LiveCodeBench | ExPairT-LLM | Pass@1 | 95.8 | 86 | 1mo ago |
| HumanEval | Info-Gain | Accuracy (%) | 63.8 | 77 | 1mo ago |
| MBPP | | Accuracy | 90.5 | 74 | 26d ago |
| BigCodeBench | LatentMem | Accuracy | 83.84 | 71 | 1mo ago |