| Dataset Name | SOTA Method | Metric | Score | Trend | Updated |
|---|---|---|---|---|---|
| HumanEval | MATRIX-Gen-SFT | Pass@1 | 7,927 | 850 | 2d ago |
| HumanEval (test) | | Pass@1 | 100 | 444 | 4d ago |
| MBPP (test) | AgentConductor | Pass@1 | 95.1 | 276 | 4d ago |
| Code | M2CL | Accuracy | 98.7 | 242 | 4d ago |
| HumanEval+ | | Pass@1 | 92.7 | 189 | 4d ago |
| MBPP | NBDiff-7B-INSTRUCT | Pass@1 | 87.6 | 175 | 2d ago |
| MBPP | MegaAgent | Accuracy (%) | 92.2 | 146 | 4d ago |
| HumanEval 1.0 (test) | | Pass@1 | 85.4 | 145 | 4d ago |
| MBPP+ | | Pass@1 | 83.6 | 122 | 4d ago |
| MBPP | GPT-4o | Accuracy | 79.8 | 120 | 4d ago |
| MBPP | AgentCoder | Pass@1 | 91.8 | 113 | 4d ago |
| HumanEval | ExPairT-LLM | Pass@1 | 94.1 | 108 | 2d ago |
| MBPP | EG-CFG | Accuracy | 96.6 | 90 | 4d ago |
| LiveCodeBench | | Pass@1 | 90.7 | 89 | 4d ago |
| MBPP Plus (test) | Team-of-Thoughts | Accuracy | 83.6 | 87 | 4d ago |
| LiveCodeBench | ExPairT-LLM | Pass@1 | 95.8 | 86 | 2d ago |
| HumanEval+ (test) | | Pass@1 | 81.7 | 81 | 4d ago |
| HumanEval | Info-Gain | Accuracy (%) | 63.8 | 77 | 4d ago |
| MBPP-ET | AgentCoder | Pass@1 | 91.8 | 75 | 4d ago |
| HumanEval-ET | ThinkCoder | Pass@1 | 89.6 | 75 | 4d ago |
| MBPP+ | Planning | Accuracy | 75.9 | 75 | 4d ago |
| APPS | ExPairT-LLM | Pass@1 | 91.2 | 69 | 4d ago |
| LiveCodeBench | ExpertWeaver | Average Score | 168 | 68 | 2d ago |
| HumanEval | PRISM | Tokens/s | 287.77 | 61 | 4d ago |
| BigCodeBench | LatentMem | Accuracy | 83.84 | 59 | 4d ago |
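Most rows in this table report Pass@1. The leaderboard does not say how each entry was computed. A widely used definition is the unbiased pass@k estimator introduced with HumanEval (Chen et al., 2021). It works in three steps: generate n samples per problem, count the c samples that pass the unit tests, and estimate 1 - C(n-c, k) / C(n, k). The per-problem estimates are then averaged over the benchmark. The minimal sketch below assumes that definition. The function names `pass_at_k` and `benchmark_pass_at_k` are hypothetical and are not taken from any leaderboard or library.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem (assumed definition).

    n: number of samples generated, c: samples passing all tests, k: sample budget.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # With fewer than k failing samples, every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # 1 minus the probability that a random size-k subset holds only failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average per-problem pass@k over (n, c) pairs, as a percentage (hypothetical helper)."""
    if not results:
        raise ValueError("results must be non-empty")
    scores = [pass_at_k(n, c, k) for n, c in results]
    return 100.0 * sum(scores) / len(scores)
```

For k = 1 the estimator reduces to c / n. For example, with n = 10 samples of which c = 3 pass, `pass_at_k(10, 3, 1)` is about 0.3, which a leaderboard would report as 30.0.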