| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| HumanEval | MATRIX-Gen-SFT | Pass@1 | 7,927 | 1,043 | 18d ago |
| HumanEval (test) | | Pass@1 | 100 | 612 | 23h ago |
| MBPP (test) | AgentConductor | Pass@1 | 95.1 | 405 | 23h ago |
| HumanEval+ | | Pass@1 | 100 | 393 | 4d ago |
| Code | M2CL | Accuracy | 98.7 | 242 | 3mo ago |
| MBPP+ | | Pass@1 | 84.39 | 238 | 1d ago |
| MBPP+ | G-Memory | Accuracy | 85.75 | 236 | 11d ago |
| HumanEval | Agent Q-Mix | Accuracy | 97.56 | 217 | 13d ago |
| MBPP | Uno-Orchestra | Pass@1 | 92.4 | 211 | 12d ago |
| MBPP | | Pass@1 | 89.1 | 193 | 2mo ago |
| HumanEval | ExPairT-LLM | Pass@1 | 94.1 | 171 | 2mo ago |
| MBPP | GPT-4o | Accuracy | 79.8 | 165 | 8d ago |
| HumanEval | DFlash+DDTree | Speedup Factor | 8.22 | 147 | 4d ago |
| MBPP | MegaAgent | Accuracy (%) | 92.2 | 146 | 3mo ago |
| HumanEval | Uno-Orchestra | Pass@1 | 93.1 | 145 | 18h ago |
| HumanEval 1.0 (test) | | Pass@1 | 85.4 | 145 | 3mo ago |
| HumanEval+ (test) | SKETCHVERIFY (B: sem. vote) | Pass@1 | 98.1 | 132 | 5d ago |
| HumanEval | SIGMA | HumanEval Score | 95.22 | 128 | 20h ago |
| HumanEval | SASFT | Accuracy | 98.27 | 115 | 1d ago |
| EvalPlus | EVOLVECODER-4B (r3) | Pass@1 | 89 | 115 | 23h ago |
| APPS | ExPairT-LLM | Pass@1 | 91.2 | 111 | 16d ago |
| HumanEval-ET | ThinkCoder | Pass@1 | 89.6 | 108 | 25d ago |
| MBPP-ET | AgentCoder | Pass@1 | 91.8 | 91 | 3mo ago |
| MBPP | EG-CFG | Accuracy | 96.6 | 90 | 3mo ago |
| MBPP | | Accuracy | 90.5 | 89 | 14d ago |