| Dataset Name | SOTA Method | Metric | Trend | Last Updated |
|---|---|---|---|---|
| HumanEval | MGDebugger | Accuracy 96.3 | 42 | 3mo ago |
| MBPP | MGDebugger | Accuracy 80.8 | 30 | 3mo ago |
| InfiniteBench code_debug 40k input cap | Qwen3-4B | Accuracy 34.26 | 19 | 7d ago |
| DebugBench | MOCHA | Average Accuracy 66.6 | 11 | 14d ago |
| LiveCodeBench February-May 2025 (evaluation window) | MGDebugger | Accuracy 36.6 | 4 | 3mo ago |