| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| MBPP sanitized | KLASS | Accuracy | 64.59 | 12 | 1mo ago |
| Ag-LiveCodeBench-X 5.0 (derived) | Llama 3.3 70B Ins | OCaml Pass@1 | 7 | 8 | 1mo ago |
| MultiPL-E | Qwen3-8B-CF-X | Success Rate (Lua) | 68 | 5 | 1mo ago |
| HumanEval | Davinci Codex | pass@1 | 36 | 5 | 1mo ago |
| MBPP | Davinci Codex | Pass@1 | 50.4 | 4 | 1mo ago |