| Dataset Name | SOTA Method | Metric | Score | Trend | Updated |
|---|---|---|---|---|---|
| MLE-bench-30 (test) | AIRA2 | Percentile Rank | 76 | 22 | 2mo ago |
| MLE-Bench Lite | EvoMaster | Any Medal (%) | 75.8 | 13 | 22d ago |
| MLE-Dojo | | Valid Sub. | 91.9 | 13 | 1mo ago |
| MLE-Bench full official | Gome (GPT-5) | Medal Rate (Low) | 68.2 | 11 | 3mo ago |
| MLE-Bench 51 tasks (held-out) | MLE-IDEATOR | Avg@3 | 58.5 | 11 | 3mo ago |
| MLE-bench (held-out task instances) | Full ExIt | Accuracy (%) | 58.6 | 6 | 3mo ago |
| MLE-bench (All) | Leeroo | Medal Rate | 50.67 | 5 | 3mo ago |
| MLE-bench Hard | Leeroo | Medal Rate | 40 | 5 | 3mo ago |
| MLE-bench Medium | Leeroo | Medal Rate | 44.74 | 5 | 3mo ago |
| MLE-bench Low | Leeroo | Medal Rate | 68.18 | 5 | 3mo ago |