| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Autonomous Machine Learning Engineering | MLE-Bench Lite | Any Medal Rate: 81.82 | 57 |
| Machine Learning Engineering | MLE-bench-30 (test) | Percentile Rank: 76 | 22 |
| ML Engineering | MLE-Bench official (test) | Medal Rate (Low): 71.2 | 19 |
| Autonomous Machine Learning Engineering | MLE-bench (held-in and held-out) | CIFAR-10 Performance: 76.53 | 14 |
| Automated Machine Learning | MLE-Bench | Valid Submission Rate: 96.89 | 14 |
| Machine Learning Engineering | MLE-Bench Lite | Any Medal (%): 75.8 | 13 |
| Automated AI Research | MLE-Bench official (full) | Valid Submission Rate: 98.7 | 13 |
| Machine Learning Engineering | MLE-Bench full official | Medal Rate (Low): 68.2 | 11 |
| Machine Learning Engineering | MLE-Bench 51 tasks (held-out) | Avg@3: 58.5 | 11 |
| Machine Learning Engineering | MLE-bench (held-out task instances) | Accuracy (%): 58.6 | 6 |
| Machine learning engineering | MLE-bench (All) | Medal Rate: 50.67 | 5 |
| Machine learning engineering | MLE-bench Hard | Medal Rate: 40 | 5 |
| Machine learning engineering | MLE-bench Medium | Medal Rate: 44.74 | 5 |
| Machine learning engineering | MLE-bench Low | Medal Rate: 68.18 | 5 |
| Fine-grained Recognition | MLE-Bench iNaturalist 2019 FGVC6 | Score: 24.45 | 2 |
| Medical Image Classification | MLE-Bench RSNA Brain Tumor | Score: 0.6518 | 2 |
| Fine-grained Recognition | MLE-Bench iMet 2020 FGVC7 | Score: 68.04 | 2 |
| Code Understanding | MLE-Bench AI4Code | Score: 83.56 | 2 |
| 3D Object Detection | MLE-Bench 3D Object Detection | Score: 17.63 | 2 |
| Scientific Data Analysis | MLE-bench AI4Science mix of seen and unseen | Stanford COVID Vaccine Score: 100 | 2 |