
MLE-Bench

Benchmarks

| Task | Dataset | Metric | SOTA result | Trend |
|---|---|---|---|---|
| Autonomous Machine Learning Engineering | MLE-Bench Lite | Any Medal Rate | 81.82 | 57 |
| Machine Learning Engineering | MLE-bench-30 (test) | Percentile Rank | 76 | 22 |
| ML Engineering | MLE-Bench official (test) | Medal Rate (Low) | 71.2 | 19 |
| Autonomous Machine Learning Engineering | MLE-bench (held-in and held-out) | CIFAR-10 Performance | 76.53 | 14 |
| Automated Machine Learning | MLE-Bench | Valid Submission Rate | 96.89 | 14 |
| Machine Learning Engineering | MLE-Bench Lite | Any Medal (%) | 75.8 | 13 |
| Automated AI Research | MLE-Bench official (full) | Valid Submission Rate | 98.7 | 13 |
| Machine Learning Engineering | MLE-Bench full official | Medal Rate (Low) | 68.2 | 11 |
| Machine Learning Engineering | MLE-Bench 51 tasks (held-out) | Avg@3 | 58.5 | 11 |
| Machine Learning Engineering | MLE-bench (held-out task instances) | Accuracy (%) | 58.6 | 6 |
| Machine learning engineering | MLE-bench (All) | Medal Rate | 50.67 | 5 |
| Machine learning engineering | MLE-bench Hard | Medal Rate | 40 | 5 |
| Machine learning engineering | MLE-bench Medium | Medal Rate | 44.74 | 5 |
| Machine learning engineering | MLE-bench Low | Medal Rate | 68.18 | 5 |
| Fine-grained Recognition | MLE-Bench iNaturalist 2019 FGVC6 | Score | 24.45 | 2 |
| Medical Image Classification | MLE-Bench RSNA Brain Tumor | Score | 0.6518 | 2 |
| Fine-grained Recognition | MLE-Bench iMet 2020 FGVC7 | Score | 68.04 | 2 |
| Code Understanding | MLE-Bench AI4Code | Score | 83.56 | 2 |
| 3D Object Detection | MLE-Bench 3D Object Detection | Score | 17.63 | 2 |
| Scientific Data Analysis | MLE-bench AI4Science mix of seen and unseen | Stanford COVID Vaccine Score | 100 | 2 |
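The per-split medal rates listed above (Low 68.18, Medium 44.74, Hard 40, All 50.67) can be reproduced with simple arithmetic. The sketch below assumes that "Medal Rate" means the percentage of competitions in which a submission earned any medal. It also assumes split sizes of 22 (Low), 38 (Medium) and 15 (Hard) competitions, as in the original MLE-bench release; this page does not state them. The medal counts are hypothetical values chosen only because they reproduce the listed rates, not reported results.

```python
# Assumed number of competitions per complexity split (not stated on this page).
SPLIT_SIZES = {"Low": 22, "Medium": 38, "Hard": 15}

# Hypothetical medal counts, picked only because they reproduce the
# Low / Medium / Hard medal rates listed above.
MEDALS = {"Low": 15, "Medium": 17, "Hard": 6}


def medal_rate(medals: int, total: int) -> float:
    """Percentage of competitions in which a medal was earned, to 2 decimals."""
    return round(100 * medals / total, 2)


def pooled_rate(medals: dict[str, int], sizes: dict[str, int]) -> float:
    """Overall rate: total medals over total competitions (size-weighted)."""
    return medal_rate(sum(medals.values()), sum(sizes.values()))


per_split = {s: medal_rate(MEDALS[s], SPLIT_SIZES[s]) for s in SPLIT_SIZES}
overall = pooled_rate(MEDALS, SPLIT_SIZES)
# Unweighted mean of the three split rates, for comparison only.
naive_mean = round(sum(per_split.values()) / len(per_split), 2)

print(per_split)   # {'Low': 68.18, 'Medium': 44.74, 'Hard': 40.0}
print(overall)     # 50.67
print(naive_mean)  # 50.97
```

Under these assumptions the "All" figure (38 medals over 75 competitions) is weighted by split size, so it differs from the plain average of the three split rates. The same arithmetic gives `medal_rate(18, 22) == 81.82`, which is consistent with the MLE-Bench Lite Any Medal Rate if Lite is the 22-competition Low split. This page does not confirm that.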