Machine Learning Engineering on MLE-bench (held-out task instances)

Accuracy (%): 58.6

Full ExIt

Sep 4, 2025
Updated 3mo ago

Evaluation Results

Date      Accuracy (%)   (unlabeled)
2025.09   58.6           8.4
2025.09   57.3           10.1
2025.09   53             11.9
2025.09   48             9.1
2025.09   47.8           9.4
2025.09   4.2            2.4