
AIDE: AI-Driven Exploration in the Space of Code

About

Machine learning, the foundation of modern artificial intelligence, has driven innovations that have fundamentally transformed the world. Yet behind these advancements lies a complex and often tedious process that requires labor- and compute-intensive iteration and experimentation. Engineers and scientists developing machine learning models spend much of their time on trial-and-error tasks instead of conceptualizing innovative solutions or research hypotheses. To address this challenge, we introduce AI-Driven Exploration (AIDE), a machine learning engineering agent powered by large language models (LLMs). AIDE frames machine learning engineering as a code optimization problem and formulates trial-and-error as a tree search in the space of potential solutions. By strategically reusing and refining promising solutions, AIDE effectively trades computational resources for enhanced performance, achieving state-of-the-art results on multiple machine learning engineering benchmarks, including our Kaggle evaluations, OpenAI's MLE-Bench, and METR's RE-Bench.
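The idea of treating trial-and-error as a tree search over candidate solutions can be illustrated with a minimal sketch. This is not AIDE's implementation: the `Node` class, `tree_search`, `toy_refine`, and `toy_evaluate` are hypothetical names introduced here, the "solution" is a single number rather than code, and the selection rule is a plain greedy choice of the best-scoring node. In a real agent, the refine step would presumably be an LLM call that edits a script and the evaluate step would run it and read a validation metric.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    """One candidate solution in the search tree (hypothetical structure)."""
    solution: float
    score: float
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list, repr=False)


def tree_search(
    initial: float,
    refine: Callable[[float, random.Random], float],
    evaluate: Callable[[float], float],
    steps: int = 50,
    seed: int = 0,
) -> Node:
    """Greedy tree search: repeatedly refine the best node found so far."""
    rng = random.Random(seed)
    root = Node(initial, evaluate(initial))
    nodes = [root]
    for _ in range(steps):
        # Reuse the most promising solution so far as the parent.
        parent = max(nodes, key=lambda n: n.score)
        # Refine it into a new candidate and score the candidate.
        candidate = refine(parent.solution, rng)
        child = Node(candidate, evaluate(candidate), parent=parent)
        parent.children.append(child)
        nodes.append(child)
    return max(nodes, key=lambda n: n.score)


# Toy stand-ins (assumptions): the score peaks when the solution equals 3.0.
def toy_evaluate(x: float) -> float:
    return -(x - 3.0) ** 2


def toy_refine(x: float, rng: random.Random) -> float:
    return x + rng.uniform(-0.5, 0.5)


best = tree_search(0.0, toy_refine, toy_evaluate, steps=200)
print(f"best solution={best.solution:.3f}, score={best.score:.3f}")
```

Each extra step costs one more evaluation and can only keep or improve the best score, which is the sense in which such a search trades compute for performance.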

Zhengyao Jiang, Dominik Schmidt, Dhruv Srikanth, Dixing Xu, Ian Kaplan, Deniss Jacenko, Yuxiang Wu • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Autonomous Machine Learning Engineering | MLE-Bench Lite | Any Medal Rate | 45.45 | 49 |
| ML Engineering | MLE-Bench official (test) | Medal Rate (Low) | 34.3 | 19 |
| Automated Machine Learning | MLE-Bench | Valid Submission Rate | 82.8 | 14 |
| Automated AI Research | MLE-Bench official (full) | Valid Submission Rate | 84.4 | 13 |
| Machine learning engineering | MLE-Bench 51 tasks (held-out) | Avg@3 | 50.7 | 11 |
| Machine learning engineering | MLE-Bench full official | Medal Rate (Low) | 34.3 | 11 |
| Binary Classification | Insult Detection | Competition Score | 91.35 | 7 |
| Regression | NYC Taxi | Competition Score | 11.46 | 7 |
| Tabular Classification | Tabular Dec. 2021 | Competition Score | 0.962 | 7 |
| Text Normalization | Russ. Text Norm. | Competition Score | 97.56 | 7 |
Showing 10 of 18 rows
