AIDE: AI-Driven Exploration in the Space of Code

About

Machine learning, the foundation of modern artificial intelligence, has driven innovations that have fundamentally transformed the world. Yet, behind advancements lies a complex and often tedious process requiring labor and compute intensive iteration and experimentation. Engineers and scientists developing machine learning models spend much of their time on trial-and-error tasks instead of conceptualizing innovative solutions or research hypotheses. To address this challenge, we introduce AI-Driven Exploration (AIDE), a machine learning engineering agent powered by large language models (LLMs). AIDE frames machine learning engineering as a code optimization problem, and formulates trial-and-error as a tree search in the space of potential solutions. By strategically reusing and refining promising solutions, AIDE effectively trades computational resources for enhanced performance, achieving state-of-the-art results on multiple machine learning engineering benchmarks, including our Kaggle evaluations, OpenAI MLE-Bench and METRs RE-Bench.

Zhengyao Jiang, Dominik Schmidt, Dhruv Srikanth, Dixing Xu, Ian Kaplan, Deniss Jacenko, Yuxiang Wu• 2025

Related benchmarks

Task	Dataset	Result
Autonomous Machine Learning Engineering	MLE-Bench Lite	Any Medal Rate45.45	57
ML Engineering	MLE-Bench official (test)	Medal Rate (Low)34.3	19
Combinatorial Optimization	Aircraft Landing (test)	Average Score82.28	17
Combinatorial Optimization	Overall (test)	Average Performance53.51	17
Combinatorial Optimization	Resource Constrained Shortest Path (test)	Average Score75.08	17
Combinatorial Optimization	Euclidean Steiner (test)	Average Performance63.37	15
Autonomous Machine Learning Engineering	MLE-bench (held-in and held-out)	CIFAR-10 Performance76.53	14
Automated Machine Learning	MLE-Bench	Valid Submission Rate82.8	14
Combinatorial Optimization	Periodic Vehicle Routing (test)	Average Value0.1058	14
Bus Scheduling	NYC Manhattan (in-domain)	Fuel Consumption257.6	13

Showing 10 of 43 rows

Other info

Follow for update

@wizwand_team Discord