AIDE: AI-Driven Exploration in the Space of Code
About
Machine learning, the foundation of modern artificial intelligence, has driven innovations that have fundamentally transformed the world. Yet behind these advancements lies a complex and often tedious process that requires labor- and compute-intensive iteration and experimentation. Engineers and scientists developing machine learning models spend much of their time on trial-and-error tasks instead of conceptualizing innovative solutions or research hypotheses. To address this challenge, we introduce AI-Driven Exploration (AIDE), a machine learning engineering agent powered by large language models (LLMs). AIDE frames machine learning engineering as a code optimization problem and formulates trial-and-error as a tree search in the space of potential solutions. By strategically reusing and refining promising solutions, AIDE effectively trades computational resources for enhanced performance, achieving state-of-the-art results on multiple machine learning engineering benchmarks, including our Kaggle evaluations, OpenAI's MLE-Bench, and METR's RE-Bench.
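The idea of treating trial-and-error as a tree search can be illustrated with a minimal sketch. This is not AIDE's implementation: the names `Node`, `tree_search`, `draft`, `refine`, and `evaluate` are hypothetical, a single number stands in for a code solution, and the greedy "always refine the best node so far" policy is an assumption chosen for brevity. In a real agent, `draft` and `refine` would be LLM calls that write or edit code, and `evaluate` would run that code and return a validation metric.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    """One candidate solution in the search tree (hypothetical structure)."""
    solution: float                      # stand-in for a code script
    score: float                         # validation metric, higher is better
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)


def tree_search(
    draft: Callable[[], float],
    refine: Callable[[float], float],
    evaluate: Callable[[float], float],
    n_drafts: int = 3,
    n_steps: int = 30,
) -> Node:
    """Draft a few root solutions, then repeatedly refine the best node so far."""
    nodes: List[Node] = []
    for _ in range(n_drafts):
        candidate = draft()
        nodes.append(Node(candidate, evaluate(candidate)))

    for _ in range(n_steps):
        # Reuse the most promising solution found so far as the parent.
        parent = max(nodes, key=lambda n: n.score)
        candidate = refine(parent.solution)
        child = Node(candidate, evaluate(candidate), parent=parent)
        parent.children.append(child)
        nodes.append(child)

    return max(nodes, key=lambda n: n.score)


if __name__ == "__main__":
    rng = random.Random(0)
    best = tree_search(
        draft=lambda: rng.uniform(-10, 10),        # toy "initial solution"
        refine=lambda x: x + rng.gauss(0, 1),      # toy "improvement attempt"
        evaluate=lambda x: -(x - 3.0) ** 2,        # toy metric, best at x = 3
    )
    print(best.solution, best.score)
```

Raising `n_steps` spends more evaluations in exchange for a better chance of finding a higher-scoring node, which mirrors the compute-for-performance trade-off described above.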
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Autonomous Machine Learning Engineering | MLE-Bench Lite | Any Medal Rate | 45.45 | 57 |
| ML Engineering | MLE-Bench official (test) | Medal Rate (Low) | 34.3 | 19 |
| Combinatorial Optimization | Aircraft Landing (test) | Average Score | 82.28 | 17 |
| Combinatorial Optimization | Overall (test) | Average Performance | 53.51 | 17 |
| Combinatorial Optimization | Resource Constrained Shortest Path (test) | Average Score | 75.08 | 17 |
| Combinatorial Optimization | Euclidean Steiner (test) | Average Performance | 63.37 | 15 |
| Autonomous Machine Learning Engineering | MLE-bench (held-in and held-out) | CIFAR-10 Performance | 76.53 | 14 |
| Automated Machine Learning | MLE-Bench | Valid Submission Rate | 82.8 | 14 |
| Combinatorial Optimization | Periodic Vehicle Routing (test) | Average Value | 0.1058 | 14 |
| Bus Scheduling | NYC Manhattan (in-domain) | Fuel Consumption | 257.6 | 13 |