AIDE: AI-Driven Exploration in the Space of Code
About
Machine learning, the foundation of modern artificial intelligence, has driven innovations that have fundamentally transformed the world. Yet behind these advancements lies a complex and often tedious process that requires labor- and compute-intensive iteration and experimentation. Engineers and scientists developing machine learning models spend much of their time on trial-and-error tasks instead of conceptualizing innovative solutions or research hypotheses. To address this challenge, we introduce AI-Driven Exploration (AIDE), a machine learning engineering agent powered by large language models (LLMs). AIDE frames machine learning engineering as a code optimization problem and formulates trial-and-error as a tree search in the space of potential solutions. By strategically reusing and refining promising solutions, AIDE effectively trades computational resources for enhanced performance, achieving state-of-the-art results on multiple machine learning engineering benchmarks, including our Kaggle evaluations, OpenAI's MLE-Bench, and METR's RE-Bench.
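To make the tree-search framing concrete, here is a minimal sketch of a greedy search over candidate solutions. It is an illustration under stated assumptions, not AIDE's actual implementation: the `Node` class, the `tree_search` function, the always-refine-the-best selection rule, and the toy `draft`, `improve`, and `evaluate` stand-ins are all hypothetical. In a real agent, `draft` and `improve` would presumably be LLM calls that write or edit code, and `evaluate` would run that code and return a validation metric.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    """One candidate solution in the search tree (hypothetical structure)."""
    code: str
    score: float
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list, repr=False)


def tree_search(
    draft: Callable[[], str],
    improve: Callable[[str], str],
    evaluate: Callable[[str], float],
    num_drafts: int = 2,
    num_steps: int = 10,
) -> Node:
    """Draft a few root solutions, then repeatedly refine the best node so far."""
    nodes: List[Node] = []
    for _ in range(num_drafts):
        code = draft()
        nodes.append(Node(code=code, score=evaluate(code)))

    for _ in range(num_steps):
        # Assumed greedy policy: always expand the highest-scoring node.
        parent = max(nodes, key=lambda n: n.score)
        code = improve(parent.code)
        child = Node(code=code, score=evaluate(code), parent=parent)
        parent.children.append(child)
        nodes.append(child)

    return max(nodes, key=lambda n: n.score)


# Toy stand-ins: a "solution" is a one-line program assigning x,
# and the score is higher the closer x is to 0.3.
rng = random.Random(0)


def draft() -> str:
    return f"x = {rng.uniform(0, 1):.3f}"


def improve(code: str) -> str:
    x = float(code.split("=")[1])
    return f"x = {x + rng.uniform(-0.1, 0.1):.3f}"


def evaluate(code: str) -> float:
    x = float(code.split("=")[1])
    return -((x - 0.3) ** 2)


best = tree_search(draft, improve, evaluate)
print(best.code, best.score)
```

Each extra search step costs one more call to `improve` and `evaluate`, which is the sense in which such a search trades compute for performance: a larger `num_steps` can only keep or raise the best score found.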
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Automated AI Research | MLE-Bench official (full) | Valid Submission Rate | 84.4 | 13 |
| Machine learning engineering | MLE-Bench 51 tasks (held-out) | Avg@3 | 50.7 | 11 |
| Binary Classification | Insult Detection | Competition Score | 91.35 | 7 |
| Regression | NYC Taxi | Competition Score | 11.46 | 7 |
| Tabular Classification | Tabular Dec. 2021 | Competition Score | 0.962 | 7 |
| Text Normalization | Russ. Text Norm. | Competition Score | 97.56 | 7 |
| Tabular Classification | Tabular May 2022 | Competition Score | 99.65 | 7 |
| Binary Classification | Random Pizza | Competition Score | 0.682 | 7 |
| Binary Classification | Toxic Jigsaw | Competition Score | 0.9807 | 7 |
| Multi-class classification | Spooky Author | Competition Score | 0.2883 | 7 |