AIDE: AI-Driven Exploration in the Space of Code

About

Machine learning, the foundation of modern artificial intelligence, has driven innovations that have fundamentally transformed the world. Yet behind these advancements lies a complex and often tedious process requiring labor- and compute-intensive iteration and experimentation. Engineers and scientists developing machine learning models spend much of their time on trial-and-error tasks instead of conceptualizing innovative solutions or research hypotheses. To address this challenge, we introduce AI-Driven Exploration (AIDE), a machine learning engineering agent powered by large language models (LLMs). AIDE frames machine learning engineering as a code optimization problem and formulates trial-and-error as a tree search in the space of potential solutions. By strategically reusing and refining promising solutions, AIDE effectively trades computational resources for enhanced performance, achieving state-of-the-art results on multiple machine learning engineering benchmarks, including our Kaggle evaluations, OpenAI's MLE-Bench, and METR's RE-Bench.

Zhengyao Jiang, Dominik Schmidt, Dhruv Srikanth, Dixing Xu, Ian Kaplan, Deniss Jacenko, Yuxiang Wu • 2025
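
The abstract's framing of trial-and-error as a tree search over candidate solutions can be illustrated with a minimal sketch. This is not AIDE's actual implementation: the `Node` class, the `tree_search` function, the greedy "always refine the best node" policy, and the toy stand-ins below are all assumptions made for illustration. In a real agent, `draft` and `refine` would presumably be LLM calls that write or edit code, and `evaluate` would run that code and read a validation metric.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    """One candidate solution in the search tree (hypothetical structure)."""
    solution: str
    score: float
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list, repr=False)


def tree_search(
    draft: Callable[[], str],
    refine: Callable[[str], str],
    evaluate: Callable[[str], float],
    n_drafts: int = 2,
    n_steps: int = 10,
) -> Node:
    """Greedy tree search: create root drafts, then repeatedly refine the best node.

    n_steps is the compute budget: more steps means more evaluations.
    """
    nodes: List[Node] = []
    for _ in range(n_drafts):
        solution = draft()
        nodes.append(Node(solution, evaluate(solution)))

    for _ in range(n_steps):
        # Reuse the most promising solution found so far as the parent.
        parent = max(nodes, key=lambda n: n.score)
        child_solution = refine(parent.solution)
        child = Node(child_solution, evaluate(child_solution), parent)
        parent.children.append(child)
        nodes.append(child)

    return max(nodes, key=lambda n: n.score)


# Toy stand-ins (assumptions): a "solution" is the string "x = <int>",
# and the score is highest when x equals 7.
rng = random.Random(0)


def toy_draft() -> str:
    return f"x = {rng.randint(-10, 10)}"


def toy_refine(solution: str) -> str:
    x = int(solution.split("=")[1])
    return f"x = {x + rng.choice([-1, 1])}"


def toy_evaluate(solution: str) -> float:
    x = int(solution.split("=")[1])
    return -float((x - 7) ** 2)


best = tree_search(toy_draft, toy_refine, toy_evaluate, n_drafts=2, n_steps=20)
print(best.solution, best.score)
```

Raising `n_steps` in this sketch spends more evaluations in exchange for a better chance of reaching a high-scoring node, which is the compute-for-performance trade the abstract describes. A less greedy policy would also revisit weaker nodes from time to time instead of always expanding the current best.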

Related benchmarks

Task                          Dataset                        Result                       Rank
Automated AI Research         MLE-Bench official (full)      Valid Submission Rate: 84.4  13
Machine learning engineering  MLE-Bench 51 tasks (held-out)  Avg@3: 50.7                  11
Binary Classification         Insult Detection               Competition Score: 91.35     7
Regression                    NYC Taxi                       Competition Score: 11.46     7
Tabular Classification        Tabular Dec. 2021              Competition Score: 0.962     7
Text Normalization            Russ. Text Norm.               Competition Score: 97.56     7
Tabular Classification        Tabular May 2022               Competition Score: 99.65     7
Binary Classification         Random Pizza                   Competition Score: 0.682     7
Binary Classification         Toxic Jigsaw                   Competition Score: 0.9807    7
Multi-class classification    Spooky Author                  Competition Score: 0.2883    7
Showing 10 of 13 rows
