R&D-Agent: An LLM-Agent Framework Towards Autonomous Data Science
About
Recent advances in AI and ML have transformed data science, yet increasing complexity and expertise requirements continue to hinder progress. Although crowd-sourcing platforms alleviate some challenges, high-level machine learning engineering (MLE) tasks remain labor-intensive and iterative. We introduce R&D-Agent, a comprehensive, decoupled, and extensible framework that formalizes the MLE process. R&D-Agent organizes the MLE workflow into two phases and six components, turning agent design for MLE from ad-hoc craftsmanship into a principled, testable process. Although several existing agents report promising gains on their chosen components, most can be characterized as partial optimizations of our framework's simple baseline. Inspired by human experts, we designed efficient and effective agents within this framework that achieve state-of-the-art performance. Evaluated on MLE-Bench, the agent built on R&D-Agent ranks as the top-performing machine learning engineering agent, achieving an any-medal rate of 35.1% and demonstrating the framework's ability to speed up innovation and improve accuracy across a wide range of data science applications. We have open-sourced R&D-Agent on GitHub: https://github.com/microsoft/RD-Agent.
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Autonomous Machine Learning Engineering | MLE-Bench Lite | Any Medal Rate | 62.12 | 49 |
| Automated Machine Learning | MLE-Bench | Valid Submission Rate | 53.33 | 14 |
| Automated AI Research | MLE-Bench official (full) | Valid Submission Rate | 53.3 | 13 |
| Machine learning engineering | MLE-bench Low | Medal Rate | 68.18 | 5 |
| Machine learning engineering | MLE-bench (All) | Medal Rate | 35.11 | 5 |
| Machine learning engineering | MLE-bench Medium | Medal Rate | 21.05 | 5 |
| Machine learning engineering | MLE-bench Hard | Medal Rate | 22.22 | 5 |
| Virtual Cell Modeling | 20 virtual cell modeling trials | Preprocess Error | 45 | 3 |
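
The medal-rate figures reported here can be read as the percentage of benchmark competitions in which the agent earned at least one medal. The snippet below is a minimal sketch of that arithmetic, not the benchmark's official scoring code: the function name `any_medal_rate`, the medal labels, and the sample outcomes are all hypothetical and chosen only for illustration.

```python
from typing import Iterable, Optional

# Assumed medal labels; the real benchmark may encode outcomes differently.
MEDALS = {"gold", "silver", "bronze"}


def any_medal_rate(outcomes: Iterable[Optional[str]]) -> float:
    """Return the percentage of competitions that ended with any medal.

    Each entry is the medal earned in one competition, or None when the
    submission earned no medal (or was invalid).
    """
    results = list(outcomes)
    if not results:
        raise ValueError("at least one competition outcome is required")
    medalled = sum(1 for outcome in results if outcome in MEDALS)
    return 100.0 * medalled / len(results)


# Hypothetical run over four competitions: two medals out of four.
sample = ["gold", None, "bronze", None]
print(f"{any_medal_rate(sample):.2f}")
```

When results are averaged over several runs, one plausible approach is to compute this rate per run and then take the mean, which is why a reported rate need not correspond to a whole number of competitions.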