ML-Master: Towards AI-for-AI via Integration of Exploration and Reasoning

About

As AI capabilities advance toward and potentially beyond human-level performance, a natural transition emerges in which AI-driven development becomes more efficient than human-centric approaches. A promising pathway toward this transition is AI-for-AI (AI4AI), which uses AI techniques to automate and optimize the design, training, and deployment of AI systems themselves. While LLM-based agents have shown the potential to realize AI4AI, they often fail to fully leverage the experience accumulated while exploring solutions when they reason, leading to inefficiencies and suboptimal performance. To address this limitation, we propose ML-Master, a novel AI4AI agent that integrates exploration and reasoning through a selectively scoped memory mechanism. This approach allows ML-Master to combine diverse insights from parallel solution trajectories with analytical reasoning, guiding further exploration without overwhelming the agent with excessive context. We evaluate ML-Master on MLE-Bench, where it achieves a 29.3% average medal rate, significantly surpassing existing methods, particularly on medium-complexity tasks. It does so within a strict 12-hour time constraint, half the 24-hour limit used by previous baselines. These results demonstrate ML-Master's potential as a powerful tool for advancing AI4AI.

Zexi Liu, Yuzhu Cai, Xinyu Zhu, Yujie Zheng, Runkun Chen, Ying Wen, Yanfeng Wang, Weinan E, Siheng Chen • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Autonomous Machine Learning Engineering | MLE-Bench Lite | Any Medal Rate: 55 | 57 |
| Machine learning engineering | MLE-bench-30 (test) | Percentile Rank: 57.6 | 22 |
| ML Engineering | MLE-Bench official (test) | Medal Rate (Low): 51.5 | 19 |
| Autonomous Machine Learning Engineering | MLE-bench (held-in and held-out) | CIFAR-10 Performance: 73.43 | 14 |
| Automated Machine Learning | MLE-Bench | Valid Submission Rate: 93.3 | 14 |
| Machine learning engineering | MLE-Bench Lite | Any Medal (%): 48.5 | 13 |
| Automated AI Research | MLE-Bench official (full) | Valid Submission Rate: 93.3 | 13 |
| Machine learning engineering | MLE-Bench full official | Medal Rate (Low): 51.5 | 11 |
| AutoML | KompeteAI-Bench Contemporary part | Score: 10.4 | 8 |
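
Several of the results above are rates over a set of competitions (for example "Any Medal Rate" and "Valid Submission Rate"). As a rough illustration of how such percentages could be computed from per-competition outcomes, here is a minimal sketch. The `CompetitionOutcome` record, the function names, and the sample outcomes are all hypothetical and are not taken from the paper or from the MLE-Bench code; the sketch assumes "any medal" means a bronze, silver, or gold medal on a competition.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class CompetitionOutcome:
    """Hypothetical record of one agent run on one competition."""
    name: str
    valid_submission: bool
    medal: Optional[str]  # "gold", "silver", "bronze", or None


def percentage(flags: Iterable[bool]) -> float:
    """Share of True values, as a percentage (0.0 for an empty input)."""
    flags = list(flags)
    return 100.0 * sum(flags) / len(flags) if flags else 0.0


def any_medal_rate(outcomes: List[CompetitionOutcome]) -> float:
    """Percentage of competitions on which any medal was earned."""
    return percentage(o.medal is not None for o in outcomes)


def valid_submission_rate(outcomes: List[CompetitionOutcome]) -> float:
    """Percentage of competitions with a valid submission."""
    return percentage(o.valid_submission for o in outcomes)


# Made-up outcomes, for illustration only (not results from the paper).
outcomes = [
    CompetitionOutcome("comp-a", valid_submission=True, medal="gold"),
    CompetitionOutcome("comp-b", valid_submission=True, medal=None),
    CompetitionOutcome("comp-c", valid_submission=True, medal="bronze"),
    CompetitionOutcome("comp-d", valid_submission=False, medal=None),
]

print(f"Any medal rate: {any_medal_rate(outcomes):.1f}%")
print(f"Valid submission rate: {valid_submission_rate(outcomes):.1f}%")
```

Note that an average medal rate reported across complexity tiers may be aggregated differently (for example, per tier and then averaged, or over multiple seeds); the sketch only covers the simple per-competition case.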
