ML-Master: Towards AI-for-AI via Integration of Exploration and Reasoning

About

As AI capabilities advance toward and potentially beyond human-level performance, a natural transition emerges where AI-driven development becomes more efficient than human-centric approaches. A promising pathway toward this transition lies in AI-for-AI (AI4AI), which leverages AI techniques to automate and optimize the design, training, and deployment of AI systems themselves. While LLM-based agents have shown the potential to realize AI4AI, they are often unable to fully leverage the experience accumulated by agents during the exploration of solutions in the reasoning process, leading to inefficiencies and suboptimal performance. To address this limitation, we propose ML-Master, a novel AI4AI agent that seamlessly integrates exploration and reasoning by employing a selectively scoped memory mechanism. This approach allows ML-Master to efficiently combine diverse insights from parallel solution trajectories with analytical reasoning, guiding further exploration without overwhelming the agent with excessive context. We evaluate ML-Master on MLE-Bench, where it achieves a 29.3% average medal rate, significantly surpassing existing methods, particularly on medium-complexity tasks. It achieves this within a strict 12-hour time constraint, half the 24-hour limit used by previous baselines. These results demonstrate ML-Master's potential as a powerful tool for advancing AI4AI.

Zexi Liu, Yuzhu Cai, Xinyu Zhu, Yujie Zheng, Runkun Chen, Ying Wen, Yanfeng Wang, Weinan E, Siheng Chen • 2025

Related benchmarks

Task	Dataset	Metric	Result
Autonomous Machine Learning Engineering	MLE-Bench Lite	Any Medal Rate	55	57
Machine learning engineering	MLE-Bench Lite	Any Medal (%)	48.5	28
Machine learning engineering	MLE-Bench full official	Medal Rate (Low)	51.5	23
Machine learning engineering	MLE-bench-30 (test)	Percentile Rank	57.6	22
ML Engineering	MLE-Bench official (test)	Medal Rate (Low)	51.5	19
Autonomous Machine Learning Engineering	MLE-bench (held-in and held-out)	CIFAR-10 Performance	73.43	14
Automated Machine Learning	MLE-Bench	Valid Submission Rate	93.3	14
Automated AI Research	MLE-Bench official (full)	Valid Submission Rate	93.3	13
Machine learning engineering	MLE-Bench Lite 22-competition June 2026	Medal %	75.8	10
AutoML	KompeteAI-Bench Contemporary part	Score	10.4	8

