The FM Agent

About

Large language models (LLMs) are catalyzing the development of autonomous AI research agents for scientific and engineering discovery. We present FM Agent, a novel, general-purpose multi-agent framework that combines LLM-based reasoning with large-scale evolutionary search to address complex real-world challenges. The core of FM Agent integrates several key innovations: 1) a cold-start initialization phase incorporating expert guidance, 2) a novel evolutionary sampling strategy for iterative optimization, 3) domain-specific evaluators that combine correctness, effectiveness, and LLM-supervised feedback, and 4) a distributed, asynchronous execution infrastructure built on Ray. The system has been evaluated across diverse domains, including operations research, machine learning, GPU kernel optimization, and classical mathematical problems. FM Agent reaches state-of-the-art results autonomously, without human interpretation or tuning: 1976.3 on ALE-Bench (+5.2%), 43.56% on MLE-Bench (+4.0pp), and up to 20x speedups on KernelBench. It also establishes new state-of-the-art (SOTA) results on several classical mathematical problems. Beyond academic benchmarks, FM Agent shows considerable promise for both large-scale enterprise R&D workflows and fundamental scientific research, where it can accelerate innovation, automate complex discovery processes, and deliver substantial engineering and scientific advances with broader societal impact.

Annan Li, Chufan Wu, Zengle Ge, Yee Hin Chong, Zhinan Hou, Lizhe Cao, Cheng Ju, Jianmin Wu, Huaiming Li, Haobo Zhang, Shenghao Feng, Mo Zhao, Fengzhi Qiu, Rui Yang, Mengmeng Zhang, Wenyi Zhu, Yingying Sun, Quan Sun, Shunhao Yan, Danyu Liu, Dawei Yin, Dou Shen • 2025
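
As a rough illustration of the loop the abstract describes (cold-start seeds, sampling from a population, evaluator-driven selection), here is a minimal Python sketch. It is not FM Agent's implementation. `Candidate`, `evaluate`, `mutate`, and `evolve` are hypothetical names, the Gaussian mutation stands in for an LLM proposing a revised solution, tournament selection stands in for the paper's sampling strategy (which this page does not specify), and the loop runs sequentially rather than on Ray.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass
class Candidate:
    params: list[float]
    score: float


def evaluate(params: list[float]) -> float:
    """Toy evaluator (assumption): a correctness gate plus an effectiveness score."""
    # Correctness: candidates outside the allowed box are rejected outright.
    if any(abs(x) > 10.0 for x in params):
        return float("-inf")
    # Effectiveness: higher is better, with the maximum at the origin.
    return -sum(x * x for x in params)


def mutate(params: list[float], rng: random.Random) -> list[float]:
    """Stand-in for an LLM proposing a modified solution (assumption)."""
    return [x + rng.gauss(0.0, 0.5) for x in params]


def evolve(seeds, generations=200, population_size=8, seed=0) -> Candidate:
    rng = random.Random(seed)
    # Cold start: the population begins from supplied seed solutions.
    population = [Candidate(list(s), evaluate(list(s))) for s in seeds]
    for _ in range(generations):
        # Sampling: tournament selection picks the best of a few random candidates.
        k = min(3, len(population))
        parent = max(rng.sample(population, k), key=lambda c: c.score)
        child_params = mutate(parent.params, rng)
        population.append(Candidate(child_params, evaluate(child_params)))
        # Selection: keep only the top-scoring candidates (the best is never lost).
        population.sort(key=lambda c: c.score, reverse=True)
        del population[population_size:]
    return population[0]


if __name__ == "__main__":
    best = evolve([[5.0, -5.0], [3.0, 4.0]])
    print(best.score, best.params)
```

Because the population is truncated to its top scorers after every step, the best score can never fall below that of the best seed. In a distributed setting, the `evaluate` calls are the natural unit to dispatch to workers asynchronously.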

Related benchmarks

Task	Dataset	Metric	Result
Autonomous Machine Learning Engineering	MLE-Bench Lite	Any Medal Rate	62.1	57
Machine learning engineering	MLE-Bench Lite	Any Medal (%)	75.76	28
Machine learning engineering	MLE-Bench full official	Medal Rate (Low)	62.1	23
Machine learning engineering	MLE-bench-30 (test)	Percentile Rank	69.6	22
ML Engineering	MLE-Bench official (test)	Medal Rate (Low)	62.1	19
Automated Machine Learning	MLE-Bench	Valid Submission Rate	96.89	14
Automated AI Research	MLE-Bench official (full)	Valid Submission Rate	96.9	13
Circle packing	26-Circle Packing in unit square (test)	Performance Score	2.636	3
Mathematical Optimization	An uncertainty inequality (test)	Performance Score	0.3521	3
Ratio Minimization	Ratio minimization problem (test)	Performance Score	12.8892	3

