
CORAL: Towards Autonomous Multi-Agent Evolution for Open-Ended Discovery

About

Large language model (LLM)-based evolution is a promising approach for open-ended discovery, where progress requires sustained search and knowledge accumulation. Existing methods still rely heavily on fixed heuristics and hard-coded exploration rules, which limit the autonomy of LLM agents. We present CORAL, the first framework for autonomous multi-agent evolution on open-ended problems. CORAL replaces rigid control with long-running agents that explore, reflect, and collaborate through shared persistent memory, asynchronous multi-agent execution, and heartbeat-based interventions. It also provides practical safeguards, including isolated workspaces, evaluator separation, resource management, and agent session and health management. Evaluated on diverse mathematical, algorithmic, and systems optimization tasks, CORAL sets new state-of-the-art results on 10 tasks, achieving 3 to 10 times higher improvement rates with far fewer evaluations than fixed evolutionary search baselines. On Anthropic's kernel engineering task, four co-evolving agents reduce the best-known cycle count from 1363 to 1103. Mechanistic analyses further show that these gains arise from knowledge reuse and from multi-agent exploration and communication. Together, these results suggest that greater agent autonomy and multi-agent evolution can substantially improve open-ended discovery. Code is available at https://github.com/Human-Agent-Society/CORAL.
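The abstract names CORAL's coordination mechanisms without showing how they fit together. The following is a minimal Python sketch, not CORAL's implementation, of asynchronous agents that reuse knowledge through a shared persistent memory while a heartbeat task can intervene. Everything here is a hypothetical assumption for illustration: the JSON-lines memory file, the names `SharedMemory`, `agent`, `heartbeat`, and `run`, the toy `evaluate` objective (kept outside the agents to mimic evaluator separation), and the rule that a heartbeat posts a "reflect" nudge when the best score has stalled. See the linked repository for the actual design.

```python
import asyncio
import json
import random
import tempfile
from pathlib import Path


class SharedMemory:
    """Hypothetical shared, persistent memory: an append-only JSON-lines file."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.lock = asyncio.Lock()  # serialize file access across concurrent agents

    async def append(self, record: dict) -> None:
        async with self.lock:
            with self.path.open("a") as f:
                f.write(json.dumps(record) + "\n")

    async def read_all(self) -> list[dict]:
        async with self.lock:
            if not self.path.exists():
                return []
            with self.path.open() as f:
                return [json.loads(line) for line in f if line.strip()]


def evaluate(candidate: float) -> float:
    """Toy stand-in evaluator, separate from agent code; lower is better."""
    return (candidate - 3.0) ** 2


async def agent(name: str, memory: SharedMemory, steps: int, seed: int) -> None:
    rng = random.Random(seed)
    for _ in range(steps):
        notes = [r for r in await memory.read_all() if r["kind"] == "attempt"]
        # Knowledge reuse: perturb the best candidate any agent has recorded so far.
        best = min(notes, key=lambda r: r["score"], default={"candidate": 0.0})
        candidate = best["candidate"] + rng.uniform(-1.0, 1.0)
        await memory.append({"kind": "attempt", "agent": name,
                             "candidate": candidate, "score": evaluate(candidate)})
        await asyncio.sleep(0.001)  # yield so other agents and the heartbeat can run


async def heartbeat(memory: SharedMemory, interval: float, stop: asyncio.Event) -> None:
    """Hypothetical intervention: post a 'reflect' nudge when the best score stalls."""
    last_best = float("inf")
    while not stop.is_set():
        await asyncio.sleep(interval)
        scores = [r["score"] for r in await memory.read_all() if r["kind"] == "attempt"]
        best = min(scores, default=float("inf"))
        if best >= last_best:
            await memory.append({"kind": "intervention", "message": "reflect"})
        last_best = min(last_best, best)


async def run(n_agents: int = 4, steps: int = 20, interval: float = 0.005) -> Path:
    path = Path(tempfile.mkdtemp()) / "memory.jsonl"
    memory = SharedMemory(path)
    stop = asyncio.Event()
    monitor = asyncio.create_task(heartbeat(memory, interval, stop))
    # Agents run concurrently and communicate only through the shared memory.
    await asyncio.gather(*(agent(f"agent-{i}", memory, steps, seed=i)
                           for i in range(n_agents)))
    stop.set()
    await monitor
    return path
```

Calling `asyncio.run(run())` returns the path of the memory file. It holds 80 attempt records (4 agents × 20 steps) plus however many interventions the heartbeat posted, a number that depends on scheduling and timing.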

Ao Qu, Han Zheng, Zijian Zhou, Yihao Yan, Yihong Tang, Shao Yong Ong, Fenglu Hong, Kaichen Zhou, Chonghe Jiang, Minwei Kong, Jiacheng Zhu, Xuan Jiang, Sirui Li, Cathy Wu, Bryan Kian Hsiang Low, Jinhua Zhao, Paul Pu Liang • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Traveling Salesperson Problem | TSP N=500 Generalization (128 instances) | Optimality Gap | 0.882 | 41 |
| Traveling Salesperson Problem | TSP-20 (train) | Objective Value | 3.84 | 29 |
| Traveling Salesperson Problem | TSP-50 (train) | Objective Value | 5.715 | 29 |
| Online Bin Packing | Online BPP Weibull-5k (test) | Objective Gap | 3.091 | 28 |
| Multidimensional Knapsack Problem | MKP | Objective Value | 46.58 | 24 |
| Permutation Flow Shop Scheduling Problem | PFSP | Optimality Gap | 5.04 | 24 |
| Multi-agent system task solving | WorkBench | Accuracy | 51.1 | 21 |
| Multi-agent system task solving | Finance | Accuracy | 40.9 | 21 |
| Multi-agent system task solving | Browsecomp | Accuracy | 50.5 | 21 |
| Multi-agent system task solving | Plancraft | Accuracy | 45.8 | 21 |
Showing 10 of 24 rows
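The page does not define its gap metrics or their units. For minimization problems such as TSP, optimality gap is commonly reported as the percentage by which a solution's objective exceeds an optimal or best-known reference value, averaged over instances. The sketch below assumes that convention; the function names are hypothetical, and the benchmarks' actual definitions (including whether values are percentages) may differ.

```python
from statistics import mean


def optimality_gap(value: float, reference: float) -> float:
    """Percent by which a minimization objective exceeds a positive reference value."""
    if reference <= 0:
        raise ValueError("reference must be positive")
    return 100.0 * (value - reference) / reference


def mean_gap(values: list[float], references: list[float]) -> float:
    """Average per-instance gap, e.g. over a fixed set of test instances."""
    if len(values) != len(references) or not values:
        raise ValueError("need equal-length, non-empty lists")
    return mean(optimality_gap(v, r) for v, r in zip(values, references))


print(optimality_gap(101.0, 100.0))  # 1.0
```

A tour of length 101 against an optimum of 100 gives a 1% gap; averaging per-instance gaps, rather than computing the gap of summed lengths, keeps each instance equally weighted regardless of its size.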
