MAGE: Multi-Agent Self-Evolution with Co-Evolutionary Knowledge Graphs

About

Self-evolving language-model agents must decide what to learn next and how to preserve what they have learned across iterations. Existing systems typically carry this cross-iteration knowledge as natural-language feedback, flat episodic memory, or implicit reinforcement signals, none of which cleanly supports a frozen weak backbone at inference time. This paper introduces MAGE (Multi-Agent Graph-guided Evolution), a framework that externalizes self-knowledge into a four-subgraph co-evolutionary knowledge graph. Its experience subgraph stores both teacher-written failure corrections and the learner's own past correct reasoning traces, which are retrieved as task-conditioned guidance for a frozen execution model. During evolution, the graph, a task-level search bandit, and a skill-level routing bandit are updated from the same reward stream, while the learner's backbone remains unchanged. We further provide structural analysis showing how append-only memory growth, bounded curriculum coverage, and task-filtered retrieval together support stable improvement of the retrieval substrate for frozen-learner evolution. Across nine benchmarks spanning mathematical reasoning, multi-hop and open-domain question answering, spatio-temporal analysis, financial numerical reasoning, medical multiple-choice, an open-world survival game, and web navigation, MAGE achieves strong performance against prompt-based frozen-backbone baselines. Ablations show that self-harvested success traces and teacher-written corrections are complementary, with success memories contributing most on reasoning-template-heavy tasks and corrective memories supporting harder composition and interaction settings.

Ruiyi Yang, Zechen Li, Hao Xue, Imran Razzak, Flora D. Salim • 2026
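The evolution loop summarized in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the bandit algorithm, the graph schema, or the retrieval rule, so the sketch assumes a plain UCB1 bandit for both the task-level and skill-level choices, a flat append-only list in place of the experience subgraph, and "most recent k entries for the same task" as the task-filtered retrieval. All names (`Memory`, `ExperienceStore`, `UCB1`, `evolve_step`, `run_learner`, `write_correction`) are hypothetical. The frozen learner and the teacher are passed in as callables, so no model weights are touched.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Memory:
    task: str  # task type the memory was harvested from
    kind: str  # "success" (learner's own trace) or "correction" (teacher-written)
    text: str


class ExperienceStore:
    """Append-only memory with task-filtered retrieval (assumed design)."""

    def __init__(self):
        self._items = []

    def append(self, memory):
        # Entries are only ever added, never edited or deleted.
        self._items.append(memory)

    def retrieve(self, task, k=2):
        # Keep only memories from the same task, then take the k most recent.
        matches = [m for m in self._items if m.task == task]
        return matches[-k:]

    def __len__(self):
        return len(self._items)


class UCB1:
    """Standard UCB1 bandit; a stand-in for the paper's unspecified bandits."""

    def __init__(self, arms):
        self.counts = {arm: 0 for arm in arms}
        self.totals = {arm: 0.0 for arm in arms}

    def select(self):
        # Play every arm once before using the confidence bound.
        for arm, n in self.counts.items():
            if n == 0:
                return arm
        t = sum(self.counts.values())

        def score(arm):
            n = self.counts[arm]
            return self.totals[arm] / n + math.sqrt(2.0 * math.log(t) / n)

        return max(self.counts, key=score)

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.totals[arm] += reward


def evolve_step(store, task_bandit, skill_bandit, run_learner, write_correction):
    """One iteration: pick a task and skill, run the frozen learner, update state."""
    task = task_bandit.select()
    skill = skill_bandit.select()
    guidance = store.retrieve(task)

    # run_learner is a hypothetical callable wrapping the frozen backbone.
    trace, reward = run_learner(task, skill, guidance)

    if reward > 0:
        store.append(Memory(task, "success", trace))
    else:
        store.append(Memory(task, "correction", write_correction(task, trace)))

    # The memory and both bandits are driven by the same reward signal.
    task_bandit.update(task, reward)
    skill_bandit.update(skill, reward)
    return reward
```

In this sketch the only state that changes across iterations is the store and the two bandits' statistics, which mirrors the abstract's claim that the learner's backbone remains unchanged while the retrieval substrate evolves.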

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Math Reasoning | GSM8K | Accuracy 92.5 | 254 |
| Financial Reasoning | FinQA | -- | 69 |
| Web navigation | WebShop (test) | Score 90.2 | 16 |
| Sequential environment decision making | Crafter BALROG protocol | Peak Task Score (%) 37.9 | 8 |
| Mathematical Reasoning | GSM8K 200 held-out questions | Accuracy 90.4 | 7 |
| Mathematical Reasoning | RealMath (200 held-out questions) | Accuracy 85.1 | 7 |
| Multi-hop Question Answering | HotpotQA 200 held-out questions | Accuracy 91.5 | 7 |
| Situation Reasoning | STBench 200 held-out questions | Accuracy 57.8 | 7 |
| Web-based Question Answering | WebQA (200 held-out questions) | Accuracy 62.7 | 7 |
| Web-based Sequential Decision Making | WebShop hundred-product catalog setup | Mean Reward 90.2 | 6 |
Showing 10 of 11 rows
