SAGE: A Self-Evolving Agentic Graph-Memory Engine for Structure-Aware Associative Memory

About

Long-term memory is becoming a central bottleneck for language agents. Existing RAG and GraphRAG systems largely treat memory graphs as static retrieval middleware, which limits their ability to recover complete evidence chains from partial cues, exploit reusable graph-structural roles, and improve the memory itself through downstream feedback. We introduce SAGE, a Self-evolving Agentic Graph-memory Engine that models graph memory as a dynamic long-term memory substrate. SAGE couples two roles: a memory writer that incrementally constructs structured graph memory from interaction histories, and a Graph Foundation Model-based memory reader that performs retrieval and provides feedback to the memory writer. We provide rigorous theoretical analyses supporting the framework. Across multi-hop QA, open-domain retrieval, domain-specific review QA, and long-term agent-memory benchmarks, SAGE improves evidence recovery, answer grounding, and retrieval efficiency: after two self-evolution rounds, it achieves the best average rank on multi-hop QA; in zero-shot open-domain transfer, it reaches 82.5/91.6 Recall@2/5 on NQ. Further results on LongMemEval and HaluMem show that training and reader-writer feedback improve multiple long-term memory and hallucination-diagnostic metrics, suggesting that self-evolving, structure-aware graph memory is a promising foundation for robust long-horizon language agents.

Juntong Wang, Haoyue Zhao, Guanghui Pan, Xiyuan Wang, Yanbo Wang, Qiyan Deng, Muhan Zhang • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Multi-hop Question Answering | 2WikiMultihopQA | EM 74 | 559 |
| Multi-hop Retrieval | HotpotQA | Recall@2 65.1 | 44 |
| Multi-hop Question Answering | MuSiQue | F1 Score 53.1 | 36 |
| Multi-hop QA Retrieval | MuSiQue | R@2 43.2 | 21 |
| Long-term Agent Memory Evaluation | LongMemEval | SS-U 79.4 | 15 |
| E-commerce Review-based Question Answering | AmazonQA (test) | BLEU-1 82.76 | 12 |
| Open-domain retrieval | NQ (test) | R@5D 91.6 | 10 |
| Retrieval | HotpotQA | Retrieval Latency (s) 0.032 | 8 |
| Retrieval | 2Wiki | Retrieval Latency (s) 0.019 | 8 |
| Retrieval | MuSiQue | Latency (s) 0.034 | 8 |
Showing 10 of 14 rows
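
Several of the results above are reported as Recall@k (for example Recall@2 and R@2). As a rough illustration of how such a number can be computed, here is a minimal sketch that assumes Recall@k means the fraction of gold evidence items found among the top-k retrieved items. This page does not state the paper's exact definition, and some open-domain benchmarks instead count a hit when any top-k passage contains the answer. The function name `recall_at_k` and the passage IDs are hypothetical.

```python
from typing import Sequence, Set


def recall_at_k(ranked_ids: Sequence[str], gold_ids: Set[str], k: int) -> float:
    """Fraction of gold evidence items that appear in the top-k retrieved items.

    Assumed definition for illustration only; not taken from the paper.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if not gold_ids:
        return 0.0  # no gold evidence: treat recall as 0 by convention here
    top_k = set(ranked_ids[:k])
    return len(top_k & gold_ids) / len(gold_ids)


# Hypothetical two-hop question with two gold supporting passages.
ranked = ["p7", "p2", "p9", "p4", "p1"]
gold = {"p2", "p4"}

print(recall_at_k(ranked, gold, 2))  # 0.5 (only p2 is in the top 2)
print(recall_at_k(ranked, gold, 5))  # 1.0 (both gold passages are in the top 5)
```

A benchmark-level score under this assumption would be the mean of the per-question values, typically reported as a percentage.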
