SAGE: A Self-Evolving Agentic Graph-Memory Engine for Structure-Aware Associative Memory

About

Long-term memory is becoming a central bottleneck for language agents. Existing RAG and GraphRAG systems largely treat memory graphs as static retrieval middleware, which limits their ability to recover complete evidence chains from partial cues, exploit reusable graph-structural roles, and improve the memory itself through downstream feedback. We introduce SAGE, a Self-evolving Agentic Graph-memory Engine that models graph memory as a dynamic long-term memory substrate. SAGE couples two roles: a memory writer that incrementally constructs structured graph memory from interaction histories, and a Graph Foundation Model-based memory reader that performs retrieval and provides feedback to the memory writer. We provide rigorous theoretical analyses supporting the framework. Across multi-hop QA, open-domain retrieval, domain-specific review QA, and long-term agent-memory benchmarks, SAGE improves evidence recovery, answer grounding, and retrieval efficiency: after two self-evolution rounds, it achieves the best average rank on multi-hop QA; in zero-shot open-domain transfer, it reaches 82.5/91.6 Recall@2/5 on NQ. Further results on LongMemEval and HaluMem show that training and reader-writer feedback improve multiple long-term memory and hallucination-diagnostic metrics, suggesting that self-evolving, structure-aware graph memory is a promising foundation for robust long-horizon language agents.

Juntong Wang, Haoyue Zhao, Guanghui Pan, Xiyuan Wang, Yanbo Wang, Qiyan Deng, Muhan Zhang • 2026

Related benchmarks

Task	Dataset	Result
Multi-hop Question Answering	2WikiMultihopQA	EM 74	559
Multi-hop Retrieval	HotpotQA	Recall@2 65.1	44
Multi-hop Question Answering	MuSiQue	F1 Score 53.1	36
Multi-hop QA Retrieval	MuSiQue	R@2 43.2	21
Long-term Agent Memory Evaluation	LongMemEval	SS-U 79.4	15
E-commerce Review-based Question Answering	AmazonQA (test)	BLEU-1 82.76	12
Open-domain retrieval	NQ (test)	R@5D 91.6	10
Retrieval	HotpotQA	Retrieval Latency (s) 0.032	8
Retrieval	2Wiki	Retrieval Latency (s) 0.019	8
Retrieval	MuSiQue	Latency (s) 0.034	8

Showing 10 of 14 rows
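
For readers unfamiliar with the Recall@k figures reported here (for example, Recall@2/5 on NQ), the following is a minimal sketch of one common definition: the fraction of gold supporting passages that appear among the top-k retrieved items, averaged over questions. The function names, passage ids, and toy data are hypothetical, and the paper's exact evaluation protocol is not given on this page and may differ.

```python
def recall_at_k(ranked_ids, gold_ids, k):
    """Fraction of gold passages found among the top-k retrieved ids."""
    gold = set(gold_ids)
    if not gold:
        raise ValueError("gold_ids must be non-empty")
    top_k = set(ranked_ids[:k])
    return len(gold & top_k) / len(gold)


def mean_recall_at_k(examples, k):
    """Average Recall@k over (ranked_ids, gold_ids) pairs."""
    scores = [recall_at_k(ranked, gold, k) for ranked, gold in examples]
    return sum(scores) / len(scores)


# Toy data (hypothetical): each entry is (retriever ranking, gold passages).
examples = [
    (["p3", "p7", "p1", "p9", "p2"], ["p3", "p1"]),
    (["p5", "p4", "p8", "p6", "p0"], ["p4", "p6"]),
]

recall_2 = mean_recall_at_k(examples, 2)
recall_5 = mean_recall_at_k(examples, 5)
print(f"Recall@2 = {recall_2:.3f}, Recall@5 = {recall_5:.3f}")
```

In the toy data, each ranking places only one of its two gold passages in the top 2 but both in the top 5, which is why Recall@5 is higher than Recall@2; the same pattern appears in the NQ numbers quoted in the abstract.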
