
R^2-Mem: Reflective Experience for Memory Search

About

Deep search has recently emerged as a promising paradigm for enabling agents to retrieve fine-grained historical information without heavy memory pre-management. However, existing deep search agents for memory systems repeat past errors because they fail to learn from prior high- and low-quality search trajectories. To address this limitation, we propose R^2-Mem, a reflective experience framework for memory search systems. In the offline stage, a Rubric-guided Evaluator scores low- and high-quality steps in historical trajectories, and a self-Reflection Learner distills the corresponding abstract experience. During online inference, the retrieved experience guides future search actions to avoid repeated mistakes and maintain high-quality behaviors. Extensive experiments demonstrate that R^2-Mem consistently improves both effectiveness and efficiency over strong baselines, improving F1 scores by up to 22.6% while reducing token consumption by 12.9% and search iterations by 20.2%. These results verify that R^2-Mem provides an RL-free and low-cost solution for self-improving LLM agents.

Xinyuan Wang, Wenyu Mao, Junkang Wu, Xiang Wang, Xiangnan He • 2026
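
The two-stage flow described in the abstract (score steps offline, distill experience, retrieve it online) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the rubric, how the learner distills experience, or how experience is retrieved. The names Step, Experience, rubric_score, distill, build_experience_bank, and retrieve are hypothetical, the rubric is a toy heuristic, and retrieval uses simple token overlap as a stand-in.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    question: str         # question the agent was trying to answer
    search_query: str     # query the agent issued against memory
    found_evidence: bool  # whether this step surfaced useful evidence


@dataclass
class Experience:
    context: str  # question the lesson was distilled from
    lesson: str   # abstract guidance for future searches
    quality: str  # "high" or "low"


def rubric_score(step: Step) -> int:
    """Toy rubric (assumption): 1 point for evidence, 1 for a specific query."""
    score = 0
    if step.found_evidence:
        score += 1
    if len(step.search_query.split()) >= 3:
        score += 1
    return score


def distill(step: Step, score: int) -> Optional[Experience]:
    """Stand-in for the reflection learner: turn clear wins or failures into lessons."""
    if score == 2:
        return Experience(step.question, f"Keep: queries like '{step.search_query}' worked.", "high")
    if score == 0:
        return Experience(step.question, f"Avoid: queries like '{step.search_query}' failed.", "low")
    return None  # mid-quality steps carry no clear lesson in this sketch


def build_experience_bank(trajectories: List[List[Step]]) -> List[Experience]:
    """Offline stage: score every historical step and keep the distilled lessons."""
    bank: List[Experience] = []
    for trajectory in trajectories:
        for step in trajectory:
            experience = distill(step, rubric_score(step))
            if experience is not None:
                bank.append(experience)
    return bank


def retrieve(bank: List[Experience], question: str, k: int = 2) -> List[Experience]:
    """Online stage: return the k lessons whose context best overlaps the new question."""
    query_tokens = set(question.lower().split())

    def overlap(experience: Experience) -> float:
        context_tokens = set(experience.context.lower().split())
        union = query_tokens | context_tokens
        return len(query_tokens & context_tokens) / len(union) if union else 0.0

    return sorted(bank, key=overlap, reverse=True)[:k]


# Hypothetical usage: one past trajectory, then a lookup for a new question.
history = [[
    Step("When did Alice move to Paris?", "Alice", False),
    Step("When did Alice move to Paris?", "Alice move Paris date", True),
]]
bank = build_experience_bank(history)
for experience in retrieve(bank, "When did Alice move to Berlin?"):
    print(experience.quality, experience.lesson)
```

In a real system the retrieved lessons would presumably be placed in the search agent's context before it chooses its next action. The sketch only shows how low- and high-quality steps could both feed one experience store.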

Related benchmarks

Task                  Dataset    Metric                  Result   Rank
Multi-hop Reasoning   Locomo     F1 Score                41.62    68
Open Domain           Locomo     F1 Score                27.48    51
Single-Hop            Locomo     F1 Score                59.03    47
Temporal              Locomo     F1 Score                0.4753   47
Overall Performance   Locomo     BLEU                    44.91    24
Question Answering    HotpotQA   Accuracy (56K Context)  51.53    4
