MemR$^3$: Memory Retrieval via Reflective Reasoning for LLM Agents

About

Memory systems have been designed to leverage past experiences in Large Language Model (LLM) agents. However, many deployed memory systems primarily optimize compression and storage, with comparatively less emphasis on explicit, closed-loop control of memory retrieval. Motivated by this observation, we cast memory retrieval as an autonomous, accurate, and compatible agent system, named MemR$^3$, which has two core mechanisms: 1) a router that selects among retrieve, reflect, and answer actions to optimize answer quality; 2) a global evidence-gap tracker that makes the answering process transparent and tracks the evidence collection process. This design departs from the standard retrieve-then-answer pipeline by introducing a closed-loop control mechanism that enables autonomous decision-making. Empirical results on the LoCoMo benchmark demonstrate that MemR$^3$ surpasses strong baselines on LLM-as-a-Judge score. In particular, it improves existing retrievers across four categories, with overall improvements over RAG (+7.29%) and Zep (+1.94%) using a GPT-4.1-mini backend, offering a plug-and-play controller for existing memory stores.
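The closed loop described above (a router choosing among retrieve, reflect, and answer, plus a tracker of collected evidence and remaining gaps) can be illustrated with a minimal sketch. This is not the authors' implementation: every name here (`EvidenceGapState`, `keyword_retriever`, `route`, `run_controller`, the synonym-based `rewrite`) is hypothetical, and a simple rule-based router and keyword matcher stand in for the LLM-driven components and the real memory store.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class EvidenceGapState:
    """Hypothetical tracker: evidence gathered so far and gaps still open."""
    evidence: List[str] = field(default_factory=list)
    gaps: List[str] = field(default_factory=list)


def keyword_retriever(memory: List[str]) -> Callable[[str], List[str]]:
    """Toy stand-in for an existing memory store: match on shared words."""
    def retrieve(query: str) -> List[str]:
        terms = set(query.lower().split())
        return [m for m in memory if terms & set(m.lower().split())]
    return retrieve


def route(state: EvidenceGapState, stalled: bool) -> str:
    """Rule-based stand-in for the router (assumed to be an LLM in practice)."""
    if not state.gaps:
        return "answer"    # nothing left to find
    return "reflect" if stalled else "retrieve"


def run_controller(
    gaps: List[str],
    retrieve: Callable[[str], List[str]],
    rewrite: Callable[[str], str],
    max_steps: int = 6,
) -> Tuple[EvidenceGapState, List[str]]:
    state = EvidenceGapState(gaps=list(gaps))
    trace: List[str] = []
    stalled = False
    for _ in range(max_steps):    # step budget keeps the loop bounded
        action = route(state, stalled)
        trace.append(action)
        if action == "answer":
            break
        if action == "retrieve":
            hits = [h for h in retrieve(state.gaps[0]) if h not in state.evidence]
            state.evidence.extend(hits)
            stalled = not hits    # no new evidence: reflect next
            if hits:
                state.gaps.pop(0)    # first gap is now covered
        else:    # reflect: reformulate the first open gap
            state.gaps[0] = rewrite(state.gaps[0])
            stalled = False
    return state, trace


# Illustrative data only; not taken from LoCoMo.
memory = ["Alice adopted a puppy in May", "Bob moved to Berlin"]
synonyms = {"dog": "puppy"}
state, trace = run_controller(
    ["dog", "berlin"],
    keyword_retriever(memory),
    lambda gap: synonyms.get(gap, gap),
)
print(trace)
```

In this toy run the first retrieval for "dog" finds nothing, so the router reflects and rewrites the gap to "puppy" before retrieving again. The trace is `['retrieve', 'reflect', 'retrieve', 'retrieve', 'answer']`. Because the controller only calls `retrieve`, any existing store could be swapped in behind that function, which is the plug-and-play property the abstract refers to.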

Xingbo Du, Loka Li, Duzhen Zhang, Le Song • 2025

Related benchmarks

Task	Dataset	Result
Long-term memory evaluation	Locomo	Overall F1 20.49	128
Question Answering	LoCoMo (test)	Single-hop Score 86.43	24
Long-horizon memory-based reasoning	Locomo	Multi-hop R-1 Score 17.34	10
Long-term Memory Retrieval	LongMemEval	Knowledge Update 78.2	10
Factual Accuracy and Reasoning	Locomo	Single-hop Accuracy 88.53	9
Proactive memory triggering	ProactiveMemBench	Recall@5 (Behavioral) 53.8	8
Long-term Dialogue Memory Management	Locomo	F1 35.3	7
Long-term dialogue memory evaluation	GVD	Accuracy 93	6
