Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning

About

Large Language Models (LLMs) have demonstrated impressive capabilities across a wide range of NLP tasks, but they remain fundamentally stateless, constrained by limited context windows that hinder long-horizon reasoning. Recent efforts to address this limitation often augment LLMs with an external memory bank, yet most existing pipelines are static and heuristic-driven, lacking a learned mechanism for deciding what to store, update, or retrieve. We present Memory-R1, a reinforcement learning (RL) framework that equips LLMs with the ability to actively manage and utilize external memory through two specialized agents: a Memory Manager that learns structured operations, including ADD, UPDATE, DELETE, and NOOP; and an Answer Agent that pre-selects and reasons over relevant entries. Both agents are fine-tuned with outcome-driven RL (PPO and GRPO), enabling adaptive memory management with minimal supervision. With only 152 training QA pairs, Memory-R1 outperforms strong baselines and generalizes across diverse question types, three benchmarks (LoCoMo, MSC, LongMemEval), and multiple model scales (3B-14B).
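The four structured operations named in the abstract (ADD, UPDATE, DELETE, NOOP) can be illustrated with a toy memory bank. This is only a minimal sketch under stated assumptions: the names `Op`, `MemoryBank`, and `apply` are hypothetical and do not come from the paper. In Memory-R1 the Memory Manager learns which operation to issue through RL, whereas here the caller supplies the operation by hand.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class Op(Enum):
    """The four operation names listed in the abstract."""
    ADD = "ADD"
    UPDATE = "UPDATE"
    DELETE = "DELETE"
    NOOP = "NOOP"


@dataclass
class MemoryBank:
    """Hypothetical external memory: integer ids mapped to text entries."""
    entries: Dict[int, str] = field(default_factory=dict)
    next_id: int = 0

    def apply(self, op: Op, text: Optional[str] = None,
              entry_id: Optional[int] = None) -> None:
        """Apply one operation; in the paper a learned policy picks `op`."""
        if op is Op.ADD:
            if text is None:
                raise ValueError("ADD requires text")
            self.entries[self.next_id] = text
            self.next_id += 1
        elif op is Op.UPDATE:
            if text is None or entry_id not in self.entries:
                raise KeyError("UPDATE requires text and an existing entry_id")
            self.entries[entry_id] = text
        elif op is Op.DELETE:
            if entry_id not in self.entries:
                raise KeyError("DELETE requires an existing entry_id")
            del self.entries[entry_id]
        # Op.NOOP: leave the bank unchanged.


# Usage with made-up dialogue facts (illustrative only).
bank = MemoryBank()
bank.apply(Op.ADD, text="User adopted a dog named Buddy.")
bank.apply(Op.UPDATE, text="User adopted two dogs, Buddy and Scout.", entry_id=0)
bank.apply(Op.NOOP)
bank.apply(Op.ADD, text="User lives in Berlin.")
bank.apply(Op.DELETE, entry_id=1)
print(bank.entries)
```

The UPDATE call shows why a learned policy matters: a second fact about the same topic should refine the existing entry, not add a duplicate, and a static heuristic has no principled way to make that call.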

Sikuan Yan, Xiufeng Yang, Zuchao Huang, Ercong Nie, Zifeng Ding, Zonggen Li, Xiaowen Ma, Jinhe Bi, Kristian Kersting, Jeff Z. Pan, Hinrich Schütze, Volker Tresp, Yunpu Ma • 2025

Related benchmarks

Task                            Dataset                         Metric                    Result  Rank
Multi-hop Question Answering    LoCoMo                          F1                        33.64   125
Open-domain Question Answering  LoCoMo                          F1                        0.2355  111
Single-hop Question Answering   LoCoMo                          F1                        0.4686  111
Long-context Memory Evaluation  LongMemEval                     -                         -       103
Multi-hop Reasoning             LoCoMo                          F1 Score                  36.55   68
Query Answering                 PersonaMem 32K context length   Query-Answering Accuracy  58      60
Query Answering                 PersonaMem 128K context length  Query-Answering Accuracy  0.61    60
Open Domain                     LoCoMo                          F1 Score                  29.34   51
Temporal                        LoCoMo                          F1 Score                  0.4126  47
Single-Hop                      LoCoMo                          F1 Score                  37.02   47
Showing 10 of 25 rows
