MemRL: Self-Evolving Agents via Runtime Reinforcement Learning on Episodic Memory

About

The hallmark of human intelligence is the self-evolving ability to master new skills by learning from past experiences. However, current AI agents struggle to emulate this self-evolution: fine-tuning is computationally expensive and prone to catastrophic forgetting, while existing memory-based methods rely on passive semantic matching that often retrieves noise. To address these challenges, we propose MemRL, a non-parametric approach that evolves via reinforcement learning on episodic memory. By decoupling stable reasoning from plastic memory, MemRL employs a Two-Phase Retrieval mechanism to filter noise and identify high-utility strategies through environmental feedback. Extensive experiments on HLE, BigCodeBench, ALFWorld, and Lifelong Agent Bench demonstrate that MemRL significantly outperforms state-of-the-art baselines, confirming that MemRL effectively reconciles the stability-plasticity dilemma, enabling continuous runtime improvement without weight updates. Code is available at https://github.com/MemTensor/MemRL.
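The abstract names a Two-Phase Retrieval mechanism and reward-driven memory updates without giving their mechanics, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation (see the linked repository for that). It assumes phase one filters memories by cosine similarity to the query and phase two re-ranks the survivors by a learned utility score, which is nudged toward each observed reward. The names MemoryItem, two_phase_retrieve, and update_utility, along with the moving-average update rule and all parameter values, are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class MemoryItem:
    """One stored episode (hypothetical structure, not taken from the paper)."""
    embedding: list[float]   # vector representation of the past task
    strategy: str            # what the agent did in that episode
    utility: float = 0.0     # assumed running estimate of how useful the strategy was


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def two_phase_retrieve(memory, query, k_semantic=3, k_final=1):
    """Phase 1: semantic filter. Phase 2: re-rank the survivors by learned utility."""
    candidates = sorted(
        memory, key=lambda m: cosine(m.embedding, query), reverse=True
    )[:k_semantic]
    return sorted(candidates, key=lambda m: m.utility, reverse=True)[:k_final]


def update_utility(item, reward, lr=0.5):
    """Move the stored utility toward the observed reward; model weights are never touched."""
    item.utility += lr * (reward - item.utility)


if __name__ == "__main__":
    memory = [
        MemoryItem([1.0, 0.0], "strategy A"),
        MemoryItem([0.9, 0.1], "strategy B"),
        MemoryItem([0.0, 1.0], "strategy C", utility=0.9),  # useful, but for an unrelated task
    ]
    update_utility(memory[1], reward=1.0)  # strategy B succeeded once in the environment
    best = two_phase_retrieve(memory, [1.0, 0.0], k_semantic=2)
    print(best[0].strategy)
```

In this toy setup the semantic filter drops strategy C despite its high utility, because it belongs to an unrelated task, and the utility re-ranking then prefers strategy B over the slightly closer but unrewarded strategy A. All learning happens in the memory entries, which is one way to read the abstract's claim of runtime improvement without weight updates.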

Shengtao Zhang, Jiaqian Wang, Ruiwen Zhou, Junwei Liao, Yuchen Feng, Zhuo Li, Yujie Zheng, Weinan Zhang, Ying Wen, Zhiyu Li, Feiyu Xiong, Yutao Qi, Bo Tang, Muning Wen • 2026

Related benchmarks

Task | Dataset | Metric | Result | Rank
Interactive Decision-making | AlfWorld | Overall Success Rate | 21.4 | 118
Embodied Task | AlfWorld | Overall Success Rate | 21.4 | 96
Interactive web-based shopping tasks | Webshop | Score | 29.5 | 60
Agentic Reasoning | ALFWorld (test) | Success Rate | 21.4 | 21
Code Generation | BigCodeBench (BCB) 342 tasks 30% held-out (unseen) | Success Rate (SR) | 50.8 | 15
Agentic Reasoning | WebShop (test) | Success Rate | 9.2 | 15
Multi-domain knowledge reasoning | HLE 500-question ablation | Success Rate (Last) | 57.3 | 12
Agentic Reasoning | Sokoban (test) | Success Rate | 4.2 | 12
Agentic Reasoning | Minesweeper (test) | Success Rate | 7 | 12
Exploration | AlfWorld | Success Rate (Last Epoch) | 94.9 | 10
Showing 10 of 22 rows
