MemRL: Self-Evolving Agents via Runtime Reinforcement Learning on Episodic Memory
About
The hallmark of human intelligence is the self-evolving ability to master new skills by learning from past experiences. However, current AI agents struggle to emulate this self-evolution: fine-tuning is computationally expensive and prone to catastrophic forgetting, while existing memory-based methods rely on passive semantic matching that often retrieves noise. To address these challenges, we propose MemRL, a non-parametric approach that evolves via reinforcement learning on episodic memory. By decoupling stable reasoning from plastic memory, MemRL employs a Two-Phase Retrieval mechanism to filter noise and identify high-utility strategies through environmental feedback. Extensive experiments on HLE, BigCodeBench, ALFWorld, and Lifelong Agent Bench demonstrate that MemRL significantly outperforms state-of-the-art baselines, confirming that MemRL effectively reconciles the stability-plasticity dilemma, enabling continuous runtime improvement without weight updates. Code is available at https://github.com/MemTensor/MemRL.
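The abstract names a Two-Phase Retrieval mechanism but gives no detail, so the following is only a minimal sketch of one plausible reading: a semantic recall phase that filters out unrelated memories, followed by a utility phase that ranks the survivors by a value learned from environmental feedback. The names `MemoryItem`, `two_phase_retrieve`, and `update_utility`, the cosine-similarity recall, and the running-average update rule are all assumptions made for illustration, not the authors' implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class MemoryItem:
    """One episodic memory entry (hypothetical structure)."""
    text: str
    embedding: list[float]
    q_value: float = 0.0  # learned utility estimate, assumed to start at 0


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def two_phase_retrieve(memory, query_emb, k_semantic=3, k_final=1):
    """Assumed two-phase retrieval: semantic filter, then utility ranking."""
    # Phase 1: keep only the k_semantic entries most similar to the query.
    candidates = sorted(
        memory, key=lambda m: cosine(m.embedding, query_emb), reverse=True
    )[:k_semantic]
    # Phase 2: among those candidates, prefer the highest learned utility.
    return sorted(candidates, key=lambda m: m.q_value, reverse=True)[:k_final]


def update_utility(item: MemoryItem, reward: float, lr: float = 0.5) -> None:
    """Move the utility estimate toward the observed reward (assumed rule)."""
    item.q_value += lr * (reward - item.q_value)
```

In this sketch the model weights are never touched: only `q_value` changes after each episode, which is one way to read the abstract's claim of runtime improvement without weight updates. A memory with high utility but low similarity to the query is dropped in the first phase, so it cannot be retrieved as noise.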
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Interactive Decision-making | AlfWorld | Overall Success Rate | 21.4 | 118 |
| Embodied Task | AlfWorld | Overall Success Rate | 21.4 | 96 |
| Interactive web-based shopping tasks | Webshop | Score | 29.5 | 60 |
| Agentic Reasoning | ALFWorld (test) | Success Rate | 21.4 | 21 |
| Code Generation | BigCodeBench (BCB) 342 tasks 30% held-out (unseen) | Success Rate (SR) | 50.8 | 15 |
| Agentic Reasoning | WebShop (test) | Success Rate | 9.2 | 15 |
| Multi-domain knowledge reasoning | HLE 500-question ablation | Success Rate (Last) | 57.3 | 12 |
| Agentic Reasoning | Sokoban (test) | Success Rate | 4.2 | 12 |
| Agentic Reasoning | Minesweeper (test) | Success Rate | 7 | 12 |
| Exploration | AlfWorld | Success Rate (Last Epoch) | 94.9 | 10 |