MemRL: Self-Evolving Agents via Runtime Reinforcement Learning on Episodic Memory
About
The hallmark of human intelligence is the self-evolving ability to master new skills by learning from past experiences. However, current AI agents struggle to emulate this self-evolution: fine-tuning is computationally expensive and prone to catastrophic forgetting, while existing memory-based methods rely on passive semantic matching that often retrieves noise. To address these challenges, we propose MemRL, a non-parametric approach that evolves via reinforcement learning on episodic memory. By decoupling stable reasoning from plastic memory, MemRL employs a Two-Phase Retrieval mechanism to filter noise and identify high-utility strategies through environmental feedback. Extensive experiments on HLE, BigCodeBench, ALFWorld, and Lifelong Agent Bench demonstrate that MemRL significantly outperforms state-of-the-art baselines, confirming that MemRL effectively reconciles the stability-plasticity dilemma, enabling continuous runtime improvement without weight updates. Code is available at https://github.com/MemTensor/MemRL.
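The abstract describes Two-Phase Retrieval only at a high level, so the following is a minimal sketch of one way such a mechanism could work. It is not the authors' implementation. It assumes that phase one filters memory entries by cosine similarity to the query, that phase two re-ranks the survivors by a learned utility score, and that the utility is nudged toward the observed reward after each episode. The names `MemoryEntry`, `two_phase_retrieve`, `update_utility`, and all parameters and thresholds are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class MemoryEntry:
    """One stored experience (hypothetical layout, not taken from the paper)."""
    embedding: List[float]   # vector representation of the past task
    strategy: str            # what the agent did in that episode
    utility: float = 0.0     # running estimate of how useful this entry has been


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def two_phase_retrieve(memory, query, k_candidates=3, k_final=1, min_similarity=0.5):
    """Return up to k_final entries: similarity filter first, utility ranking second."""
    # Phase 1 (assumed): keep only semantically relevant entries, most similar first.
    scored = [(cosine(entry.embedding, query), entry) for entry in memory]
    relevant = [pair for pair in scored if pair[0] >= min_similarity]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    candidates = relevant[:k_candidates]

    # Phase 2 (assumed): among the candidates, prefer the highest learned utility.
    # sorted() is stable, so ties keep the similarity order from phase 1.
    ranked = sorted(candidates, key=lambda pair: pair[1].utility, reverse=True)
    return [entry for _, entry in ranked[:k_final]]


def update_utility(entry, reward, learning_rate=0.5):
    """Move the entry's utility toward the observed reward; model weights are untouched."""
    entry.utility += learning_rate * (reward - entry.utility)
```

In this sketch only the `utility` values stored in memory change at runtime, which mirrors the abstract's claim of improvement without weight updates. An entry with high utility but low similarity to the query never reaches phase two, which is the noise-filtering role the abstract attributes to the first phase.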
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Interactive Decision-making | AlfWorld | PICK | 62.8 | 52 |
| Interactive web-based shopping tasks | Webshop | Score | 29.5 | 28 |
| Code Generation | BigCodeBench (val) | Success Rate | 50.8 | 6 |
| Code Generation | BigCodeBench | Last Epoch Success Rate | 59.5 | 6 |
| DB Task | Lifelong Agent Bench (val) | Success Rate | 94.2 | 6 |
| Exploration | ALFWorld (val) | Success Rate | 97.9 | 6 |
| Exploration | AlfWorld | Success Rate (Last Epoch) | 94.9 | 6 |
| Knowledge Frontier | HLE | Last Epoch Success Rate | 57 | 6 |
| OS Task | Lifelong Agent Bench (val) | Success Rate | 74.6 | 6 |
| OS Task | Lifelong Agent Bench OS Task | Success Rate (Last Epoch) | 78.8 | 6 |