MemRL: Self-Evolving Agents via Runtime Reinforcement Learning on Episodic Memory
About
The hallmark of human intelligence is the self-evolving ability to master new skills by learning from past experiences. However, current AI agents struggle to emulate this self-evolution: fine-tuning is computationally expensive and prone to catastrophic forgetting, while existing memory-based methods rely on passive semantic matching that often retrieves noise. To address these challenges, we propose MemRL, a non-parametric approach that evolves via reinforcement learning on episodic memory. By decoupling stable reasoning from plastic memory, MemRL employs a Two-Phase Retrieval mechanism to filter noise and identify high-utility strategies through environmental feedback. Extensive experiments on HLE, BigCodeBench, ALFWorld, and Lifelong Agent Bench demonstrate that MemRL significantly outperforms state-of-the-art baselines, confirming that MemRL effectively reconciles the stability-plasticity dilemma, enabling continuous runtime improvement without weight updates. Code is available at https://github.com/MemTensor/MemRL.
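The abstract names two ideas that a short sketch can make concrete: a Two-Phase Retrieval step, and utility estimates that change with environmental feedback while the model weights stay fixed. The Python below is only an illustration under stated assumptions, not the authors' implementation. The names `MemoryEntry`, `two_phase_retrieve`, and `update_utility`, the cosine-similarity filter, the threshold and learning-rate values, and the running-average update rule are all hypothetical choices made for this example. The actual method is in the linked repository.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class MemoryEntry:
    """One episodic memory item (hypothetical structure for this sketch)."""
    embedding: List[float]   # assumed vector representation of the past task
    experience: str          # stored strategy or trajectory summary
    utility: float = 0.0     # learned estimate of how useful this memory is


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def two_phase_retrieve(memory: List[MemoryEntry], query: Sequence[float],
                       sim_threshold: float = 0.5, top_k: int = 2) -> List[MemoryEntry]:
    # Phase 1 (assumed): keep only memories semantically close to the query.
    candidates = [m for m in memory if cosine(m.embedding, query) >= sim_threshold]
    # Phase 2 (assumed): among those, prefer the highest learned utility.
    candidates.sort(key=lambda m: m.utility, reverse=True)
    return candidates[:top_k]


def update_utility(entry: MemoryEntry, reward: float, lr: float = 0.1) -> None:
    # Assumed update rule: move the utility estimate toward the observed reward.
    # Only the memory changes; no model weights are touched.
    entry.utility += lr * (reward - entry.utility)
```

In this sketch, a memory that is semantically unrelated to the query is discarded in the first phase even if its utility is high, and a retrieved memory that leads to a failed episode (for example `update_utility(entry, reward=0.0)`) drops in the ranking for later queries.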
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Interactive Decision-making | AlfWorld | Overall Success Rate | 90.7 | 295 |
| Embodied Task | AlfWorld | Overall Success Rate | 21.4 | 169 |
| Online Shopping | Webshop | Score | 29.5 | 61 |
| Interactive web-based shopping tasks | Webshop | Score | 29.5 | 60 |
| Online Shopping | WebShop (test) | Score | 29.5 | 59 |
| Web Shopping Agent | Webshop | -- | -- | 53 |
| Interactive Task Completion | AlfWorld | Pick Success Rate | 100 | 45 |
| Coding | LiveCodeBench | Accuracy | 45.71 | 38 |
| Question Answering | ARC-C | Accuracy (ARC-C) | 84.34 | 36 |
| Agentic Task | ALFWorld Unseen | Success Rate (SR) | 71.6 | 26 |