SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning
About
Large Language Model (LLM) agents have shown stunning results in complex tasks, yet they often operate in isolation, failing to learn from past experiences. Existing memory-based methods primarily store raw trajectories, which are often redundant and noise-heavy. This prevents agents from extracting high-level, reusable behavioral patterns that are essential for generalization. In this paper, we propose SkillRL, a framework that bridges the gap between raw experience and policy improvement through automatic skill discovery and recursive evolution. Our approach introduces an experience-based distillation mechanism to build a hierarchical skill library, SkillBank; an adaptive retrieval strategy for general and task-specific heuristics; and a recursive evolution mechanism that allows the skill library to co-evolve with the agent's policy during reinforcement learning. These innovations significantly reduce the token footprint while enhancing reasoning utility. Experimental results on ALFWorld, WebShop, and seven search-augmented tasks demonstrate that SkillRL achieves state-of-the-art performance, outperforming strong baselines by over 15.3% and maintaining robustness as task complexity increases. Code is available at https://github.com/aiming-lab/SkillRL.
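To make the idea of a two-level skill library with retrieval over general and task-specific heuristics more concrete, here is a minimal sketch. It is not the SkillRL implementation: the class names `Skill` and `SkillBank`, the `retrieve` method, and the word-overlap ranking are all hypothetical stand-ins chosen for illustration, and the distillation and recursive evolution steps described in the abstract are not modeled.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Skill:
    """A hypothetical skill record: a short heuristic distilled from experience."""
    name: str
    heuristic: str
    task_type: Optional[str] = None  # None marks a general (task-agnostic) skill


@dataclass
class SkillBank:
    """Illustrative two-level library: general skills plus per-task skills."""
    general: list = field(default_factory=list)
    task_specific: dict = field(default_factory=dict)

    def add(self, skill: Skill) -> None:
        # Route the skill to the general pool or to its task-specific bucket.
        if skill.task_type is None:
            self.general.append(skill)
        else:
            self.task_specific.setdefault(skill.task_type, []).append(skill)

    def retrieve(self, task_type: str, query: str, k: int = 2) -> list:
        """Return every general skill plus the top-k task-specific skills.

        Ranking uses plain word overlap with the query. This is an assumed
        placeholder, not the retrieval strategy used in the paper.
        """
        query_words = set(query.lower().split())
        candidates = self.task_specific.get(task_type, [])
        ranked = sorted(
            candidates,
            key=lambda s: len(query_words & set(s.heuristic.lower().split())),
            reverse=True,
        )
        return self.general + ranked[:k]


bank = SkillBank()
bank.add(Skill("verify", "check the goal before acting"))
bank.add(Skill("search", "search for the product name first", "webshop"))
bank.add(Skill("filter", "apply price filter before clicking", "webshop"))
bank.add(Skill("look", "look around the room first", "alfworld"))

selected = bank.retrieve("webshop", "price filter under limit", k=1)
prompt_context = "\n".join(f"- {s.heuristic}" for s in selected)
```

Because only a handful of short heuristics are injected into the prompt rather than full raw trajectories, a design along these lines is one way the reduced token footprint mentioned in the abstract could arise.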
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Interactive Decision-making | AlfWorld | Overall Success Rate 89.9 | 295 |
| Single-hop Question Answering | PopQA | EM 45.9 | 186 |
| Embodied Task | AlfWorld | Overall Success Rate 89.9 | 169 |
| Web Navigation and Shopping | Webshop | Score 85.2 | 153 |
| Multi-hop QA | HotpotQA | -- | 143 |
| Single-hop Question Answering | TriviaQA | EM 63.3 | 133 |
| Question Answering | Search-QA | Average Score 47.1 | 130 |
| Multi-hop QA | MuSiQue | EM 20.2 | 95 |
| Online Shopping | Webshop | Score 85.2 | 61 |
| Interactive web-based shopping tasks | Webshop | Score 85.2 | 60 |