SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning
About
Large Language Model (LLM) agents have shown stunning results in complex tasks, yet they often operate in isolation, failing to learn from past experiences. Existing memory-based methods primarily store raw trajectories, which are often redundant and noise-heavy. This prevents agents from extracting high-level, reusable behavioral patterns that are essential for generalization. In this paper, we propose SkillRL, a framework that bridges the gap between raw experience and policy improvement through automatic skill discovery and recursive evolution. Our approach introduces an experience-based distillation mechanism to build a hierarchical skill library, SkillBank; an adaptive retrieval strategy for general and task-specific heuristics; and a recursive evolution mechanism that allows the skill library to co-evolve with the agent's policy during reinforcement learning. These innovations significantly reduce the token footprint while enhancing reasoning utility. Experimental results on ALFWorld, WebShop, and seven search-augmented tasks demonstrate that SkillRL achieves state-of-the-art performance, outperforming strong baselines by over 15.3% and maintaining robustness as task complexity increases. Code is available at https://github.com/aiming-lab/SkillRL.
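To make the idea of a skill library with general and task-specific heuristics more concrete, the following is a minimal illustrative sketch, not taken from the SkillRL codebase. The names `Skill`, `SkillBank`, `add`, and `retrieve` (apart from the name SkillBank itself) are hypothetical, and the word-overlap ranking is a stand-in chosen for simplicity; the abstract does not specify how retrieval or skill revision is actually implemented.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Skill:
    name: str
    heuristic: str                    # short rule distilled from past experience
    task_type: Optional[str] = None   # None marks a general (task-agnostic) skill


@dataclass
class SkillBank:
    # Hypothetical container; the real SkillBank structure is an assumption here.
    skills: List[Skill] = field(default_factory=list)

    def add(self, skill: Skill) -> None:
        # Replace any skill with the same name, so entries can be revised over time.
        self.skills = [s for s in self.skills if s.name != skill.name]
        self.skills.append(skill)

    def retrieve(self, task_type: str, query: str, k: int = 2) -> List[Skill]:
        # Always include general skills, then the top-k task-specific skills
        # ranked by naive word overlap with the query (a stand-in for a real retriever).
        general = [s for s in self.skills if s.task_type is None]
        specific = [s for s in self.skills if s.task_type == task_type]
        words = set(query.lower().split())

        def overlap(s: Skill) -> int:
            return len(words & set(s.heuristic.lower().split()))

        ranked = sorted(specific, key=overlap, reverse=True)
        return general + ranked[:k]


bank = SkillBank()
bank.add(Skill("verify", "check the goal before acting"))
bank.add(Skill("find_object", "search likely receptacles for the target object", "alfworld"))
bank.add(Skill("compare_price", "compare price and attributes before buying", "webshop"))
bank.add(Skill("clean_first", "clean the object at the sink before placing it", "alfworld"))

selected = bank.retrieve("alfworld", "clean the mug and place it on the shelf", k=1)
print([s.name for s in selected])
```

Storing short heuristics rather than raw trajectories is what would keep the prompt small in a design like this: only the handful of retrieved rules are passed to the agent, not full interaction logs.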
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Interactive Decision-making | ALFWorld | PICK | 97.9 | 52 |
| Interactive web-based shopping tasks | WebShop | Score | 85.2 | 28 |
| Multi-hop Question Answering | HotpotQA (in-domain) | Accuracy | 43.2 | 10 |
| Multi-hop Question Answering | Bamboogle (out-of-domain) | Accuracy | 73.8 | 10 |
| Multi-hop Question Answering | 2WIKI (out-of-domain) | Accuracy | 40.3 | 10 |
| Multi-hop Question Answering | MuSiQue (out-of-domain) | Accuracy | 20.2 | 10 |
| Single-hop Question Answering | NQ (in-domain) | Accuracy | 45.9 | 9 |
| Single-hop Question Answering | TriviaQA (out-of-domain) | Accuracy | 63.3 | 9 |
| Single-hop Question Answering | PopQA (out-of-domain) | Accuracy | 45.9 | 9 |