SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning

About

Large Language Model (LLM) agents have shown stunning results in complex tasks, yet they often operate in isolation, failing to learn from past experiences. Existing memory-based methods primarily store raw trajectories, which are often redundant and noise-heavy. This prevents agents from extracting the high-level, reusable behavioral patterns that are essential for generalization. In this paper, we propose SkillRL, a framework that bridges the gap between raw experience and policy improvement through automatic skill discovery and recursive evolution. Our approach introduces an experience-based distillation mechanism to build a hierarchical skill library, SkillBank; an adaptive retrieval strategy for general and task-specific heuristics; and a recursive evolution mechanism that allows the skill library to co-evolve with the agent's policy during reinforcement learning. These innovations significantly reduce the token footprint while enhancing reasoning utility. Experimental results on ALFWorld, WebShop, and seven search-augmented tasks demonstrate that SkillRL achieves state-of-the-art performance, outperforming strong baselines by over 15.3% and maintaining robustness as task complexity increases. Code is available at https://github.com/aiming-lab/SkillRL.
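
To make the idea of a skill library with general and task-specific heuristics more concrete, here is a minimal sketch in Python. It is not the authors' implementation: the names `Skill`, `SkillBank`, `add`, and `retrieve`, the example skills, and the word-overlap ranking are all assumptions chosen for illustration, and the reinforcement learning loop is omitted entirely.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Skill:
    """A short reusable heuristic (hypothetical structure, not from the paper)."""
    name: str
    hint: str
    task_type: Optional[str] = None  # None marks a general skill


@dataclass
class SkillBank:
    """Toy two-level library: general skills plus task-specific skills."""
    skills: List[Skill] = field(default_factory=list)

    def add(self, skill: Skill) -> None:
        # Replace any skill with the same name, so the library can be
        # revised over time instead of only growing.
        self.skills = [s for s in self.skills if s.name != skill.name]
        self.skills.append(skill)

    def retrieve(self, task_type: str, query: str, k: int = 2) -> List[Skill]:
        # Always return general skills; rank task-specific ones by a crude
        # word overlap with the query (an assumed stand-in for retrieval).
        general = [s for s in self.skills if s.task_type is None]
        specific = [s for s in self.skills if s.task_type == task_type]
        words = set(query.lower().split())

        def overlap(s: Skill) -> int:
            return len(words & set(s.hint.lower().split()))

        ranked = sorted(specific, key=overlap, reverse=True)
        return general + ranked[:k]


bank = SkillBank()
bank.add(Skill("verify_goal", "re-read the goal before the final action"))
bank.add(Skill("check_price", "compare the price against the budget before buying", "webshop"))
bank.add(Skill("filter_color", "apply the color filter before opening a product", "webshop"))
bank.add(Skill("open_container", "open closed containers before searching inside", "alfworld"))

selected = bank.retrieve("webshop", "buy a red shirt under the price budget", k=1)
print([s.name for s in selected])
```

In this sketch only a handful of short hints is passed to the agent rather than full trajectories, which is one plausible reading of how a distilled library could reduce the token footprint. Calling `add` again with an existing name overwrites the old hint, a simplified stand-in for revising skills as the policy changes.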

Peng Xia, Jianwen Chen, Hanyang Wang, Jiaqi Liu, Kaide Zeng, Yu Wang, Siwei Han, Yiyang Zhou, Xujiang Zhao, Haifeng Chen, Zeyu Zheng, Cihang Xie, Huaxiu Yao • 2026

Related benchmarks

Task	Dataset	Result
Interactive Decision-making	AlfWorld	Overall Success Rate: 89.9	295
Single-hop Question Answering	PopQA	EM: 45.9	186
Embodied Task	AlfWorld	Overall Success Rate: 89.9	169
Web Navigation and Shopping	Webshop	Score: 85.2	153
Multi-hop QA	HotpotQA	--	143
Single-hop Question Answering	TriviaQA	EM: 63.3	133
Question Answering	Search-QA	Average Score: 47.1	130
Multi-hop QA	MuSiQue	EM: 20.2	95
Online Shopping	Webshop	Score: 85.2	61
Interactive web-based shopping tasks	Webshop	Score: 85.2	60

Showing 10 of 67 rows
