Dynamic Dual-Granularity Skill Bank for Agentic RL

About

Agentic RL can benefit substantially from reusable experience, yet existing skill-based methods mainly extract trajectory-level guidance and often lack principled mechanisms for maintaining an evolving skill memory. We propose D2Skill, a dynamic dual-granularity skill bank for agentic RL that organizes reusable experience into task skills for high-level guidance and step skills for fine-grained decision support and error correction. D2Skill jointly trains the policy and skill bank through paired baseline and skill-injected rollouts under the same policy, using their performance gap to derive hindsight utility signals for both skill updating and policy optimization. Built entirely from training-time experience, the skill bank is continuously expanded through reflection and maintained with utility-aware retrieval and pruning. Experiments on ALFWorld, WebShop, and Search-Augmented QA tasks show that D2Skill substantially improves performance over skill-free baselines across models of different scales. Further ablations and analyses show that both dual-granularity skill modeling and dynamic skill maintenance are critical to these gains, while the learned skills exhibit higher utility, transfer across evaluation settings, and introduce only modest training overhead.
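The abstract describes deriving a hindsight utility signal from the performance gap between baseline and skill-injected rollouts, then using that signal to maintain the skill bank. The sketch below is a minimal illustration of that idea, not the authors' implementation. The `Skill` fields, the moving-average update, the `lr` step size, the pruning thresholds, and the example skill texts and rollout outcomes are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    text: str
    granularity: str      # "task" (high-level) or "step" (fine-grained)
    utility: float = 0.0  # running estimate of how much the skill helps (assumed)
    uses: int = 0         # number of paired evaluations seen so far


def hindsight_utility(baseline_returns, skill_returns):
    """Mean return gap between skill-injected and baseline rollouts."""
    base = sum(baseline_returns) / len(baseline_returns)
    with_skill = sum(skill_returns) / len(skill_returns)
    return with_skill - base


def update_skill(skill, gap, lr=0.5):
    """Move the stored utility toward the observed gap (assumed moving average)."""
    skill.utility += lr * (gap - skill.utility)
    skill.uses += 1


def prune(bank, min_utility=0.0, min_uses=2):
    """Keep skills that are still under-tested or that show positive utility."""
    return [s for s in bank if s.uses < min_uses or s.utility > min_utility]


bank = [
    Skill("Check the receptacle before picking up an object.", "task"),
    Skill("If an action fails twice, try a different location.", "step"),
]

# Hypothetical success indicators (1 = success) from paired rollouts
# collected under the same policy, without and with the skill injected.
update_skill(bank[0], hindsight_utility([0, 1, 0, 0], [1, 1, 0, 1]))
update_skill(bank[0], hindsight_utility([0, 0, 1, 0], [1, 0, 1, 1]))
update_skill(bank[1], hindsight_utility([1, 1, 0, 1], [0, 1, 0, 0]))
update_skill(bank[1], hindsight_utility([1, 0, 1, 1], [0, 0, 1, 0]))

bank = prune(bank)
print([s.text for s in bank])
```

In this toy run the first skill improves the success rate in both paired evaluations and is kept, while the second lowers it and is pruned. A fuller system would presumably also use the same gap in the policy update and rank skills by utility at retrieval time, as the abstract indicates.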

Songjun Tu, Chengdong Xu, Qichao Zhang, Yaocheng Zhang, Xiangyuan Lan, Linjing Li, Dong Li, Dongbin Zhao • 2026

Related benchmarks

Task	Dataset	Result
Interactive Decision-making	AlfWorld	Overall Success Rate: 87.8	398
Web Navigation and Shopping	Webshop	Score: 91.1	248
Question Answering	Search-QA	Average Score: 48.1	177
Web Shopping Agent	Webshop	Success Rate (SR): 80.5	72
Interactive Task Completion	AlfWorld	Pick Success Rate: 93.8	72
Web navigation	Webshop	Average Score: 83.4	55
Embodied Task Completion	AlfWorld	Pick Success Rate: 97.1	54
Web-based Agent Interaction	WebShop (val)	Success Rate: 84.4	31
Interactive Embodied Agent Task	ALFWorld (val)	Pick Success Rate: 97.6	19
