SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization

About

Agent skills, structured packages of procedural knowledge and executable resources that agents dynamically load at inference time, have become a reliable mechanism for augmenting LLM agents. Yet inference-time skill augmentation is fundamentally limited: retrieval noise introduces irrelevant guidance, injected skill content imposes substantial token overhead, and the model never truly acquires the knowledge it merely follows. We ask whether skills can instead be internalized into model parameters, enabling zero-shot autonomous behavior without any runtime skill retrieval. We introduce SKILL0, an in-context reinforcement learning framework designed for skill internalization. SKILL0 uses a training-time curriculum that begins with full skill context and progressively withdraws it. Skills are grouped offline by category and rendered with interaction history into a compact visual context, teaching the model tool invocation and multi-turn task completion. A Dynamic Curriculum then evaluates each skill file's on-policy helpfulness, retaining only those from which the current policy still benefits within a linearly decaying budget, until the agent operates in a fully zero-shot setting. Extensive agentic experiments demonstrate that SKILL0 achieves substantial improvements over the standard RL baseline (+9.7% for ALFWorld, +6.6% for Search-QA, and +10.1% for WebShop), while maintaining a highly efficient context of fewer than 0.5k tokens per step. Our code is available at https://github.com/ZJU-REAL/SkillZero.

Zhengxi Lu, Zhiyuan Yao, Jinyang Wu, Chengcheng Han, Qi Gu, Xunliang Cai, Weiming Lu, Jun Xiao, Yueting Zhuang, Yongliang Shen • 2026
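
The Dynamic Curriculum described in the abstract (score each skill file's on-policy helpfulness, then keep only the helpful ones within a linearly decaying budget) can be illustrated with a minimal sketch. This is not the authors' implementation: the helpfulness measure (success rate with the skill in context minus success rate without it), the names `SkillStats`, `linear_budget`, `select_skills`, and the `min_gain` threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SkillStats:
    """Hypothetical per-skill rollout statistics under the current policy."""
    name: str
    success_with: float     # success rate with the skill file in context
    success_without: float  # success rate with the skill file withheld

    @property
    def helpfulness(self) -> float:
        # Assumed proxy: how much the current policy still gains from the skill.
        return self.success_with - self.success_without


def linear_budget(step: int, total_steps: int, initial_budget: int) -> int:
    """Skill files allowed in context at `step`; decays linearly to 0."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    remaining = max(0, total_steps - step)
    return initial_budget * remaining // total_steps


def select_skills(stats, step, total_steps, initial_budget, min_gain=0.0):
    """Keep the most helpful skills that still help, within the budget."""
    budget = linear_budget(step, total_steps, initial_budget)
    helpful = [s for s in stats if s.helpfulness > min_gain]
    helpful.sort(key=lambda s: s.helpfulness, reverse=True)
    return [s.name for s in helpful[:budget]]


# Illustrative numbers only, not results from the paper.
stats = [
    SkillStats("a", 0.9, 0.5),   # still clearly helpful
    SkillStats("b", 0.6, 0.6),   # no longer helps: already internalized
    SkillStats("c", 0.7, 0.6),   # mildly helpful
    SkillStats("d", 0.8, 0.85),  # hurts: dropped
]
early = select_skills(stats, step=0, total_steps=100, initial_budget=4)
late = select_skills(stats, step=75, total_steps=100, initial_budget=4)
final = select_skills(stats, step=100, total_steps=100, initial_budget=4)
```

In this sketch the budget reaches zero at the last step, so the final selection is always empty. That matches the fully zero-shot end state the abstract describes, whatever the helpfulness scores are.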

Related benchmarks

Task | Dataset | Result | Rank
Interactive Decision-making | AlfWorld | Overall Success Rate: 89.8 | 295
Embodied Task | AlfWorld | Overall Success Rate: 89.8 | 169
Question Answering | Search-QA | Average Score: 44.4 | 130
Web Shopping Agent | Webshop | Score: 83.3 | 53
Question Answering | NQ, TriviaQA, PopQA, HotpotQA, 2wiki, MuSiQue, Bamboogle | NQ Score: 37.9 | 22
Embodied Task Completion | AlfWorld | Pick Success Rate: 100 | 21
Embodied Task Completion | ALFWorld ID | Pick Success Rate: 94.3 | 18
Web Shopping Agent | WebShop OOD v1.0 | Access Success Rate: 42.1 | 18
Web-search reasoning | Web-search reasoning suite (HotpotQA, 2Wiki, MuSiQue) (test) | Accuracy (HotpotQA): 40 | 18
Embodied Task Completion | ALFWorld v1.0 (test) | Pick Success Rate: 93.6 | 15