SkillRouter: Skill Routing for LLM Agents at Scale

About

Reusable skills let LLM agents package task-specific procedures, tool affordances, and execution guidance into modular building blocks. As skill ecosystems grow to tens of thousands of entries, exposing every skill at inference time becomes infeasible. This creates a skill-routing problem: given a user task, the system must identify relevant skills before downstream planning or execution. Existing agent stacks often rely on progressive disclosure, exposing only skill names and descriptions while hiding the full implementation body. We examine this design choice on a SkillsBench-derived benchmark with approximately 80K candidate skills, targeting the practically important setting of large skill registries with heavy overlap. Across representative dense and reranking baselines on this setting, hiding the skill body causes a 37-44 percentage point drop in routing accuracy. Stronger controls show that the missing signal is body-resident rather than a simple length artifact: body-distilled descriptions recover part of the gap, but remain 7-21 points below direct all-field routing, while a metadata-only encoder trained with the same data remains 14.0 points below its all-field counterpart. Motivated by this finding, we present SkillRouter, a compact 1.2B body-aware retrieve-and-rerank pipeline. SkillRouter achieves 74.0% Hit@1 on our benchmark (the strongest average top-1 routing performance among the baselines we evaluate) while using 13× fewer parameters and running 5.8× faster than the strongest base pipeline. The ranking gains further generalize to a supplementary benchmark independently constructed from three skill sources. In a complementary end-to-end study across four coding agents, routing gains transfer to improved task success, with larger gains for more capable agents.
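
To make the metadata-only versus all-field comparison concrete, here is a minimal sketch of a two-stage retrieve-and-rerank router and a Hit@k metric over a toy registry. It is not SkillRouter's implementation. The skill entries, the queries, the bag-of-words cosine retriever, and the term-overlap reranker are all hypothetical stand-ins chosen so the example runs with only the standard library. The two PDF skills deliberately share the same name and description, so only the body can tell them apart.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy registry (invented for illustration, not from the benchmark).
SKILLS = [
    {"id": "pdf-merge", "name": "pdf tools", "description": "work with pdf files",
     "body": "merge several pdf files into one document using pypdf"},
    {"id": "pdf-ocr", "name": "pdf tools", "description": "work with pdf files",
     "body": "extract text from scanned pdf pages with ocr"},
    {"id": "csv-clean", "name": "csv cleaner", "description": "clean tabular data",
     "body": "drop duplicate rows and fix headers in csv files"},
]

# (query, gold skill id) pairs, also invented.
QUERIES = [("merge two pdf files", "pdf-merge"), ("ocr a scanned pdf", "pdf-ocr")]


def skill_text(skill, use_body):
    """Concatenate the fields exposed to the router."""
    parts = [skill["name"], skill["description"]]
    if use_body:
        parts.append(skill["body"])
    return " ".join(parts)


def bag(text):
    """Lowercased whitespace-token counts (a stand-in for a learned encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def route(query, skills, use_body, top_k=2):
    """Stage 1: retrieve top_k by cosine. Stage 2: rerank by distinct-term overlap."""
    q = bag(query)
    texts = {s["id"]: bag(skill_text(s, use_body)) for s in skills}
    retrieved = sorted(skills, key=lambda s: cosine(q, texts[s["id"]]),
                       reverse=True)[:top_k]
    reranked = sorted(retrieved, key=lambda s: len(set(q) & set(texts[s["id"]])),
                      reverse=True)
    return [s["id"] for s in reranked]


def hit_at_k(queries, skills, use_body, k=1):
    """Fraction of queries whose gold skill appears in the top k routed results."""
    hits = sum(gold in route(q, skills, use_body)[:k] for q, gold in queries)
    return hits / len(queries)


metadata_only = hit_at_k(QUERIES, SKILLS, use_body=False)  # 0.5 on this toy data
all_field = hit_at_k(QUERIES, SKILLS, use_body=True)       # 1.0 on this toy data
print(f"Hit@1 metadata-only: {metadata_only:.2f}, all-field: {all_field:.2f}")
```

With the body hidden, the two PDF skills receive identical scores and the tie falls to registry order, so one of the two queries is misrouted. The resulting gap is a property of this toy data only and says nothing about the magnitudes reported in the abstract.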

YanZhao Zheng, ZhenTao Zhang, Chao Ma, YuanQiang Yu, JiHuai Zhu, Yong Wu, Tianze Xu, Baohua Dong, Hangcheng Zhu, Ruohui Huang, Gang Yu • 2026

Related benchmarks

Task	Dataset	Result
Agent Task	AlfWorld	Success Rate 74.4	40
Science Experimentation Reasoning	ScienceWorld Seen	Success Rate 75.28	32
Science Experimentation Reasoning	ScienceWorld Unseen	Success Rate 72.85	32
Skill Routing	Skill Routing Dataset Easy	Hit@1 78.7	26
Skill Routing	Skill Routing Dataset Hard	Hit@1 73.3	26
Skill Routing	Skill Routing Dataset	Hit@1 76	26
Embodied household tasks	ALFWorld Unseen	Average Accumulated Reward 73.14	23
Household Manipulation	ALFWorld Seen	Average Reward 80.72	12
Agent Task Completion	tau2-Bench Telecom	Pass Rate 62.8	9
Agent Task Completion	SkillsBench	Pass Rate 16.5	9
