
SkillRouter: Skill Routing for LLM Agents at Scale

About

Reusable skills let LLM agents package task-specific procedures, tool affordances, and execution guidance into modular building blocks. As skill ecosystems grow to tens of thousands of entries, exposing every skill at inference time becomes infeasible. This creates a skill-routing problem: given a user task, the system must identify relevant skills before downstream planning or execution. Existing agent stacks often rely on progressive disclosure, exposing only skill names and descriptions while hiding the full implementation body. We examine this design choice on a SkillsBench-derived benchmark with approximately 80K candidate skills, targeting the practically important setting of large skill registries with heavy overlap. Across representative sparse, dense, and reranking baselines, hiding the skill body causes a 31 to 44 percentage point drop in routing accuracy, showing that in this setting full skill text is a critical routing signal rather than a minor metadata refinement. Motivated by this finding, we present SkillRouter, a compact 1.2B-parameter full-text retrieve-and-rerank pipeline. SkillRouter achieves 74.0% Hit@1 on our benchmark (the strongest average top-1 routing performance among the methods we evaluate) while using 13× fewer parameters and running 5.8× faster than the strongest base pipeline. The ranking gains further generalize to a supplementary benchmark independently constructed from three skill sources. In a complementary end-to-end study across four coding agents, routing gains transfer to improved task success, with larger gains for more capable agents.
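To make the contrast between progressive disclosure and full-text routing concrete, here is a minimal toy sketch of a two-stage retrieve-and-rerank router. It is not the SkillRouter implementation: the `Skill` class, the `route` function, the three example skills, and both scoring functions (word overlap for retrieval, Jaccard similarity for reranking) are hypothetical stand-ins for the learned retriever and reranker the abstract describes.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    description: str
    body: str  # full implementation text (hidden under progressive disclosure)


def words(text: str) -> set:
    """Lowercased whitespace tokens; a stand-in for a real tokenizer."""
    return set(text.lower().split())


def view(skill: Skill, use_body: bool) -> set:
    """Text the router is allowed to see for one skill."""
    text = f"{skill.name} {skill.description}"
    if use_body:
        text += f" {skill.body}"
    return words(text)


def route(query: str, skills: list, use_body: bool, top_k: int = 2) -> str:
    q = words(query)
    # Stage 1 (retrieve): cheap score, number of query words found in the skill view.
    shortlist = sorted(
        skills, key=lambda s: len(q & view(s, use_body)), reverse=True
    )[:top_k]

    # Stage 2 (rerank): Jaccard similarity, standing in for a learned reranker.
    def jaccard(s: Skill) -> float:
        v = view(s, use_body)
        return len(q & v) / len(q | v)

    return max(shortlist, key=jaccard).name


# Hypothetical registry with overlapping names and descriptions.
skills = [
    Skill("pdf-tools", "Utilities for PDF and CSV files",
          "merge several pdf files into one document using pypdf"),
    Skill("pdf-utils", "Utilities for PDF files",
          "extract tables from a pdf page into csv rows"),
    Skill("csv-clean", "Clean CSV data",
          "drop duplicate rows and normalize headers in csv files"),
]
query = "extract tables from pdf into csv"

metadata_choice = route(query, skills, use_body=False)
full_text_choice = route(query, skills, use_body=True)
print(metadata_choice, full_text_choice)
```

In this toy registry the metadata-only router picks `pdf-tools`, because its description happens to mention both PDF and CSV, while the full-text router picks `pdf-utils`, whose body actually describes table extraction. The example only illustrates the kind of failure that overlapping metadata can cause; it says nothing about the magnitude of the effect reported above.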

YanZhao Zheng, ZhenTao Zhang, Chao Ma, YuanQiang Yu, JiHuai Zhu, Yong Wu, Tianze Xu, Baohua Dong, Hangcheng Zhu, Ruohui Huang, Gang Yu • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Agent Task | AlfWorld | Success Rate: 74.4 | 40 |
| Skill Routing | Skill Routing Dataset Easy | Hit@1: 78.7 | 26 |
| Skill Routing | Skill Routing Dataset Hard | Hit@1: 73.3 | 26 |
| Skill Routing | Skill Routing Dataset | Hit@1: 76 | 26 |
| Agent Task Completion | tau2-Bench Telecom | Pass Rate: 62.8 | 9 |
| Agent Task Completion | SkillsBench | Pass Rate: 16.5 | 9 |
| Agent Task Completion | tau2-bench Airline | Pass Rate: 54 | 9 |
| Agent Task Completion | tau2-bench, SkillsBench, and ALFWorld Average | Average Pass Rate: 53.5 | 9 |
| Agent Task Success | tau2-bench Retail Domain | Total Pass Rate: 59.8 | 9 |
| Downstream task execution | SkillsBench | Reward Mean (%): 22.04 | 6 |
Showing 10 of 13 rows
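
For readers unfamiliar with the Hit@1 metric reported for the skill-routing rows, the following is a minimal sketch of the standard Hit@k computation: the fraction of queries whose gold item appears among the top k ranked candidates. The function name `hit_at_k` and the sample data are illustrative assumptions, and the sketch assumes exactly one gold skill per query; the benchmark's exact scoring protocol is not stated on this page.

```python
def hit_at_k(ranked_lists, gold, k=1):
    """Fraction of queries whose gold skill is within the top-k ranked skills.

    ranked_lists: one ranked list of skill ids per query (best first).
    gold: the single correct skill id for each query (an assumption here).
    """
    if len(ranked_lists) != len(gold):
        raise ValueError("ranked_lists and gold must have the same length")
    hits = sum(1 for ranked, g in zip(ranked_lists, gold) if g in ranked[:k])
    return hits / len(gold)


# Made-up rankings for four queries.
ranked = [["a", "b"], ["c", "a"], ["b", "c"], ["a", "c"]]
gold = ["a", "a", "c", "a"]

hit1 = hit_at_k(ranked, gold, k=1)
hit2 = hit_at_k(ranked, gold, k=2)
print(f"Hit@1 = {100 * hit1:.1f}%, Hit@2 = {100 * hit2:.1f}%")
```

Hit@1 is the strictest setting: only the top-ranked skill counts, which matches a router that hands a single skill to the downstream agent.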
