
Skill is Not One-Size-Fits-All: Model-Aware Skill Alignment for LLM Agents

About

LLM agents increasingly rely on externally curated skills (procedural instructions retrieved at decision time) to improve performance on long-horizon interactive tasks. Existing skill libraries are typically treated as model-agnostic, reusing the same skill formulations across backbones with substantially different capacities and behaviors. However, our controlled experiments across multiple model scales show that skill effectiveness is strongly model-dependent: a skill that benefits one backbone can harm another. Motivated by this observation, we propose MASA (Model-Aware Skill Alignment), a framework that adapts skills to each target backbone without modifying agent weights. MASA operates in two stages: (1) a hierarchical skill evolution pipeline that iteratively rewrites general and task-specific skills using hill climbing and UCB-driven tree search, guided by environment feedback and model capability profiles; and (2) a lightweight model-conditioned skill rewriter trained on evolution trajectories to reproduce the adaptation in a single forward pass. Experiments across three interactive environments and four backbones show that MASA consistently achieves the best overall performance, with gains of up to 25.8 points over the strongest baseline. The learned rewriter further generalizes to unseen tasks and environments without additional search, consistently outperforming a much larger teacher LLM at a fraction of the inference cost.
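
The abstract names UCB-driven tree search but gives no details of the scoring rule. As a rough illustration only, the sketch below uses the textbook UCB1 formula to choose which candidate skill rewrite to expand next. It is not the authors' implementation: the function names, the dictionary keys (`skill`, `total_reward`, `visits`), and the exploration constant `c` are all hypothetical.

```python
import math


def ucb1_score(mean_reward, visits, parent_visits, c=math.sqrt(2)):
    """Textbook UCB1 score: mean reward plus an exploration bonus.

    Unvisited candidates score +inf so each is tried at least once.
    """
    if visits == 0:
        return math.inf
    return mean_reward + c * math.sqrt(math.log(parent_visits) / visits)


def select_child(children, c=math.sqrt(2)):
    """Return the candidate rewrite with the highest UCB1 score.

    Each child is a dict with hypothetical keys:
    'skill' (str), 'total_reward' (float), 'visits' (int).
    """
    # Total evaluations spent under this parent node (at least 1 so log is defined).
    parent_visits = max(sum(ch["visits"] for ch in children), 1)

    def score(ch):
        mean = ch["total_reward"] / ch["visits"] if ch["visits"] else 0.0
        return ucb1_score(mean, ch["visits"], parent_visits, c)

    return max(children, key=score)


if __name__ == "__main__":
    candidates = [
        {"skill": "rewrite A", "total_reward": 9.0, "visits": 10},
        {"skill": "rewrite B", "total_reward": 0.5, "visits": 1},
    ]
    print(select_child(candidates)["skill"])
```

With `c=0` the rule reduces to greedy selection on mean reward, which is closer in spirit to hill climbing; a larger `c` pushes the search toward rarely evaluated rewrites.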

Jianxiang Yu, Jiapeng Zhu, Bochen Lin, Qier Cui, Zichen Ding, Xiang Li • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Embodied Task Completion | ALFWorld (test) | Pick Success Rate | 85.7 | 16 |
| Multi-Hop Search-augmented Question Answering | HotpotQA | Success Rate | 34.2 | 16 |
| Multi-Hop Search-augmented Question Answering | 2Wiki | Success Rate | 35.6 | 16 |
| Multi-Hop Search-augmented Question Answering | MuSiQue | Success Rate | 11.8 | 16 |
| Single-Hop Search-augmented Question Answering | NQ | Success Rate | 37 | 16 |
| Single-Hop Search-augmented Question Answering | TriviaQA | Success Rate | 61.8 | 16 |
| Web-based Shopping Simulation | WebShop (test) | Success Rate | 34.6 | 16 |
| Single-Hop Search-augmented Question Answering | PopQA | Success Rate | 40.7 | 16 |
| Multi-Hop Search-augmented Question Answering | Bamboogle | Success Rate | 66.1 | 16 |
