Skill Weaving: Efficient LLM Improvement via Modular Skillpacks
About
Large language models increasingly require specialization across diverse domains, yet existing approaches struggle to balance multi-domain capability with strict memory and inference constraints. In this work, we introduce SkillWeave, a modular improvement framework that enables LLMs to specialize under fixed memory budgets. SkillWeave partitions the full capabilities of a general-purpose model into skillpacks -- lightweight, domain-specific delta modules -- that reorganize and refine the model's internal knowledge. For efficient deployment, SkillWeave integrates SkillZip, which compresses skillpacks into a compact, inference-ready format, enabling strong multi-domain performance with low-latency execution. On multi-task and agentic benchmarks, a 9B SkillWeave model outperforms several baselines and even surpasses a 32B monolithic LLM, while achieving up to a 4x speedup.
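To make the notion of a "delta module" concrete, here is a minimal sketch. The abstract does not say how skillpacks are built or how SkillZip compresses them, so the sketch assumes the simplest reading: a skillpack stores the element-wise difference between a domain-tuned checkpoint and the shared base weights. The names `make_skillpack` and `apply_skillpack`, and the toy weights, are hypothetical and not taken from the paper.

```python
from typing import Dict, List

# Toy stand-in for a checkpoint: parameter name -> flattened weight values.
Weights = Dict[str, List[float]]


def make_skillpack(base: Weights, specialized: Weights) -> Weights:
    """Assumed construction: delta = specialized - base, per parameter."""
    return {
        name: [s - b for s, b in zip(specialized[name], base[name])]
        for name in base
    }


def apply_skillpack(base: Weights, skillpack: Weights) -> Weights:
    """Rebuild domain weights as base + delta; parameters absent from the pack are left unchanged."""
    result: Weights = {}
    for name, values in base.items():
        delta = skillpack.get(name, [0.0] * len(values))
        result[name] = [b + d for b, d in zip(values, delta)]
    return result


# Hypothetical example: one shared base, one math-tuned variant.
base = {"layer0.weight": [1.0, 2.0, 3.0]}
math_tuned = {"layer0.weight": [1.5, 2.0, 2.0]}

math_pack = make_skillpack(base, math_tuned)
restored = apply_skillpack(base, math_pack)
```

Under this assumption, only the base weights plus one small delta per domain need to be stored, which is one way a fixed memory budget could be respected. Compression of the deltas (the role the abstract assigns to SkillZip) is not modeled here.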
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Mathematics | MATH | MATH Accuracy | 62.5 | 136 |
| Reasoning | ARC-C | -- | -- | 112 |
| Mathematics | GSM8K | GSM8K Score | 91 | 87 |
| Reasoning | BBH | BBH Score | 76.3 | 39 |
| Coding | MBPP | Overall Average Score | 78 | 37 |
| Reasoning | BBH | Score | 36.4 | 36 |
| Dialogue | IFEval | IFEval | 79.1 | 34 |
| Dialogue | AlpacaEval 2 | AlpacaEval2 Score | 52.8 | 34 |
| Coding | HumanEval | HumanEval | 75 | 28 |
| Coding | MBPP | Score | 49.7 | 23 |