Structured Agent Distillation for Large Language Model
About
Large language models (LLMs) exhibit strong capabilities as decision-making agents by interleaving reasoning and actions, as seen in ReAct-style frameworks. Yet, their practical deployment is constrained by high inference costs and large model sizes. We propose Structured Agent Distillation, a framework that compresses large LLM-based agents into smaller student models while preserving both reasoning fidelity and action consistency. Unlike standard token-level distillation, our method segments trajectories into {[REASON]} and {[ACT]} spans, applying segment-specific losses to align each component with the teacher's behavior. This structure-aware supervision enables compact agents to better replicate the teacher's decision process. Experiments on ALFWorld, HotPotQA-ReAct, and WebShop show that our approach consistently outperforms token-level and imitation learning baselines, achieving significant compression with minimal performance drop. Scaling and ablation results further highlight the importance of span-level alignment for efficient and deployable agents.
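The segment-specific supervision described above can be sketched in plain Python. This is a minimal illustration, not the paper's implementation: the function names, the use of forward cross-entropy between teacher and student token distributions, and the per-segment weights are all assumptions for clarity.

```python
import math

def segment_losses(teacher_dists, student_dists, span_labels):
    """Compute a separate distillation loss for [REASON] and [ACT] spans.

    teacher_dists, student_dists: per-token next-token distributions,
        each a dict mapping token -> probability (illustrative stand-in
        for model logits).
    span_labels: per-token segment tag, "REASON" or "ACT".

    Returns the mean cross-entropy CE(teacher || student) over the
    tokens of each segment, so each span is aligned with the teacher
    independently.
    """
    totals = {"REASON": [0.0, 0], "ACT": [0.0, 0]}
    for t_dist, s_dist, label in zip(teacher_dists, student_dists, span_labels):
        # CE(p_T, p_S) = -sum_v p_T(v) * log p_S(v)
        ce = -sum(p * math.log(s_dist[tok]) for tok, p in t_dist.items())
        totals[label][0] += ce
        totals[label][1] += 1
    return {seg: (s / n if n else 0.0) for seg, (s, n) in totals.items()}

def total_loss(losses, w_reason=1.0, w_act=1.0):
    # Hypothetical weighted combination of the two segment losses.
    return w_reason * losses["REASON"] + w_act * losses["ACT"]

# Toy trajectory: one [REASON] token, one [ACT] token.
teacher = [{"think": 1.0}, {"go": 1.0}]
student = [{"think": 0.5, "act": 0.5}, {"go": 0.8, "stop": 0.2}]
labels = ["REASON", "ACT"]
losses = segment_losses(teacher, student, labels)
```

The point of splitting the loss by span is that reasoning tokens and action tokens can be weighted or scheduled differently, rather than averaged into a single token-level objective.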
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Multi-hop Question Answering | HotpotQA | CoT Match Rate | 86.5 | 54 |
| Web-based Agent Interaction | WebShop | CoT Match Rate | 74.6 | 41 |
| Interactive Decision-making | WebShop | Success Rate | 64.1 | 36 |
| Question Answering | HotpotQA | Success Rate | 75.2 | 33 |
| Sequential Decision Making | ALFWorld (test) | Success Rate | 68 | 26 |
| Decision Making | ALFWorld | Steps | 6.4 | 22 |
| Web-based Reasoning | WebShop | Average Reasoning Length (tokens) | 34.9 | 18 |
| Embodied AI Reasoning | ALFWorld | CoT Match Rate | 77.2 | 18 |
| Sequential Decision Making | HotpotQA | Average Steps per Episode | 4.8 | 18 |
| Interactive Reasoning | ALFWorld | Average Reasoning Length (tokens) | 41.2 | 18 |