ACON: Optimizing Context Compression for Long-horizon LLM Agents
About
Large language models (LLMs) are increasingly deployed as agents in dynamic real-world environments, where success depends on maintaining precise records of actions and observations. However, the resulting unbounded context growth in long-horizon agentic tasks creates two critical bottlenecks: prohibitive inference memory costs and reasoning degradation caused by irrelevant information. Existing compression methods do not fully address this, often relying on brittle heuristics or requiring parameter updates that are impractical for proprietary or large-scale LLMs. We introduce Agent Context Optimization (ACON), a unified framework that optimally compresses both observations and history into concise, informative representations. Unlike prior work, ACON performs optimization in natural language space: it iteratively refines compression guidelines based on failure analysis of the agent, ensuring that critical state information is preserved without model fine-tuning. To further reduce computational overhead, we distill the optimized compressor into smaller models. Experiments on AppWorld, OfficeBench, and Multi-objective QA show that ACON reduces peak token usage by 26-54% while improving task success over existing compression baselines. Notably, it enables smaller LMs to function effectively as long-horizon agents, improving performance by up to 46% by mitigating context distraction. Our code is available at https://github.com/microsoft/acon.
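The abstract names two mechanisms: compressing the interaction history as it grows, and iteratively refining the compression guideline from failure analysis. The following is a minimal Python sketch of both ideas under stated assumptions. It is not code from the ACON repository. The compressor and the guideline optimizer are hypothetical callables standing in for LLM calls. Tokens are counted by whitespace splitting. The failure criterion (a task solved with the full context but failed under compression) is an assumed setup for "failure analysis." All function names (`run_with_history_compression`, `refine_guideline`, `peak_reduction_pct`, `count_tokens`) are illustrative only.

```python
from typing import Callable, List

# Hypothetical stand-in for an LLM compressor: (guideline, text) -> shorter text.
Compressor = Callable[[str, str], str]


def count_tokens(text: str) -> int:
    """Whitespace token count; an assumption standing in for a real tokenizer."""
    return len(text.split())


def run_with_history_compression(
    steps: List[str], compress: Compressor, guideline: str, threshold: int
) -> int:
    """Append each step to the history and compress the history whenever it
    exceeds `threshold` tokens. Returns the peak history length in tokens."""
    history = ""
    peak = 0
    for step in steps:
        history = (history + " " + step).strip()
        peak = max(peak, count_tokens(history))  # measured before compressing
        if count_tokens(history) > threshold:
            history = compress(guideline, history)
    return peak


def peak_reduction_pct(full_peak: int, compressed_peak: int) -> float:
    """Relative reduction in peak tokens, in percent."""
    return 100.0 * (full_peak - compressed_peak) / full_peak


def refine_guideline(
    guideline: str,
    tasks: List[str],
    succeeds_full: Callable[[str], bool],
    succeeds_compressed: Callable[[str, str], bool],
    analyze: Callable[[str, List[str]], str],
    rounds: int = 3,
) -> str:
    """Collect tasks solved with the full context but failed under compression,
    then ask an optimizer (an LLM in practice) to rewrite the guideline."""
    for _ in range(rounds):
        failures = [
            t for t in tasks
            if succeeds_full(t) and not succeeds_compressed(guideline, t)
        ]
        if not failures:
            break  # the guideline no longer causes compression-induced failures
        guideline = analyze(guideline, failures)
    return guideline


def keep_last_two(guideline: str, text: str) -> str:
    """Toy compressor (hypothetical): ignores the guideline, keeps two tokens."""
    return " ".join(text.split()[-2:])


if __name__ == "__main__":
    steps = ["a b c", "d e f", "g h i", "j k l"]
    full = run_with_history_compression(steps, keep_last_two, "summarize", 10_000)
    short = run_with_history_compression(steps, keep_last_two, "summarize", 4)
    print(full, short, peak_reduction_pct(full, short))  # 12 6 50.0
```

Separating the compressor from the guideline is deliberate. The guideline is plain text that `refine_guideline` can rewrite without touching any model weights, which loosely mirrors the "optimization in natural language space" described above.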
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Mean Reward | Webshop | Mean Reward | 53.3 | 30 |
| Mean Reward | AlfWorld | Mean Reward | 0.4 | 30 |
| Mean Reward | ScienceWorld | Mean Reward | 0.172 | 30 |
| Agentic Task Completion | AppWorld (test-normal) | Accuracy | 56.5 | 22 |
| Interactive Agent Task | Webshop | Efficiency Multiplier | 14 | 15 |
| Interactive Agent Task | ScienceWorld | Efficiency Factor | 3.4 | 15 |
| Interactive Agent Task | AlfWorld | Effective Steps Multiplier | 3.3 | 15 |
| Multi-step Reasoning | TriviaQA | Task Performance | 57.14 | 14 |
| Web-based tool-use | Mind2Web | Task Performance | 30.77 | 12 |
| Agentic Task Completion | AppWorld Easy normal (test) | Accuracy | 86 | 11 |