A Self-Evolving Framework for Efficient Terminal Agents via Observational Context Compression

About

As terminal agents scale to long-horizon, multi-turn workflows, a key bottleneck is not merely limited context length but the accumulation of noisy terminal observations in the interaction history. Retaining raw observations preserves useful environment feedback but also leads to context saturation and high token cost; conversely, naive compression may discard task-critical signals needed for subsequent actions. Because terminal environments are highly heterogeneous across repositories, commands, and execution states, heuristic-based or fixed-prompt compression methods are difficult to generalize. We propose TACO, a plug-and-play, training-free, self-evolving Terminal Agent Compression framework for existing terminal agents. TACO automatically discovers, refines, and reuses structured compression rules from interaction trajectories, enabling workflow-adaptive filtering of low-value terminal outputs while preserving task-relevant observations. Experiments on TerminalBench (TB 1.0 and TB 2.0) and four additional terminal-related benchmarks (SWE-Bench Lite, CompileBench, DevEval, and CRUST-Bench) show that TACO consistently improves task performance and token efficiency across agent scaffolds and backbone models. On TerminalBench, TACO yields 1%-4% accuracy gains across strong agentic models and improves accuracy by around 2%-3% under the same token budget. On the additional terminal-related benchmarks, it reduces total token consumption while maintaining or improving task success rates. These results suggest that self-evolving, workflow-adaptive observation compression is an effective path toward more reliable and efficient long-horizon terminal agents. The code is publicly available at https://github.com/multimodal-art-projection/TACO.
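To make the idea of "structured compression rules" concrete, here is a minimal Python sketch of applying and reusing such rules on terminal output. It is an illustration under stated assumptions, not TACO's implementation: the rule schema (`CompressionRule` with a regex `command_pattern`, `drop_patterns`, and `keep_patterns`), the `RuleStore` registry, and the pip example are all hypothetical. The discovery and refinement of rules from interaction trajectories, which the abstract attributes to TACO, is not modeled here; the rule is written by hand.

```python
import re
from dataclasses import dataclass, field


@dataclass
class CompressionRule:
    """Hypothetical structured rule: which commands it covers and how to filter their output."""
    name: str
    command_pattern: str                                     # regex matched against the shell command
    drop_patterns: list[str] = field(default_factory=list)   # lines treated as low-value noise
    keep_patterns: list[str] = field(default_factory=list)   # lines that must always survive
    uses: int = 0                                            # how often this rule has been reused

    def applies_to(self, command: str) -> bool:
        return re.search(self.command_pattern, command) is not None

    def compress(self, output: str) -> str:
        lines = output.splitlines()
        kept = []
        for line in lines:
            # Task-critical lines (e.g. errors, final status) are kept even if a drop pattern matches.
            if any(re.search(p, line) for p in self.keep_patterns):
                kept.append(line)
            # Otherwise a line is kept only if no drop pattern marks it as low-value.
            elif not any(re.search(p, line) for p in self.drop_patterns):
                kept.append(line)
        dropped = len(lines) - len(kept)
        if dropped:
            # Leave a short marker so the agent knows output was filtered.
            kept.append(f"[{dropped} low-value lines omitted by rule '{self.name}']")
        return "\n".join(kept)


class RuleStore:
    """Hypothetical registry that reuses previously obtained rules across agent steps."""

    def __init__(self, rules):
        self.rules = list(rules)

    def add(self, rule: CompressionRule) -> None:
        # A discovered or refined rule would be registered here.
        self.rules.append(rule)

    def compress_observation(self, command: str, output: str) -> str:
        for rule in self.rules:
            if rule.applies_to(command):
                rule.uses += 1
                return rule.compress(output)
        return output  # no matching rule: keep the raw observation unchanged


pip_rule = CompressionRule(
    name="pip-install",
    command_pattern=r"^pip install",
    drop_patterns=[r"^\s*(Collecting|Downloading|Using cached|Requirement already satisfied)"],
    keep_patterns=[r"ERROR|error:|Successfully installed"],
)
store = RuleStore([pip_rule])

raw = "\n".join([
    "Collecting numpy",
    "  Downloading numpy-2.0.0.whl (19.5 MB)",
    "Collecting requests",
    "  Using cached requests-2.32.3-py3-none-any.whl",
    "Installing collected packages: requests, numpy",
    "Successfully installed numpy-2.0.0 requests-2.32.3",
])
compressed = store.compress_observation("pip install numpy requests", raw)
print(compressed)  # two retained lines followed by the omission marker
```

Falling back to the raw output when no rule matches reflects the trade-off the abstract describes: output from unfamiliar commands is kept in full rather than risk discarding task-critical signals, while well-understood, noisy commands are filtered aggressively.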

Jincheng Ren, Siwei Wu, Yizhi Li, Kang Zhu, Shu Xu, Boyu Feng, Ruibin Yuan, Wei Zhang, Riza Batista-Navarro, Jian Yang, Chenghua Lin• 2026

Related benchmarks

Task	Dataset	Result
Terminal-related CLI agent task	TerminalBench 1.0	Accuracy 46.25	51
Terminal-related CLI agent task	TerminalBench 2.0	Accuracy 44.16	29
Terminal-related CLI agent task	SWE-Bench Lite	Accuracy 57.12	2
Terminal-related CLI agent task	DevEval	Accuracy 39.74	2
Terminal-related CLI agent task	CRUST-Bench	Accuracy 48.05	2
Terminal-related CLI agent task	CompileBench	Accuracy 75	2


