The Complexity Trap: Simple Observation Masking Is as Efficient as LLM Summarization for Agent Context Management

About

Large Language Model (LLM)-based agents solve complex tasks through iterative reasoning, exploration, and tool-use, a process that can result in long, expensive context histories. While state-of-the-art Software Engineering (SE) agents like OpenHands or Cursor use LLM-based summarization to tackle this issue, it is unclear whether the increased complexity offers tangible performance benefits compared to simply omitting older observations. We present a systematic comparison of these approaches within SWE-agent on SWE-bench Verified across five diverse model configurations. Moreover, we show initial evidence of our findings generalizing to the OpenHands agent scaffold. We find that a simple environment observation masking strategy halves cost relative to the raw agent while matching, and sometimes slightly exceeding, the solve rate of LLM summarization. Additionally, we introduce a novel hybrid approach that further reduces costs by 7% and 11% compared to just observation masking or LLM summarization, respectively. Our findings raise concerns regarding the trend towards pure LLM summarization and provide initial evidence of untapped cost reductions by pushing the efficiency-effectiveness frontier. We release code and data for reproducibility.

Tobias Lindenbauer, Igor Slinko, Ludwig Felder, Egor Bogomolov, Yaroslav Zharov • 2025
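
The observation masking idea in the abstract (omitting older environment observations while keeping the agent's own reasoning and actions) can be illustrated with a minimal sketch. This is not the authors' released code: the turn structure (`role`/`content` dictionaries), the `keep_last` window size, and the placeholder string are all assumptions chosen for illustration.

```python
from typing import Dict, List

# Hypothetical placeholder text; the wording used in the paper may differ.
PLACEHOLDER = "[observation omitted]"


def mask_old_observations(
    history: List[Dict[str, str]], keep_last: int = 3
) -> List[Dict[str, str]]:
    """Return a copy of `history` in which every observation except the
    most recent `keep_last` has its content replaced by a placeholder.

    Turns with any other role (e.g. the agent's actions) are kept verbatim.
    """
    # Positions of all observation turns, oldest first.
    obs_indices = [
        i for i, turn in enumerate(history) if turn["role"] == "observation"
    ]
    # Everything before the last `keep_last` observations gets masked.
    if keep_last > 0:
        to_mask = set(obs_indices[:-keep_last])
    else:
        to_mask = set(obs_indices)

    masked = []
    for i, turn in enumerate(history):
        if i in to_mask:
            masked.append({**turn, "content": PLACEHOLDER})
        else:
            masked.append(dict(turn))
    return masked
```

Because only the observation text is replaced, the sequence of turns stays intact, so the agent can still see which actions it took even when their outputs are no longer in context. Unlike LLM summarization, this step needs no extra model call.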

Related benchmarks

Task	Dataset	Result
Mean Reward	Webshop	Mean Reward: 59.4	30
Mean Reward	ScienceWorld	Mean Reward: 0.215	30
Mean Reward	AlfWorld	Mean Reward: 0.3	30
Interactive Agent Task	AlfWorld	Effective Steps Multiplier: 2.8	15
Interactive Agent Task	Webshop	Efficiency Multiplier: 4.8	15
Interactive Agent Task	ScienceWorld	Efficiency Factor: 1.5	15
