
The Complexity Trap: Simple Observation Masking Is as Efficient as LLM Summarization for Agent Context Management

About

Large Language Model (LLM)-based agents solve complex tasks through iterative reasoning, exploration, and tool use, a process that can result in long, expensive context histories. While state-of-the-art Software Engineering (SE) agents like OpenHands or Cursor use LLM-based summarization to tackle this issue, it is unclear whether the increased complexity offers tangible performance benefits compared to simply omitting older observations. We present a systematic comparison of these approaches within SWE-agent on SWE-bench Verified across five diverse model configurations. Moreover, we show initial evidence of our findings generalizing to the OpenHands agent scaffold. We find that a simple environment observation masking strategy halves cost relative to the raw agent while matching, and sometimes slightly exceeding, the solve rate of LLM summarization. Additionally, we introduce a novel hybrid approach that further reduces costs by 7% and 11% compared to observation masking alone and LLM summarization alone, respectively. Our findings raise concerns regarding the trend towards pure LLM summarization and provide initial evidence of untapped cost reductions by pushing the efficiency-effectiveness frontier. We release code and data for reproducibility.
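Observation masking, as described above, keeps the agent's reasoning and actions intact and hides only older environment outputs. The following is a minimal Python sketch, assuming the history is a list of hypothetical `Turn` records and that only the `keep_last` most recent observations stay verbatim. The `Turn` type, the `PLACEHOLDER` wording, and the window size are illustrative assumptions, not values taken from SWE-agent or from the paper.

```python
from dataclasses import dataclass, replace
from typing import List

# Hypothetical placeholder text; the wording used by a real agent may differ.
PLACEHOLDER = "[Old environment observation omitted]"


@dataclass(frozen=True)
class Turn:
    role: str      # hypothetical roles: "assistant" (reasoning + action) or "observation" (tool output)
    content: str


def mask_observations(history: List[Turn], keep_last: int = 3) -> List[Turn]:
    """Return a copy of `history` in which every observation turn except the
    `keep_last` most recent ones has its content replaced by PLACEHOLDER.

    Assistant turns (reasoning and actions) are never modified.
    """
    if keep_last < 0:
        raise ValueError("keep_last must be non-negative")
    obs_indices = [i for i, t in enumerate(history) if t.role == "observation"]
    # Observations older than the most recent `keep_last` ones get masked.
    n_to_mask = max(len(obs_indices) - keep_last, 0)
    to_mask = set(obs_indices[:n_to_mask])
    return [
        replace(t, content=PLACEHOLDER) if i in to_mask else t
        for i, t in enumerate(history)
    ]


# Usage: with keep_last=1, only the final observation ("hi") stays verbatim.
example = [
    Turn("assistant", "ls"),
    Turn("observation", "a.py b.py"),
    Turn("assistant", "cat a.py"),
    Turn("observation", "print('hi')"),
    Turn("assistant", "python a.py"),
    Turn("observation", "hi"),
]
masked = mask_observations(example, keep_last=1)
```

The sketch masks rather than deletes: the turn structure is preserved, so the agent can still see that each earlier action produced output, only not its full text. Unlike LLM summarization, this requires no extra model calls.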

Tobias Lindenbauer, Igor Slinko, Ludwig Felder, Egor Bogomolov, Yaroslav Zharov • 2025

Related benchmarks

Task | Dataset | Metric | Result | Rank
Mean Reward | Webshop | Mean Reward | 59.4 | 30
Mean Reward | ScienceWorld | Mean Reward | 0.215 | 30
Mean Reward | AlfWorld | Mean Reward | 0.3 | 30
Interactive Agent Task | AlfWorld | Effective Steps Multiplier | 2.8 | 15
Interactive Agent Task | Webshop | Efficiency Multiplier | 4.8 | 15
Interactive Agent Task | ScienceWorld | Efficiency Factor | 1.5 | 15
