
AgentFold: Long-Horizon Web Agents with Proactive Context Management

About

LLM-based web agents show immense promise for information seeking, yet their effectiveness on long-horizon tasks is hindered by a fundamental trade-off in context management. Prevailing ReAct-based agents suffer from context saturation as they accumulate noisy, raw histories, while methods that summarize the full history at every step in a fixed manner risk the irreversible loss of critical details. To address these limitations, we introduce AgentFold, a novel agent paradigm centered on proactive context management, inspired by the human cognitive process of retrospective consolidation. AgentFold treats its context as a dynamic cognitive workspace to be actively sculpted, rather than a passive log to be filled. At each step, it learns to execute a "folding" operation, which manages its historical trajectory at multiple scales: it can perform granular condensations to preserve vital, fine-grained details, or deep consolidations to abstract away entire multi-step sub-tasks. The results on prominent benchmarks are striking: with simple supervised fine-tuning (without continual pre-training or RL), our AgentFold-30B-A3B agent achieves 36.2% on BrowseComp and 47.3% on BrowseComp-ZH. Notably, this performance not only surpasses or matches open-source models of a dramatically larger scale, such as DeepSeek-V3.1-671B-A37B, but also surpasses leading proprietary agents like OpenAI's o4-mini.
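The folding operation described above can be illustrated with a minimal sketch. This is not the authors' implementation: the `Workspace` and `Block` classes, the `observe`/`fold` methods, the rule that a fold must start at an existing block boundary, and the example summaries are all assumptions made for illustration. The sketch only shows the two scales named in the abstract: folding the latest step alone (granular condensation) versus folding it together with earlier blocks (deep consolidation).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Block:
    """A summary covering steps start..end (inclusive) of the trajectory."""
    start: int
    end: int
    summary: str


@dataclass
class Workspace:
    """Hypothetical context: folded summary blocks plus the latest raw step."""
    blocks: List[Block] = field(default_factory=list)
    latest_raw: Optional[str] = None
    step: int = 0

    def observe(self, raw: str) -> None:
        """Record the newest raw interaction as the next step."""
        if self.latest_raw is not None:
            raise ValueError("fold the previous step before observing a new one")
        self.step += 1
        self.latest_raw = raw

    def fold(self, start: int, summary: str) -> None:
        """Replace steps start..current with a single summary block.

        start == current step: granular condensation (latest step only).
        start <  current step: deep consolidation (absorbs earlier blocks).
        """
        if self.latest_raw is None:
            raise ValueError("nothing to fold")
        if not 1 <= start <= self.step:
            raise ValueError("start must lie within the trajectory")
        # Assumption: a deep fold must begin where an existing block begins.
        if start < self.step and all(b.start != start for b in self.blocks):
            raise ValueError("start must align with an existing block boundary")
        # Keep blocks that end before the folded range, then append the new one.
        self.blocks = [b for b in self.blocks if b.start < start]
        self.blocks.append(Block(start, self.step, summary))
        self.latest_raw = None


# Hypothetical three-step trajectory.
ws = Workspace()
ws.observe("raw search results mentioning site A")
ws.fold(1, "Site A is a candidate source.")            # granular
ws.observe("raw contents of page B")
ws.fold(2, "Page B gives the year 2019.")              # granular
ws.observe("raw contents of page C (dead end)")
ws.fold(2, "Checked B and C; only B's 2019 matters.")  # deep: merges steps 2-3

for b in ws.blocks:
    print(f"[steps {b.start}-{b.end}] {b.summary}")
```

In this sketch the context never holds more than one raw step at a time, and a deep fold shrinks the number of blocks, which is the property that keeps the workspace from growing linearly with the trajectory. In the paper the agent itself learns when to fold and at which scale; here the caller makes that choice by hand.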

Rui Ye, Zhongwang Zhang, Kuan Li, Huifeng Yin, Zhengwei Tao, Yida Zhao, Liangcai Su, Liwen Zhang, Zile Qiao, Xinyu Wang, Pengjun Xie, Fei Huang, Siheng Chen, Jingren Zhou, Yong Jiang • 2025

Related benchmarks

Task | Dataset | Metric | Result | Rank
Agentic Web Browsing | BrowseComp-ZH | Pass@1 | 47.3 | 52
Multi-hop Question Answering | MuSiQue | EM | 22.2 | 50
Agentic Web Browsing | BrowseComp | Pass@1 | 36.2 | 47
Multi-hop Question Answering | SQuAD | Exact Match (EM) | 21.8 | 30
Web Browsing | BC-plus | EM | 15.6 | 30
Long-horizon Agent Performance | BrowseComp | Overall Score | 36.2 | 19
Long-horizon Agent Performance | BrowseComp-ZH | Overall Score | 47.3 | 17
Agentic Search | BrowseComp | Score | 36.2 | 14
Long-horizon Agent Performance | WideSearch | Overall Score | 62.1 | 13
Agentic Search | GAIA text | Score | 67 | 12
