
ReSum: Unlocking Long-Horizon Search Intelligence via Context Summarization

About

Large Language Model (LLM)-based web agents demonstrate strong performance on knowledge-intensive tasks but are hindered by context window limitations in paradigms like ReAct. Complex queries involving multiple entities, intertwined relationships, and high uncertainty demand extensive search cycles that rapidly exhaust context budgets before reaching solutions. To overcome this challenge, we introduce ReSum, a novel paradigm that enables indefinite exploration through periodic context summarization. ReSum converts growing interaction histories into compact reasoning states, maintaining awareness of prior discoveries while bypassing context constraints. For paradigm adaptation, we propose ReSum-GRPO, integrating GRPO with segmented trajectory training and advantage broadcasting to familiarize agents with summary-conditioned reasoning. Extensive experiments on web agents across three benchmarks demonstrate that ReSum delivers an average absolute improvement of 4.5% over ReAct, with further gains of 8.2% following ReSum-GRPO training. Notably, with only 1K training samples, our WebResummer-30B (a ReSum-GRPO-trained version of WebSailor-30B) achieves 33.3% Pass@1 on BrowseComp-zh and 18.3% on BrowseComp-en, surpassing most open-source web agents.
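The two mechanisms described above, periodic context summarization and advantage broadcasting over trajectory segments, can be illustrated with a minimal sketch. It is not the authors' implementation: `resum_rollout`, `broadcast_advantages`, the `step_fn`/`summarize_fn`/`count_tokens` callables, the token budget, and the use of population standard deviation for group normalization are all assumptions made for illustration.

```python
import statistics
from typing import Callable, List, Optional, Tuple

def resum_rollout(
    question: str,
    step_fn: Callable[[List[str]], Tuple[str, Optional[str]]],  # hypothetical agent step
    summarize_fn: Callable[[List[str]], str],                   # hypothetical summarizer
    count_tokens: Callable[[str], int],                         # hypothetical tokenizer
    budget: int,
    max_steps: int,
) -> Tuple[Optional[str], List[List[str]]]:
    """ReAct-style loop that compresses the history once it exceeds `budget`.

    Returns the final answer (or None) and the list of segments; a segment is
    the history that was closed either by a summary or by the final answer.
    """
    history = [question]
    segments: List[List[str]] = []
    for _ in range(max_steps):
        message, answer = step_fn(history)
        history.append(message)
        if answer is not None:
            segments.append(history)
            return answer, segments
        if sum(count_tokens(m) for m in history) > budget:
            summary = summarize_fn(history)
            segments.append(history)
            # Restart from the question plus the compact reasoning state.
            history = [question, summary]
    segments.append(history)
    return None, segments

def broadcast_advantages(rewards: List[float], segment_counts: List[int]) -> List[List[float]]:
    """Group-normalize one reward per trajectory, then copy each trajectory's
    advantage to every one of its segments (assumed normalization: population std)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    advantages = [(r - mean) / (std + 1e-8) for r in rewards]
    return [[a] * n for a, n in zip(advantages, segment_counts)]

# Toy demo: a fake agent that answers on its fourth step.
_calls = {"n": 0}

def toy_step(history: List[str]) -> Tuple[str, Optional[str]]:
    _calls["n"] += 1
    if _calls["n"] == 4:
        return "final", "42"
    return f"search result {_calls['n']}", None

demo_answer, demo_segments = resum_rollout(
    "q", toy_step, lambda h: "summary", lambda m: len(m.split()), budget=6, max_steps=10
)
demo_advantages = broadcast_advantages([1.0, 0.0], [len(demo_segments), 1])
```

In the toy run the history is summarized once, so the rollout yields two segments, and both receive the same trajectory-level advantage.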

Xixi Wu, Kuan Li, Yida Zhao, Liwen Zhang, Litu Ou, Huifeng Yin, Zhongwang Zhang, Xinmiao Yu, Dingchu Zhang, Yong Jiang, Pengjun Xie, Fei Huang, Minhao Cheng, Shuai Wang, Hong Cheng, Jingren Zhou • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Interactive Tool-Use Agent Performance | tau2-Bench | Retail Performance Score | 70.4 | 84 |
| Multi-turn tool-use interaction | Tau-Bench | Retail Success Rate | 69.6 | 35 |
| Deep Research | xbench | Accuracy | 11 | 30 |
| Clinical Decision-Making | MIMIC Common IV (test) | Diagnoses Error | 0.1753 | 28 |
| Multi-turn tool-use interaction | VitaBench | Delivery Score | 53.8 | 20 |
| Deep Research | GAIA | Pass@1 | 70.5 | 15 |
| Deep Research | Browsecomp | Pass@1 | 50.9 | 15 |
| Deep Research | BrowseComp-ZH | Pass@1 | 58.1 | 15 |
| Deep Research | xBench-DS | Pass@1 | 71 | 15 |
| Deep Research | FRAMES | Accuracy | 46.5 | 14 |
Showing 10 of 15 rows