IterResearch: Rethinking Long-Horizon Agents with Interaction Scaling

About

Recent advances in deep-research agents have shown promise for autonomous knowledge construction through dynamic reasoning over external sources. However, existing approaches rely on a mono-contextual paradigm that accumulates all information in a single, expanding context window, leading to context suffocation and noise contamination that limit their effectiveness on long-horizon tasks. We introduce IterResearch, a novel iterative deep-research paradigm that revisits long-horizon research through the lens of Interaction Scaling. Instead of relying on linear context accumulation, we adopt an MDP-inspired architecture with strategic workspace reconstruction. By maintaining an evolving report as memory and periodically synthesizing insights, our approach preserves consistent reasoning capacity across arbitrary exploration depths. To effectively train this paradigm, we employ Efficiency-Aware Policy Optimization (EAPO), a training strategy that adapts geometric reward discounting to incentivize efficient exploration and utilizes adaptive downsampling for stable distributed training. Extensive experiments demonstrate that IterResearch achieves substantial improvements over existing open-source agents, with an average gain of 14.5pp across six benchmarks, and narrows the gap with frontier proprietary systems. Remarkably, our paradigm exhibits unprecedented interaction scaling, extending to 2048 interactions with dramatic performance gains (from 3.5% to 42.5%), and serves as an effective prompting strategy, improving frontier models by up to 19.2pp over ReAct on long-horizon tasks. These findings position IterResearch as a versatile solution for long-horizon reasoning, effective both as a trained agent and as a prompting paradigm for frontier models.
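
The workspace reconstruction and geometric reward discounting mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Workspace` fields, the `policy` and `tool` callables, and the discounting form (final reward scaled by `gamma` raised to the number of remaining rounds) are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Workspace:
    """Hypothetical per-round state: the question, the evolving report, and only the latest interaction."""
    question: str
    report: str
    last_action: str = ""
    last_observation: str = ""


# Hypothetical policy signature: returns (updated_report, next_action, final_answer_or_None).
Policy = Callable[[Workspace], Tuple[str, str, Optional[str]]]


def run_iterative_research(
    question: str,
    policy: Policy,
    tool: Callable[[str], str],
    max_rounds: int,
) -> Tuple[Optional[str], int]:
    """Run rounds until the policy answers or the budget is spent; return (answer, rounds_used)."""
    ws = Workspace(question=question, report="")
    for round_idx in range(max_rounds):
        report, action, answer = policy(ws)
        if answer is not None:
            return answer, round_idx + 1
        observation = tool(action)
        # Reconstruct the workspace instead of appending to a growing history:
        # only the question, the updated report, and the latest interaction carry over.
        ws = Workspace(question, report, action, observation)
    return None, max_rounds


def geometric_discounted_rewards(final_reward: float, num_rounds: int, gamma: float) -> List[float]:
    """Assumed form: round t (0-indexed) receives final_reward * gamma ** (num_rounds - 1 - t)."""
    return [final_reward * gamma ** (num_rounds - 1 - t) for t in range(num_rounds)]
```

Under this assumed discounting, a trajectory that reaches the same final reward in fewer rounds assigns larger rewards to its early rounds, which is one way a training signal could favor efficient exploration. The workspace size also stays bounded by the report and one interaction, regardless of how many rounds have run.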

Guoxin Chen, Zile Qiao, Xuanzhong Chen, Donglei Yu, Haotian Xu, Wayne Xin Zhao, Ruihua Song, Wenbiao Yin, Huifeng Yin, Liwen Zhang, Kuan Li, Minpeng Liao, Yong Jiang, Pengjun Xie, Fei Huang, Jingren Zhou • 2025

Related benchmarks

Task	Dataset	Metric	Result	
Interactive Tool-Use Agent Performance	tau2-Bench	Retail Performance Score	71.1	102
Long-horizon agentic task	HLE	Performance	28.8	41
Multi-turn tool-use interaction	Tau-Bench	Retail Success Rate	76.5	35
Deep Research Task	Browsecomp	Accuracy	37.3	29
Deep Research	GAIA	Accuracy	72.8	24
Deep search	BrowseComp v1.0 (test)	Success Rate	35.3	23
Deep search	BrowseComp-ZH v1.0 (test)	Success Rate	34	22
Deep search	XBench-DS v1.0 (test)	Success Rate	44	22
Multi-turn tool-use interaction	VitaBench	Delivery Score	50.8	20
Agentic Search	Browsecomp	Score	37.3	19

Showing 10 of 18 rows
