
Scaling Long-Horizon LLM Agent via Context-Folding

About

Large language model (LLM) agents are fundamentally constrained by context length on long-horizon tasks. We introduce Context-Folding, a framework that empowers agents to actively manage their working context. An agent can procedurally branch into a sub-trajectory to handle a subtask and then fold it upon completion, collapsing the intermediate steps while retaining a concise summary of the outcome. To make this behavior learnable, we develop FoldGRPO, an end-to-end reinforcement learning framework with specific process rewards that encourage effective task decomposition and context management. On complex long-horizon tasks (Deep Research and SWE), our folding agent matches or outperforms ReAct baselines while using an active context 10× smaller, and it significantly outperforms models that rely on summarization-based context management.
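The branch-and-fold behavior described above can be sketched in a few lines. This is a minimal illustration of the idea, not the paper's implementation: the class name `FoldingContext` and the methods `branch`/`fold` are hypothetical, and real agents would fold message histories rather than plain strings.

```python
class FoldingContext:
    """Toy working context that can branch into a sub-trajectory
    and later fold it down to a one-line summary (hypothetical API)."""

    def __init__(self):
        self.messages = []          # active working context
        self._branch_start = None   # index where the current branch began

    def append(self, msg: str):
        self.messages.append(msg)

    def branch(self, subtask: str):
        # Mark the start of a sub-trajectory for this subtask.
        self._branch_start = len(self.messages)
        self.messages.append(f"[branch] {subtask}")

    def fold(self, summary: str):
        # Collapse every message since branch() into one summary line,
        # shrinking the active context while retaining the outcome.
        assert self._branch_start is not None, "fold() without branch()"
        self.messages[self._branch_start:] = [f"[folded] {summary}"]
        self._branch_start = None


ctx = FoldingContext()
ctx.append("user: fix the failing test")
ctx.branch("locate the bug")
ctx.append("tool: grep output (200 lines)")
ctx.append("tool: file contents (500 lines)")
ctx.fold("bug is an off-by-one in the loop bound")
print(len(ctx.messages))  # the two intermediate tool outputs are gone
```

After folding, the active context holds only the original user message and the summary line; the long intermediate tool outputs no longer occupy context length, which is the mechanism behind the 10× smaller active context the abstract reports.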

Weiwei Sun, Miao Lu, Zhan Ling, Kang Liu, Xuesong Yao, Yiming Yang, Jiecao Chen • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| OS GUI agentic task execution | OSWorld, 361 tasks (Verified) | Average Success Rate: 53.69 | 21 |
| Long-form deep research | DeepResearch Bench (test) | Overall Score: 41.79 | 13 |
| Multi-hop deep search | BrowseComp+ | Pass@1 Accuracy: 35.65 | 5 |
