
Scaling Long-Horizon LLM Agent via Context-Folding

About

Large language model (LLM) agents are fundamentally constrained by context length on long-horizon tasks. We introduce Context-Folding, a framework that empowers agents to actively manage their working context. An agent can procedurally branch into a sub-trajectory to handle a subtask and then fold it upon completion, collapsing the intermediate steps while retaining a concise summary of the outcome. To make this behavior learnable, we develop FoldGRPO, an end-to-end reinforcement learning framework with specific process rewards that encourage effective task decomposition and context management. On complex long-horizon tasks (Deep Research and SWE), our folding agent matches or outperforms ReAct baselines while using an active context 10× smaller, and it significantly outperforms models that rely on summarization-based context management.
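The branch-and-fold behavior described above can be sketched in a few lines. This is a minimal illustration of the idea, not the paper's implementation: the class name `FoldingContext` and the methods `branch`/`fold` are hypothetical, and real agents would fold message histories rather than plain strings.

```python
class FoldingContext:
    """Toy working context that can branch into a sub-trajectory
    and later fold it down to a one-line summary (hypothetical API)."""

    def __init__(self):
        self.messages = []          # active working context
        self._branch_start = None   # index where the current branch began

    def append(self, msg: str):
        self.messages.append(msg)

    def branch(self, subtask: str):
        # Mark the start of a sub-trajectory for this subtask.
        self._branch_start = len(self.messages)
        self.messages.append(f"[branch] {subtask}")

    def fold(self, summary: str):
        # Collapse every message since branch() into one summary line,
        # shrinking the active context while retaining the outcome.
        assert self._branch_start is not None, "fold() without branch()"
        self.messages[self._branch_start:] = [f"[folded] {summary}"]
        self._branch_start = None


ctx = FoldingContext()
ctx.append("user: fix the failing test")
ctx.branch("locate the bug")
ctx.append("tool: grep output (200 lines)")
ctx.append("tool: file contents (500 lines)")
ctx.fold("bug is an off-by-one in the loop bound")
print(len(ctx.messages))  # the two intermediate tool outputs are gone
```

After folding, the active context holds only the original user message and the summary line; the long intermediate tool outputs no longer occupy context length, which is the mechanism behind the 10× smaller active context the abstract reports.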

Weiwei Sun, Miao Lu, Zhan Ling, Kang Liu, Xuesong Yao, Yiming Yang, Jiecao Chen • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| OS GUI agentic task execution | OSWorld, 361 tasks (Verified) | Average Success Rate: 53.69 | 21 |
| Long-form deep research | DeepResearch Bench (test) | Overall Score: 41.79 | 13 |
| Multi-hop deep search | BrowseComp+ | Pass@1 Accuracy: 35.65 | 5 |
