
WebWeaver: Structuring Web-Scale Evidence with Dynamic Outlines for Open-Ended Deep Research

About

This paper tackles open-ended deep research (OEDR), a complex challenge in which AI agents must synthesize vast web-scale information into insightful reports. Current approaches suffer from two limitations: static research pipelines that decouple planning from evidence acquisition, and monolithic generation paradigms that pull redundant or irrelevant evidence into the context, which leads to hallucinations and low citation accuracy. To address these challenges, we introduce WebWeaver, a novel dual-agent framework that emulates the human research process. The planner operates in a dynamic cycle, iteratively interleaving evidence acquisition with outline optimization to produce a comprehensive, citation-grounded outline that links to a memory bank of evidence. The writer then executes a hierarchical retrieval and writing process, composing the report section by section. For each part, the writer retrieves only the necessary evidence from the memory bank via its citations, which mitigates long-context issues and citation hallucinations. Our framework establishes a new state of the art across major OEDR benchmarks, including DeepResearch Bench, DeepConsult, and DeepResearchGym. These results validate our human-centric, iterative methodology and demonstrate that adaptive planning and focused synthesis are crucial for producing comprehensive, trustworthy, and well-structured reports.
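The planner and writer loop described above can be illustrated with a toy sketch. This is not the authors' implementation: every name here (MemoryBank, Section, plan, write, toy_search, CORPUS) is hypothetical, the "web search" is a canned dictionary, and "outline optimization" is reduced to appending sections for follow-up topics that the evidence surfaces. It is meant only to show the shape of the idea: evidence is stored once under a citation id, the outline carries ids rather than text, and the writer fetches only the cited evidence for each section.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MemoryBank:
    """Hypothetical evidence store: snippets keyed by integer citation id."""
    items: dict[int, str] = field(default_factory=dict)

    def add(self, snippet: str) -> int:
        cid = len(self.items) + 1
        self.items[cid] = snippet
        return cid

    def fetch(self, cids: list[int]) -> list[str]:
        return [self.items[c] for c in cids]


@dataclass
class Section:
    """One outline entry; it holds citation ids, not the evidence itself."""
    title: str
    citations: list[int] = field(default_factory=list)


# Canned stand-in for web search (assumption): topic -> (snippets, follow-up topics).
CORPUS = {
    "solar": (["Solar capacity grew in the toy corpus."], ["storage"]),
    "storage": (["Batteries smooth solar output in the toy corpus."], []),
}


def toy_search(query: str) -> tuple[list[str], list[str]]:
    return CORPUS.get(query, ([], []))


def plan(seed_topics, search, max_rounds=3):
    """Interleave evidence acquisition with outline updates for a few rounds."""
    bank = MemoryBank()
    outline = [Section(t) for t in seed_topics]
    searched = set()
    for _ in range(max_rounds):
        pending = [s for s in outline if s.title not in searched]
        if not pending:
            break  # no section is still waiting for evidence
        for section in pending:
            searched.add(section.title)
            snippets, follow_ups = search(section.title)
            # Evidence acquisition: store snippets, keep only their ids.
            section.citations += [bank.add(s) for s in snippets]
            # Simplified outline optimization: add newly discovered topics.
            for topic in follow_ups:
                if all(s.title != topic for s in outline):
                    outline.append(Section(topic))
    return outline, bank


def write(outline, bank):
    """Compose the report section by section from cited evidence only."""
    parts = []
    for section in outline:
        evidence = bank.fetch(section.citations)
        body = " ".join(
            f"{text} [{cid}]" for cid, text in zip(section.citations, evidence)
        )
        parts.append(f"{section.title}\n{body}")
    return "\n\n".join(parts)
```

Calling plan(["solar"], toy_search) and passing the result to write yields a two-section report in which each sentence carries the id of the snippet it came from. In a real system the search, outline revision, and section writing would each be model calls; the sketch only fixes the data flow between them.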

Zijian Li, Xin Guan, Bo Zhang, Shen Huang, Houquan Zhou, Shaopeng Lai, Ming Yan, Yong Jiang, Pengjun Xie, Fei Huang, Jun Zhang, Jingren Zhou • 2025

Related benchmarks

Task | Dataset | Metric | Result | Rank
Deep Research Report Generation | DeepResearch Bench | Comprehensiveness | 51.45 | 54
Comparative Performance Evaluation | DeepConsult | Win Rate | 66.86 | 24
Report Generation | DeepResearch Bench 2025 (test) | Comprehensiveness | 45.2 | 16
Open-Ended Deep Research | DeepResearchGym | Clarity | 90.71 | 9
Open-Ended Deep Research | DeepConsult | Win Rate | 61.27 | 9
Open-ended deep research evaluation | DeepResearch Bench 100 PhD-level research tasks | Comprehensiveness | 51.45 | 9
Deep Research | DeepConsult (test) | Win Rate | 66.16 | 8
