
Hybrid Deep Searcher: Scalable Parallel and Sequential Search Reasoning

About

Large reasoning models (LRMs) combined with retrieval-augmented generation (RAG) have enabled deep research agents capable of multi-step reasoning with external knowledge retrieval. However, we find that existing approaches rarely demonstrate test-time search scaling. Methods that extend reasoning through single-query sequential search suffer from limited evidence coverage, while approaches that generate multiple independent queries per step often lack structured aggregation, which hinders deeper sequential reasoning. We propose a hybrid search strategy to address these limitations. We introduce HybridDeepSearcher, a structured search agent that integrates parallel query expansion with explicit evidence aggregation before advancing to deeper sequential reasoning. To supervise this behavior, we introduce HDS-QA, a novel dataset that guides models to combine broad parallel search with structured aggregation through supervised reasoning-query-retrieval trajectories containing parallel sub-queries. Across five benchmarks, HybridDeepSearcher significantly outperforms state-of-the-art methods, improving F1 scores by +15.9 on FanOutQA and +9.2 on a subset of BrowseComp. Further analysis shows consistent test-time search scaling: performance improves as additional search turns or calls are allowed, while competing methods plateau.
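To make the hybrid pattern concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' implementation. Every name in it (`CORPUS`, `search`, `parallel_step`, `hybrid_search`, `plan`) is hypothetical, the retriever is a toy word-overlap ranker, and the hand-written `plan` stands in for the LRM's query generation. Each sequential step fans out into several sub-queries issued concurrently, and their results are merged and deduplicated (the aggregation step) before the next step builds on the accumulated evidence.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy corpus standing in for a real search backend.
CORPUS = {
    "d1": "Paris is the capital of France.",
    "d2": "Berlin is the capital of Germany.",
    "d3": "France has a population of about 68 million.",
    "d4": "Germany has a population of about 84 million.",
}


def search(query: str, k: int = 1) -> list[str]:
    """Toy retriever (hypothetical): rank documents by word overlap with the query."""
    q = set(query.lower().split())
    ranked = sorted(
        CORPUS,
        key=lambda doc_id: -len(q & set(CORPUS[doc_id].lower().rstrip(".").split())),
    )
    return ranked[:k]


def parallel_step(sub_queries: list[str]) -> list[str]:
    """Issue all sub-queries of one step concurrently, then aggregate their hits."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(search, sub_queries))
    # Explicit aggregation: merge and deduplicate evidence, preserving order.
    evidence: list[str] = []
    for hits in results:
        for doc_id in hits:
            if doc_id not in evidence:
                evidence.append(doc_id)
    return evidence


def hybrid_search(plan) -> list[str]:
    """Run steps sequentially; each step fans out into parallel sub-queries.

    `plan` is a hypothetical list of functions that map the evidence gathered
    so far to the next step's sub-queries (a stand-in for the LRM).
    """
    evidence: list[str] = []
    for make_queries in plan:
        for doc_id in parallel_step(make_queries(evidence)):
            if doc_id not in evidence:
                evidence.append(doc_id)
    return evidence


# Usage: step 1 finds two capitals in parallel; step 2 uses the aggregated
# evidence to derive the country names and looks up both populations in parallel.
plan = [
    lambda ev: ["capital of France", "capital of Germany"],
    lambda ev: [f"population of {CORPUS[d].split()[-1].rstrip('.')}" for d in ev],
]
print(hybrid_search(plan))  # ['d1', 'd2', 'd3', 'd4']
```

The point of the design is that the second step only becomes possible after the first step's parallel results have been aggregated; a purely sequential single-query agent would need four turns for the same coverage.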

Dayoon Ko, Jihyuk Kim, Haeju Park, Sohyeon Kim, Dahyun Lee, Yongrae Jo, Gunhee Kim, Moontae Lee, Kyungjae Lee • 2025

Related benchmarks

Task | Dataset | Metric | Result | Rank
Question Answering | FRAMES | Accuracy | 54 | 14
Multi-constraint search problem solving | LiveDRBench (BrowseComp, DeepSearchQA, FRAMES, LiveDRBench, WebWalkerQA) 1.0 (test) | Accuracy | 16.3 | 14
Question Answering | FanOutQA | F1 Score | 44.1 | 9
Question Answering | MedBrowseComp | F1 Score | 23.2 | 9
Question Answering | BrowseComp | F1 Score | 15.1 | 9
Question Answering | MuSiQue | F1 Score | 31.2 | 9
Evidence Retrieval | MuSiQue | Evidence Coverage Rate | 40.7 | 6
Evidence Retrieval | FanOutQA | Evidence Coverage Rate | 61 | 6
Evidence Retrieval | FRAMES | Evidence Coverage Rate | 55.8 | 6
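Several rows above report F1 on a 0 to 100 scale. As a hedged illustration only, here is the common SQuAD-style token-level F1 between a predicted and a reference answer; the benchmarks' official scripts may normalize text differently (for example, stripping punctuation and articles) or aggregate multi-part answers differently, so treat `token_f1` as a hypothetical helper rather than the evaluation used for these results.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """SQuAD-style token F1 with only lowercasing and whitespace tokenization."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# Usage: 2 of 3 predicted tokens match a 2-token reference, so P = 2/3, R = 1.
print(round(token_f1("the Eiffel Tower", "Eiffel Tower"), 2))  # 0.8
```

Multiplying such per-question scores by 100 and averaging over a dataset gives numbers on the same scale as the F1 column above.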
