ParallelMuse: Agentic Parallel Thinking for Deep Information Seeking
About
Parallel thinking expands exploration breadth, complementing the deep exploration of information-seeking (IS) agents to further enhance problem-solving capability. However, conventional parallel thinking faces two key challenges in this setting: inefficiency from repeatedly rolling out from scratch, and difficulty in integrating long-horizon reasoning trajectories during answer generation, as limited context capacity prevents full consideration of the reasoning process. To address these issues, we propose ParallelMuse, a two-stage paradigm designed for deep IS agents. The first stage, Functionality-Specified Partial Rollout, partitions generated sequences into functional regions and performs uncertainty-guided path reuse and branching to enhance exploration efficiency. The second stage, Compressed Reasoning Aggregation, exploits reasoning redundancy to losslessly compress information relevant to answer derivation and synthesize a coherent final answer. Experiments across multiple open-source agents and benchmarks demonstrate up to 62% performance improvement with a 10-30% reduction in exploratory token consumption.
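The idea of uncertainty-guided path reuse and branching can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how uncertainty is measured or how branch points are chosen, so the sketch assumes per-step perplexity computed from token log-probabilities and a simple top-k selection. The function names (`step_uncertainty`, `pick_branch_points`, `reusable_prefixes`) are hypothetical.

```python
import math
from typing import List, Sequence


def step_uncertainty(token_logprobs: Sequence[float]) -> float:
    """Perplexity of one step: exp of the mean negative log-probability.

    ASSUMPTION: perplexity is used as the uncertainty signal; the abstract
    does not specify the measure.
    """
    if not token_logprobs:
        return 1.0  # an empty step is treated as fully confident
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def pick_branch_points(steps: List[Sequence[float]], k: int) -> List[int]:
    """Return the indices of the k most uncertain steps, in trajectory order.

    ASSUMPTION: top-k selection is an illustrative choice, not the paper's rule.
    """
    scores = [step_uncertainty(s) for s in steps]
    ranked = sorted(range(len(steps)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def reusable_prefixes(trajectory: List[str], branch_points: List[int]) -> List[List[str]]:
    """For each branch point i, keep the steps before i; a new rollout
    would resume from step i instead of starting from scratch."""
    return [trajectory[:i] for i in branch_points]


# Toy trajectory: four steps with hypothetical token log-probabilities.
logprobs = [[-0.1, -0.1], [-2.0, -1.0], [-0.5], [-3.0]]
trajectory = ["s0", "s1", "s2", "s3"]

branch_points = pick_branch_points(logprobs, k=2)
prefixes = reusable_prefixes(trajectory, branch_points)
```

In this toy run the two most uncertain steps are at indices 1 and 3, so the new rollouts would reuse the prefixes `["s0"]` and `["s0", "s1", "s2"]` rather than regenerating them, which is where the token savings from partial rollout would come from under these assumptions.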
Related benchmarks
| Task | Dataset | Metric | Score | Rank |
|---|---|---|---|---|
| Web Browsing | BrowseComp | Accuracy | 70 | 52 |
| Logical reasoning | HLE | Accuracy | 0.5806 | 46 |
| Medical Reasoning | HealthBench Hard | Accuracy | 23 | 41 |
| BrowseComp-Plus | BrowseComp+ | Accuracy | 73.33 | 25 |
| HLE | HLE | Accuracy | 50.32 | 25 |
| Long-horizon agentic task | BrowseComp+ | Performance | 76.67 | 24 |
| Long-horizon agentic task | HLE | Performance | 58.06 | 24 |
| Long-horizon agentic task | BrowseComp | Performance | 70 | 24 |
| DeepSearchQA | DeepSearchQA | Accuracy | 64 | 19 |
| Question Answering | DeepSearchQA | Accuracy | 61.33 | 19 |