RhinoInsight: Improving Deep Research through Control Mechanisms for Model Behavior and Context

About

Large language models are evolving from single-turn responders into tool-using agents capable of sustained reasoning and decision-making for deep research. Prevailing systems adopt a linear plan, search, write pipeline that ends in a report. Because these pipelines lack explicit control over both model behavior and context, they suffer from error accumulation and context rot. We introduce RhinoInsight, a deep research framework that adds two control mechanisms to improve robustness, traceability, and overall quality without parameter updates. First, a Verifiable Checklist module transforms user requirements into traceable and verifiable sub-goals, incorporates human or LLM critics for refinement, and compiles a hierarchical outline that anchors subsequent actions and prevents non-executable planning. Second, an Evidence Audit module structures search content, iteratively updates the outline, and prunes noisy context, while a critic ranks high-quality evidence and binds it to drafted content to ensure verifiability and reduce hallucinations. Our experiments show that RhinoInsight achieves state-of-the-art performance on deep research tasks while remaining competitive on deep search tasks.
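To make the two control mechanisms more concrete, here is a minimal Python sketch of the data flow the abstract describes: checklist sub-goals, each carrying an acceptance test, compile into outline sections, and an audit step ranks search evidence, prunes low-scoring items, and binds the survivors to sections. This is not the authors' implementation. Every name here (`SubGoal`, `Evidence`, `Section`, `build_outline`, `overlap_score`, `audit`, `unsupported`) and both thresholds are hypothetical. The LLM or human critic is replaced by a simple term-overlap score. The sketch also omits the iterative outline updates.

```python
from dataclasses import dataclass, field


@dataclass
class SubGoal:
    # A checklist item: a requirement plus a concrete test for "done".
    goal_id: str
    description: str
    acceptance: str  # how a critic could check that the goal is met


@dataclass
class Evidence:
    source: str
    text: str


@dataclass
class Section:
    # One outline node, anchored to the sub-goal it must satisfy.
    goal: SubGoal
    evidence: list[Evidence] = field(default_factory=list)


def build_outline(goals):
    # Hypothetical stand-in for compiling a hierarchical outline:
    # here it is flat, one section per sub-goal.
    return [Section(goal=g) for g in goals]


def overlap_score(goal, ev):
    # Placeholder critic (an assumption, not the paper's critic):
    # fraction of the goal's terms that also appear in the evidence.
    goal_terms = set(goal.description.lower().split())
    ev_terms = set(ev.text.lower().split())
    return len(goal_terms & ev_terms) / max(len(goal_terms), 1)


def audit(outline, pool, min_score=0.3, top_k=2):
    # For each section: score every evidence item, prune those below
    # min_score, then bind the top_k highest-scoring survivors.
    for section in outline:
        scored = [(overlap_score(section.goal, ev), ev) for ev in pool]
        kept = [pair for pair in scored if pair[0] >= min_score]
        kept.sort(key=lambda pair: pair[0], reverse=True)
        section.evidence = [ev for _, ev in kept[:top_k]]
    return outline


def unsupported(outline):
    # Sections with no bound evidence: text drafted there would be unverifiable.
    return [s.goal.goal_id for s in outline if not s.evidence]


if __name__ == "__main__":
    goals = [
        SubGoal("g1", "battery cost trends", "cites at least one cost figure"),
        SubGoal("g2", "grid storage policy", "names at least one policy"),
    ]
    pool = [
        Evidence("a", "battery pack cost trends fell over the decade"),
        Evidence("b", "new grid storage policy incentives announced"),
        Evidence("c", "celebrity news unrelated to energy"),
    ]
    outline = audit(build_outline(goals), pool)
    for s in outline:
        print(s.goal.goal_id, [e.source for e in s.evidence])  # g1 ['a'], then g2 ['b']
    print("unsupported:", unsupported(outline))  # unsupported: []
```

Binding evidence to each section, rather than keeping one shared pool, lets a checker flag sections with no support before any text is drafted. That is the role the abstract assigns to evidence binding in reducing hallucinations.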

Yu Lei, Shuzheng Si, Wei Wang, Yifei Wu, Gang Chen, Fanchao Qi, Maosong Sun• 2025

Related benchmarks

Task	Dataset	Metric	Result
Deep Research Report Generation	DeepResearch Bench	Comprehensiveness	50.51	89
Comparative Performance Evaluation	DeepConsult	Win Rate	0.6851	24
Deep Research	DeepResearch Bench (test)	Comprehensiveness	50.51	14
Open-ended deep research evaluation	DeepResearch Bench (100 PhD-level research tasks)	Comprehensiveness	50.51	9
Deep Research	DeepConsult (test)	Win Rate	68.51	8

