IntentRL: Training Proactive User-intent Agents for Open-ended Deep Research via Reinforcement Learning

About

Deep Research (DR) agents extend Large Language Models (LLMs) beyond parametric knowledge by autonomously retrieving and synthesizing evidence from large web corpora into long-form reports, enabling a long-horizon agentic paradigm. However, unlike real-time conversational assistants, DR is computationally expensive and time-consuming, creating an autonomy-interaction dilemma: high autonomy on ambiguous user queries often leads to prolonged execution with unsatisfactory outcomes. To address this, we propose IntentRL, a framework that trains proactive agents to clarify latent user intents before starting long-horizon research. To overcome the scarcity of open-ended research data, we introduce a scalable pipeline that expands a few seed samples into high-quality dialogue turns via a shallow-to-deep intent refinement graph. We further adopt a two-stage reinforcement learning (RL) strategy: Stage I applies RL on offline dialogues to efficiently learn general user-interaction behavior, while Stage II uses the trained agent and a user simulator for online rollouts to strengthen adaptation to diverse user feedback. Extensive experiments show that IntentRL significantly improves both intent hit rate and downstream task performance, outperforming the built-in clarify modules of closed-source DR agents and proactive LLM baselines.

Haohao Luo, Zexi Li, Yuexiang Xie, Wenhao Zhang, Yaliang Li, Ying Shen • 2026

Related benchmarks

Task | Dataset | Result | Rank
Deep Research Report Generation | DeepResearch Bench | Comprehensiveness: 43.1 | 54
Deep Research Report Generation | PDR-Bench | P-Score: 7.21 | 22
Deep Research Report Generation | Rigorous Bench | Quality: 0.6247 | 22
Clarification Generation | DeepResearch Bench online interactive settings | Intent Precision: 36.44 | 6
Clarification Generation | DeepResearch Bench offline (test) | Quality Score: 2.43 | 4
