CollabLLM: From Passive Responders to Active Collaborators
About
Large Language Models are typically trained with next-turn rewards, limiting their ability to optimize for long-term interaction. As a result, they often respond passively to ambiguous or open-ended user requests, failing to help users reach their ultimate intents and leading to inefficient conversations. To address these limitations, we introduce CollabLLM, a novel and general training framework that enhances multiturn human-LLM collaboration. Its key innovation is a collaborative simulation that estimates the long-term contribution of responses using Multiturn-aware Rewards. By reinforcement fine-tuning models with these rewards, CollabLLM goes beyond merely responding to user requests: it actively uncovers user intent and offers insightful suggestions, a key step toward more human-centered AI. We also devise a multiturn interaction benchmark with three challenging tasks, such as document creation. CollabLLM significantly outperforms baselines, achieving on average 18.5% higher task performance and 46.3% better interactivity as rated by LLM judges. Finally, in a large user study with 201 judges, CollabLLM increases user satisfaction by 17.6% and reduces the time users spend by 10.4%.
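To make the core idea concrete, below is a minimal sketch of how a Multiturn-aware Reward could be estimated by forward-sampling the conversation. This is a simplified illustration under stated assumptions, not the paper's actual implementation: `simulate_user_turn`, `model_respond`, and `task_reward` are hypothetical placeholders standing in for a user simulator, the policy model, and a task-specific scorer.

```python
from statistics import mean

def multiturn_aware_reward(conversation, candidate_response,
                           simulate_user_turn, model_respond, task_reward,
                           num_rollouts=4, horizon=3):
    """Estimate the long-term contribution of `candidate_response` by
    simulating `num_rollouts` continuations of the conversation, each up
    to `horizon` further exchanges, and averaging the task reward.

    All three callables are hypothetical stand-ins (assumptions, not the
    paper's API): a user simulator, the policy model, and a task scorer.
    """
    rollout_rewards = []
    for _ in range(num_rollouts):
        # Start each rollout from the conversation plus the candidate reply.
        history = conversation + [{"role": "assistant",
                                   "content": candidate_response}]
        for _ in range(horizon):
            # A simulated user reacts to the dialogue so far.
            history.append({"role": "user",
                            "content": simulate_user_turn(history)})
            # The model continues the collaboration.
            history.append({"role": "assistant",
                            "content": model_respond(history)})
        # Score the completed (simulated) conversation against the task goal.
        rollout_rewards.append(task_reward(history))
    # Average over simulated futures: responses that steer the conversation
    # toward the user's ultimate intent score higher than next-turn rewards
    # alone would suggest.
    return mean(rollout_rewards)
```

The key design point this sketch illustrates is that the reward depends on where a response leads the conversation over several future turns, which is what lets reinforcement fine-tuning favor clarifying questions and proactive suggestions over locally plausible but passive replies.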
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Deep Research Report Generation | DeepResearch Bench | Comprehensiveness | 40.37 | 54 |
| Deep Research Report Generation | Rigorous Bench | Quality | 0.6257 | 22 |
| Deep Research Report Generation | PDR-Bench | P-Score | 7.12 | 22 |
| Technical Writing | Technical Writing | Discover | 0.458 | 12 |
| Creative Writing | Creative Writing | Discovery Score | 37.3 | 12 |
| SVG Drawing | SVG Drawing | Discover | 43 | 12 |
| Clarification Generation | DeepResearch Bench (online interactive setting) | Intent Precision | 18 | 6 |
| Preference Alignment | CSQA | Preference Alignment | 20 | 5 |
| Preference Alignment | MedQA | Preference Alignment | 20.3 | 5 |
| Preference Alignment | AIME | Preference Alignment | 26.4 | 5 |