ConvSearch-R1: Enhancing Query Reformulation for Conversational Search with Reasoning via Reinforcement Learning
About
Conversational search systems require effective handling of context-dependent queries that often contain ambiguity, omission, and coreference. Conversational Query Reformulation (CQR) addresses this challenge by transforming these queries into self-contained forms suitable for off-the-shelf retrievers. However, existing CQR approaches suffer from two critical constraints: high dependency on costly external supervision from human annotations or large language models, and insufficient alignment between the rewriting model and downstream retrievers. We present ConvSearch-R1, the first self-driven framework that completely eliminates dependency on external rewrite supervision by leveraging reinforcement learning to optimize reformulation directly through retrieval signals. Our novel two-stage approach combines Self-Driven Policy Warm-Up to address the cold-start problem through retrieval-guided self-distillation, followed by Retrieval-Guided Reinforcement Learning with a specially designed rank-incentive reward shaping mechanism that addresses the sparsity issue in conventional retrieval metrics. Extensive experiments on TopiOCQA and QReCC datasets demonstrate that ConvSearch-R1 significantly outperforms previous state-of-the-art methods, achieving over 10% improvement on the challenging TopiOCQA dataset while using smaller 3B parameter models without any external supervision.
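The sparsity issue mentioned above can be illustrated with a minimal sketch. It is not the reward function used by ConvSearch-R1, whose exact form is not given here. The function names, the linear decay, and the `cutoff` parameter are all assumptions made for illustration. The sketch contrasts a binary top-k metric, which gives no signal when the gold passage falls just outside the cutoff, with a rank-dependent reward that still distinguishes a rewrite that ranks the passage 11th from one that ranks it 90th.

```python
from typing import Optional


def recall_at_k(rank: Optional[int], k: int = 10) -> float:
    """Sparse signal: 1.0 only if the gold passage is within the top k."""
    return 1.0 if rank is not None and rank <= k else 0.0


def rank_shaped_reward(rank: Optional[int], cutoff: int = 100) -> float:
    """Hypothetical dense signal (an assumption, not the paper's formula).

    Decays linearly from 1.0 at rank 1 to 1/cutoff at rank == cutoff,
    and is 0.0 when the gold passage is not retrieved within the cutoff.
    """
    if rank is None or rank > cutoff:
        return 0.0
    return (cutoff - rank + 1) / cutoff


if __name__ == "__main__":
    # 1-based rank of the gold passage for several candidate rewrites;
    # None means the passage was not retrieved at all.
    for rank in (1, 11, 50, None):
        print(rank, recall_at_k(rank), rank_shaped_reward(rank))
```

Under the binary metric, the rewrites at ranks 11 and 50 are indistinguishable from a complete miss, so a policy receives no gradient toward the better one. A rank-dependent reward keeps that ordering visible.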
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Conversational Retrieval | TopiOCQA (test) | NDCG@3 | 50.1 | 26 |
| Conversational Query Retrieval | TopiOCQA | MRR | 38.5 | 20 |
| Conversational Query Retrieval | QReCC | MRR | 55.1 | 20 |
| Answer Generation | TopiOCQA | F1 Score | 7.6 | 17 |
| Answer Generation | Inscit | F1 Score | 15.7 | 16 |
| Multi-hop Retrieval | HotpotQA | Recall@4 | 0.83 | 14 |
| Conversational Information Retrieval | TopiOCQA (test) | R@10 | 72 | 13 |
| Conversational Information Retrieval | QReCC (test) | R@10 | 77.2 | 13 |
| Query Decomposition / Retrieval | HotpotQA 2018 (test) | Recall@4 | 83 | 9 |
| End-to-end Question Answering | HotpotQA ANCE | MAP@10 | 44.4 | 3 |