The Wisdom of Many Queries: Complexity-Diversity Principle for Dense Retriever Training
About
Prior synthetic query generation for dense retrieval produces one query per document, focusing on quality. We systematically study multi-query synthesis, discovering a quality-diversity trade-off: quality benefits in-domain, diversity benefits out-of-domain (OOD). Experiments on 31 datasets show diversity especially benefits multi-hop retrieval. Analysis reveals diversity benefit correlates with query complexity (r ≥ 0.95), measured by content words (CW). We formalize this as the Complexity-Diversity Principle (CDP): query complexity determines optimal diversity. CDP provides thresholds (CW > 10: use diversity; CW < 7: avoid it) and enables CW-weighted training that improves OOD even with single-query data.
Xincan Feng, Noriki Nishida, Yusuke Sakai, Yuji Matsumoto • 2026
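The CDP thresholds quoted in the abstract can be read as a simple decision rule on content-word count. The following is a minimal sketch of that rule, not the authors' implementation: the stopword list `FUNCTION_WORDS`, the tokenizer, and the helper names `content_word_count` and `cdp_recommendation` are all hypothetical, and the abstract does not say how content words are identified or what to do for CW between 7 and 10.

```python
import re

# Hypothetical, deliberately small function-word list. The paper's actual
# definition of a "content word" is not given in the abstract.
FUNCTION_WORDS = frozenset(
    """a an the of in on at to for from by with and or but is are was were
    be been do does did what which who whom whose when where why how that
    this these those it its as""".split()
)


def content_word_count(query: str) -> int:
    """Count tokens that are not in the (assumed) function-word list."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    return sum(1 for token in tokens if token not in FUNCTION_WORDS)


def cdp_recommendation(cw: float) -> str:
    """Apply the thresholds stated in the abstract to a CW value."""
    if cw > 10:
        return "use diversity"
    if cw < 7:
        return "avoid diversity"
    # The abstract gives no guidance for 7 <= CW <= 10.
    return "no stated recommendation"


if __name__ == "__main__":
    query = "What is the capital of France?"
    cw = content_word_count(query)
    print(cw, cdp_recommendation(cw))
```

In practice the thresholds would presumably be applied to an average CW over a dataset's queries rather than to a single query, but that is an assumption as well.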
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Information Retrieval | BEIR (test) | -- | -- | 76 |
| Retrieval | TREC-DL aggregate (test) | NDCG@10 | 54 | 38 |
| Retrieval | BRIGHT 12 datasets aggregate (test) | NDCG@10 | 9.5 | 20 |
| Multi-hop Retrieval | Multi-hop 4 datasets aggregate (test) | NDCG@10 | 58.5 | 8 |