
The Wisdom of Many Queries: Complexity-Diversity Principle for Dense Retriever Training

About

Prior synthetic query generation for dense retrieval produces one query per document, focusing on quality. We systematically study multi-query synthesis, discovering a quality-diversity trade-off: quality benefits in-domain, diversity benefits out-of-domain (OOD). Experiments on 31 datasets show diversity especially benefits multi-hop retrieval. Analysis reveals diversity benefit correlates with query complexity (r>=0.95), measured by content words (CW). We formalize this as the Complexity-Diversity Principle (CDP): query complexity determines optimal diversity. CDP provides thresholds (CW>10: use diversity; CW<7: avoid it) and enables CW-weighted training that improves OOD even with single-query data.
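
The CDP thresholds above can be illustrated with a minimal sketch. Several pieces are assumptions and do not come from the paper: the content-word counter (here, simply tokens outside a small illustrative stopword list), the function names, and the weighting scheme, which sets each query's training weight proportional to its CW count and normalizes to a mean of 1. The abstract does not say what to do for CW between 7 and 10, so that range is reported as unspecified.

```python
import re

# Assumption: a tiny illustrative stopword list. The paper's exact
# definition of a "content word" is not given in the abstract.
STOPWORDS = frozenset({
    "a", "an", "the", "is", "are", "was", "were", "be", "of", "in", "on",
    "at", "to", "for", "and", "or", "with", "by", "from", "as", "that",
    "this", "it", "what", "which", "who", "how", "when", "where", "why",
    "do", "does", "did",
})


def count_content_words(query: str) -> int:
    """Count tokens that are not in the stopword list (hypothetical CW proxy)."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    return sum(1 for token in tokens if token not in STOPWORDS)


def diversity_recommendation(cw: int) -> str:
    """Apply the thresholds stated in the abstract: CW>10 and CW<7."""
    if cw > 10:
        return "use diversity"
    if cw < 7:
        return "avoid diversity"
    # The abstract gives no guidance for 7 <= CW <= 10.
    return "unspecified"


def cw_weights(queries: list[str]) -> list[float]:
    """Per-query training weights proportional to CW, normalized to mean 1.

    Assumption: this proportional scheme is illustrative only; the paper's
    actual CW-weighted training objective may differ.
    """
    counts = [count_content_words(q) for q in queries]
    total = sum(counts)
    if total == 0:
        return [1.0] * len(queries)
    n = len(queries)
    return [n * c / total for c in counts]


if __name__ == "__main__":
    query = "what is the capital of France"
    cw = count_content_words(query)
    print(cw, diversity_recommendation(cw))
```

Such weights could, for example, multiply each query's contrastive loss term during retriever training, so that more complex queries contribute more to the gradient even when only one query per document is available.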

Xincan Feng, Noriki Nishida, Yusuke Sakai, Yuji Matsumoto • 2026

Related benchmarks

Task | Dataset | Result | Rank
Information Retrieval | BEIR (test) | -- | 76
Retrieval | TREC-DL aggregate (test) | NDCG@10: 54 | 38
Retrieval | BRIGHT 12 datasets aggregate (test) | NDCG@10: 9.5 | 20
Multi-hop Retrieval | Multi-hop 4 datasets aggregate (test) | NDCG@10: 58.5 | 8
