Optimizing Test-Time Query Representations for Dense Retrieval
About
Recent developments in dense retrieval rely on high-quality representations of queries and contexts from pre-trained query and context encoders. In this paper, we introduce TOUR (Test-Time Optimization of Query Representations), which further optimizes instance-level query representations guided by signals from test-time retrieval results. We use a cross-encoder re-ranker to provide fine-grained pseudo-labels over the retrieval results and iteratively optimize query representations with gradient descent. Our theoretical analysis shows that TOUR can be viewed as a generalization of the classical Rocchio algorithm for pseudo-relevance feedback, and we present two variants that use the pseudo-labels as either hard binary labels or soft continuous labels. We first apply TOUR to phrase retrieval with our proposed phrase re-ranker, and then evaluate its effectiveness on passage retrieval with an off-the-shelf re-ranker. TOUR greatly improves end-to-end open-domain question answering accuracy as well as passage retrieval performance. With an efficient implementation, TOUR also consistently improves on direct re-ranking by up to 2.0% while running 1.3-2.4x faster.
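To make the optimization step concrete, here is a minimal NumPy sketch of the idea described above. It is not the authors' implementation. The function name `refine_query`, the learning rate, the step count, and the fixed candidate set (no re-retrieval between steps) are all assumptions made for brevity, and the re-ranker scores are placeholder numbers rather than the output of a real cross-encoder. The loss is the cross-entropy between the pseudo-label distribution and the retriever's softmax over the candidates.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def refine_query(q, cand_vecs, reranker_scores, lr=0.5, steps=20, hard=False):
    """Hypothetical helper: refine a query vector at test time.

    q               : (d,) initial query representation
    cand_vecs       : (k, d) representations of the top-k retrieved items
    reranker_scores : (k,) scores from a re-ranker (assumed given)
    hard            : if True, put all label mass on the top re-ranked item;
                      otherwise use a softmax over the re-ranker scores
    """
    q = np.asarray(q, dtype=float).copy()
    cand_vecs = np.asarray(cand_vecs, dtype=float)
    scores = np.asarray(reranker_scores, dtype=float)

    if hard:
        target = np.zeros(len(scores))
        target[np.argmax(scores)] = 1.0   # hard binary pseudo-labels
    else:
        target = softmax(scores)          # soft continuous pseudo-labels

    for _ in range(steps):
        # Retriever distribution over the candidates (inner-product scores).
        p = softmax(cand_vecs @ q)
        # Gradient of cross-entropy(target, p) with respect to q.
        grad = cand_vecs.T @ (p - target)
        q -= lr * grad
    return q


# Toy usage: three candidates in 2-D; the re-ranker prefers candidate 1.
cands = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
q0 = np.array([1.0, 0.0])
q_new = refine_query(q0, cands, reranker_scores=[0.0, 5.0, 1.0])
```

Each update moves the query toward the label-weighted mean of the candidate vectors and away from their retriever-weighted mean, which is the sense in which a step of this kind resembles Rocchio-style feedback. A fuller version would presumably re-run retrieval with the updated query and re-score the new results between steps.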
Related benchmarks
| Task | Dataset | Metric | Score | Rank |
|---|---|---|---|---|
| Open Question Answering | Natural Questions (NQ) (test) | Exact Match (EM) | 48.6 | 134 |
| Open-domain Question Answering | TriviaQA (test) | Exact Match | 66.8 | 80 |
| Passage retrieval | TriviaQA (test) | Top-100 Acc | 86.1 | 67 |
| Open-domain Question Answering | WebQuestions (WebQ) (test) | Exact Match (EM) | 46.9 | 55 |
| Open-domain Question Answering | CuratedTREC (test) | Exact Match (EM) | 39.8 | 26 |
| End-to-end Open-Domain Question Answering | TREC (test) | Exact Match (EM) | 63.1 | 21 |
| Passage retrieval | NQ multi-dataset training (test) | Accuracy@20 | 84.2 | 8 |
| Passage retrieval | EntityQuestions unseen query distribution (test) | Accuracy@20 | 0.662 | 8 |
| Open-domain Question Answering | EntityQuestions (ENTITYQ) (test) | EM | 28.3 | 7 |