Improving Passage Retrieval with Zero-Shot Question Generation
About
We propose a simple and effective re-ranking method for improving passage retrieval in open question answering. The re-ranker re-scores retrieved passages with a zero-shot question generation model, which uses a pre-trained language model to compute the probability of the input question conditioned on a retrieved passage. This approach can be applied on top of any retrieval method (e.g. neural or keyword-based), does not require any domain- or task-specific training (and therefore is expected to generalize better to data distribution shifts), and provides rich cross-attention between query and passage (i.e. it must explain every token in the question). When evaluated on a number of open-domain retrieval datasets, our re-ranker improves strong unsupervised retrieval models by 6%-18% absolute and strong supervised models by up to 12% in terms of top-20 passage retrieval accuracy. We also obtain new state-of-the-art results on full open-domain question answering by simply adding the new re-ranker to existing models with no further changes.
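The re-ranking idea described above can be sketched in a few lines: score each retrieved passage by the log-probability of the question given that passage, then sort by that score. The sketch below is illustrative only. The function names (`toy_question_log_likelihood`, `rerank`) and the parameters `alpha` and `vocab_size` are hypothetical, and the smoothed unigram scorer is a toy stand-in for the pre-trained language model the abstract refers to, not the authors' implementation.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# A scorer returns an estimate of log p(question | passage).
QuestionScorer = Callable[[str, str], float]


def toy_question_log_likelihood(
    question: str,
    passage: str,
    alpha: float = 0.1,       # assumed additive-smoothing constant
    vocab_size: int = 10000,  # assumed vocabulary size for smoothing
) -> float:
    """Toy stand-in for a pre-trained LM (an assumption of this sketch).

    Estimates log p(question | passage) with a smoothed unigram model
    built from the passage, averaged over question tokens.
    """
    passage_tokens = passage.lower().split()
    counts = Counter(passage_tokens)
    total = len(passage_tokens)
    question_tokens = question.lower().split()

    log_prob = 0.0
    for token in question_tokens:
        # Every question token must be "explained" by the passage.
        p = (counts[token] + alpha) / (total + alpha * vocab_size)
        log_prob += math.log(p)
    # Averaging does not change the ranking for a fixed question.
    return log_prob / max(len(question_tokens), 1)


def rerank(
    question: str,
    passages: Sequence[str],
    scorer: QuestionScorer = toy_question_log_likelihood,
) -> List[Tuple[float, str]]:
    """Re-order retrieved passages by question likelihood, best first."""
    scored = [(scorer(question, passage), passage) for passage in passages]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Because `rerank` only needs a list of passages and a scorer, it can sit on top of any first-stage retriever, keyword-based or neural. In a realistic setup the `scorer` argument would wrap a pre-trained language model that returns the question's log-likelihood conditioned on the passage.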
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Question Answering | 2Wiki | F1 16.8 | 75 |
| Ranking | BEIR selected subset v1.0.0 (test) | TREC-COVID 69.25 | 38 |
| Reranking | BEIR | NQ NDCG@5 0.3486 | 35 |
| Reranking | TREC | NDCG@5 (DL19) 65.77 | 35 |
| Passage Ranking | NQ | MRR 29.53 | 29 |
| Passage Ranking | WebQuestions (WQ) | R@10 54.8 | 28 |
| Passage retrieval | Natural Questions (NQ) | Top-10 Accuracy 53.51 | 28 |
| Passage Ranking | TREC DL 2019 | R@10 83.33 | 28 |
| Passage Ranking | TREC DL 2020 | R@10 77.27 | 28 |
| Pointwise Ranking | TREC DL 2020 (test) | nDCG@10 0.4287 | 19 |