Generative Relevance Feedback with Large Language Models
About
Current query expansion models use pseudo-relevance feedback (PRF) to improve first-pass retrieval effectiveness; however, this fails when the initial results are not relevant. Instead of building a language model from retrieved results, we propose Generative Relevance Feedback (GRF), which builds probabilistic feedback models from long-form text generated by Large Language Models. We study effective methods for generating text by varying the zero-shot generation subtasks: queries, entities, facts, news articles, documents, and essays. We evaluate GRF on document retrieval benchmarks covering a diverse set of queries and document collections, and the results show that GRF methods significantly outperform previous PRF methods. Specifically, we improve MAP by 5-19% and NDCG@10 by 17-24% compared to RM3 expansion, and achieve the best R@1k effectiveness on all datasets compared to state-of-the-art sparse, dense, and expansion models.
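The core idea above — estimating a feedback term distribution from LLM-generated text instead of retrieved documents, then interpolating it with the original query as in RM3 — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `generated_texts` input stands in for real LLM output, and the tokenizer, term weighting, and interpolation parameter are simplifying assumptions.

```python
from collections import Counter
import re

def grf_expand(query, generated_texts, num_terms=10, orig_weight=0.5):
    """GRF-style query expansion sketch: build a term distribution from
    LLM-generated text and interpolate it with the original query terms
    (RM3-style interpolation). All parameters are illustrative."""
    # Collect tokens from the generated long-form text.
    tokens = []
    for text in generated_texts:
        tokens.extend(re.findall(r"[a-z]+", text.lower()))
    counts = Counter(tokens)
    total = sum(counts.values())
    # Top-k expansion terms with normalized weights (the feedback model).
    expansion = {t: c / total for t, c in counts.most_common(num_terms)}
    # Distribute orig_weight uniformly over the original query terms.
    q_terms = re.findall(r"[a-z]+", query.lower())
    q_weight = orig_weight / len(q_terms)
    weighted = {t: q_weight for t in q_terms}
    # Interpolate: final weight = orig + (1 - orig_weight) * feedback.
    for t, w in expansion.items():
        weighted[t] = weighted.get(t, 0.0) + (1 - orig_weight) * w
    return weighted

# Hypothetical generated text standing in for real LLM output.
docs = ["climate change raises global sea levels and temperatures"]
weights = grf_expand("climate change", docs)
```

In practice the weighted term set would be issued as an expanded query to a sparse retriever (e.g. BM25 with boosted terms); original query terms keep the highest weight, so generation noise is down-weighted rather than trusted outright.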
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Information Retrieval | BEIR v1.0.0 (test) | -- | -- | 55 |
| Scientific Document Retrieval | DORIS-MAE (test) | M@10 | 11.34 | 26 |
| Scientific Document Retrieval | CSFCube (test) | M@10 | 13.89 | 26 |
| Passage Retrieval | TREC DL 2020 (evaluation) | NDCG@10 | 0.6143 | 14 |
| Passage Retrieval | TREC DL 2019 (evaluation) | NDCG@10 | 0.662 | 14 |
| Passage Retrieval | MS MARCO passage (dev) | NDCG@10 | 0.2358 | 14 |