Precise Zero-Shot Dense Retrieval without Relevance Labels

About

While dense retrieval has been shown effective and efficient across tasks and languages, it remains difficult to create effective fully zero-shot dense retrieval systems when no relevance label is available. In this paper, we recognize the difficulty of zero-shot learning and encoding relevance. Instead, we propose to pivot through Hypothetical Document Embeddings (HyDE). Given a query, HyDE first zero-shot instructs an instruction-following language model (e.g. InstructGPT) to generate a hypothetical document. The document captures relevance patterns but is not real and may contain false details. Then, an unsupervised contrastively learned encoder (e.g. Contriever) encodes the document into an embedding vector. This vector identifies a neighborhood in the corpus embedding space, where similar real documents are retrieved based on vector similarity. This second step grounds the generated document in the actual corpus, with the encoder's dense bottleneck filtering out the incorrect details. Our experiments show that HyDE significantly outperforms the state-of-the-art unsupervised dense retriever Contriever and shows strong performance comparable to fine-tuned retrievers across various tasks (e.g. web search, QA, fact verification) and languages (e.g. sw, ko, ja).
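The two steps described in the abstract (generate a hypothetical document, then retrieve real documents near its embedding) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: `generate_hypothetical_document` is a hypothetical stand-in that returns a canned passage where a real system would call an instruction-following language model, and `encode` is a toy normalized bag-of-words vector standing in for a contrastively trained dense encoder such as Contriever. The corpus, the query, and all function names are assumptions made for the example.

```python
import math
import re
from collections import Counter

# Tiny illustrative corpus of "real" documents (assumed for this sketch).
CORPUS = [
    "Wisdom teeth are usually removed in a short outpatient procedure under local anesthesia.",
    "The Eiffel Tower was completed in 1889 for the World's Fair in Paris.",
    "Contrastive learning trains encoders to pull related passages together.",
]

def generate_hypothetical_document(query):
    """Hypothetical stand-in for the language model step.

    A real system would prompt an instruction-following model to write a
    passage answering the query. The canned text may contain false details,
    which is expected: only its relevance pattern is used.
    """
    canned = {
        "how long does it take to remove wisdom tooth": (
            "It usually takes about five minutes to remove a wisdom tooth; "
            "the procedure is done under local anesthesia and wisdom teeth heal quickly."
        ),
    }
    return canned.get(query, query)  # fall back to the raw query

def encode(text):
    """Toy encoder: L2-normalized bag-of-words vector stored as a dict.

    This only stands in for a dense encoder so the example runs without
    model weights; it is not how Contriever works.
    """
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {token: c / norm for token, c in counts.items()}

def similarity(a, b):
    """Inner product of two normalized vectors (cosine similarity)."""
    return sum(weight * b.get(token, 0.0) for token, weight in a.items())

def hyde_search(query, corpus, k=2):
    """Return indices of the k corpus documents nearest to the hypothetical document."""
    hypothetical = generate_hypothetical_document(query)  # step 1: generate
    query_vector = encode(hypothetical)                   # step 2: encode
    scored = [(similarity(query_vector, encode(doc)), i) for i, doc in enumerate(corpus)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))     # best score first
    return [i for _, i in scored[:k]]

if __name__ == "__main__":
    top = hyde_search("how long does it take to remove wisdom tooth", CORPUS, k=1)
    print(CORPUS[top[0]])  # the wisdom-teeth passage ranks first
```

Note that the search never embeds the query itself: the generated passage is encoded in its place, so the false "five minutes" detail does not matter as long as the passage lands near the relevant real document.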

Luyu Gao, Xueguang Ma, Jimmy Lin, Jamie Callan • 2022

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Multi-hop Question Answering | 2WikiMultihopQA | EM 45 | 387 |
| Multi-hop Question Answering | MuSiQue | EM 21 | 185 |
| Multi-hop Question Answering | 2WikiMQA | -- | 161 |
| Multi-hop Question Answering | 2Wiki | Exact Match 23.5 | 152 |
| Document Ranking | TREC DL Track 2019 (test) | nDCG@10 61.3 | 133 |
| Information Retrieval | BEIR (test) | FiQA-2018 Score 27.3 | 90 |
| Multi-hop Question Answering | Multi-hop RAG | F1 32.1 | 77 |
| Question Answering | NQ | EM 44.5 | 69 |
| Document Ranking | TREC DL Track 2020 (test) | nDCG@10 0.579 | 63 |
| Medical Question Answering | MMLU Med | Accuracy 64.2 | 61 |
Showing 10 of 84 rows