Dense Passage Retrieval for Open-Domain Question Answering

About

Open-domain question answering relies on efficient passage retrieval to select candidate contexts, where traditional sparse vector space models, such as TF-IDF or BM25, are the de facto method. In this work, we show that retrieval can be practically implemented using dense representations alone, where embeddings are learned from a small number of questions and passages by a simple dual-encoder framework. When evaluated on a wide range of open-domain QA datasets, our dense retriever outperforms a strong Lucene-BM25 system largely by 9%-19% absolute in terms of top-20 passage retrieval accuracy, and helps our end-to-end QA system establish new state-of-the-art on multiple open-domain QA benchmarks.

Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, Wen-tau Yih • 2020
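
The retrieval setup the abstract describes can be illustrated with a small sketch. It assumes that questions and passages have already been mapped to vectors by two separate encoders and that relevance is scored by inner product. The toy 2-D vectors, the gold passage labels, and the helper names (`retrieve_top_k`, `top_k_accuracy`) are hypothetical and stand in for learned embeddings; this is not the authors' implementation. The accuracy function counts a question as answered when at least one gold passage appears among its top-k results, which is one plausible reading of "top-20 passage retrieval accuracy".

```python
from typing import List, Sequence, Set


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def retrieve_top_k(question_vec: Sequence[float],
                   passage_vecs: Sequence[Sequence[float]],
                   k: int) -> List[int]:
    """Return the indices of the k passages with the highest inner-product score."""
    scores = [dot(question_vec, p) for p in passage_vecs]
    ranked = sorted(range(len(passage_vecs)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def top_k_accuracy(question_vecs: Sequence[Sequence[float]],
                   passage_vecs: Sequence[Sequence[float]],
                   gold: Sequence[Set[int]],
                   k: int) -> float:
    """Fraction of questions with at least one gold passage in the top-k results."""
    hits = 0
    for q, gold_ids in zip(question_vecs, gold):
        if set(retrieve_top_k(q, passage_vecs, k)) & gold_ids:
            hits += 1
    return hits / len(question_vecs)


# Toy data (hypothetical): in practice these vectors would come from the
# question encoder and the passage encoder of a trained dual-encoder model.
PASSAGES = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
QUESTIONS = [[0.9, 0.1], [0.1, 0.9]]
GOLD = [{0}, {2}]  # assumed relevant passage indices per question

if __name__ == "__main__":
    for k in (1, 2):
        print(f"top-{k} accuracy:", top_k_accuracy(QUESTIONS, PASSAGES, GOLD, k))
```

At realistic scale the exhaustive scoring loop would typically be replaced by a vector index, and the passage vectors would be computed once offline, since they do not depend on the question.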

Related benchmarks

Task	Dataset	Result
Multi-hop Question Answering	HotpotQA	F1 Score 44.69	294
Question Answering	2Wiki	EM 39.9	241
Question Answering	HotpotQA	EM 52	173
Multi-hop QA	HotpotQA	Exact Match 18.3	143
Open Question Answering	Natural Questions (NQ) (test)	Exact Match (EM) 44.6	134
Question Answering	NQ (test)	EM Accuracy 36.09	133
Document Ranking	TREC DL Track 2019 (test)	nDCG@10 62.2	133
Question Answering	QASPER (test)	F1 Score (Match) 51.3	132
Information Retrieval	BEIR (test)	TREC-COVID Score 33.2	126
Question Answering	PopQA (test)	Accuracy 77.2	111

Showing 10 of 300 rows
