LaPraDoR: Unsupervised Pretrained Dense Retriever for Zero-Shot Text Retrieval
About
In this paper, we propose LaPraDoR, a pretrained dual-tower dense retriever that does not require any supervised data for training. Specifically, we first present Iterative Contrastive Learning (ICoL), which iteratively trains the query and document encoders with a cache mechanism. ICoL not only enlarges the number of negative instances but also keeps the representations of cached examples in the same hidden space. We then propose Lexicon-Enhanced Dense Retrieval (LEDR), a simple yet effective way to enhance dense retrieval with lexical matching. We evaluate LaPraDoR on the recently proposed BEIR benchmark, which includes 18 datasets covering 9 zero-shot text retrieval tasks. Experimental results show that LaPraDoR achieves state-of-the-art performance compared with supervised dense retrieval models, and further analysis reveals the effectiveness of our training strategy and objectives. Compared to re-ranking, our lexicon-enhanced approach runs in milliseconds (22.5x faster) while achieving superior performance.
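To make the two ideas in the abstract concrete, here is a minimal sketch of the ICoL cache mechanism, assuming a MoCo-style FIFO queue of embeddings: while one tower trains, the opposite tower is frozen, so its cached outputs stay in the same hidden space and can be reused as extra negatives. The function name `icol_step`, the temperature `tau`, and the cache size are illustrative choices, not the paper's exact settings.

```python
import torch
import torch.nn.functional as F

def icol_step(q_emb, d_emb, cache, tau=0.05, cache_size=10000):
    """One contrastive step with cached negatives (illustrative ICoL sketch).

    q_emb: (B, dim) output of the tower currently being trained.
    d_emb: (B, dim) output of the frozen opposite tower (positives).
    cache: (C, dim) embeddings from earlier batches of the frozen tower,
           reused as extra negatives; because the frozen tower produced
           them, they lie in the same hidden space as d_emb.
    """
    q = F.normalize(q_emb, dim=-1)
    d = F.normalize(torch.cat([d_emb, cache], dim=0), dim=-1)
    logits = q @ d.T / tau                    # (B, B + C) similarity scores
    labels = torch.arange(q.size(0))          # positive pairs on the diagonal
    loss = F.cross_entropy(logits, labels)
    # FIFO update: append this batch's frozen-tower embeddings, trim oldest.
    new_cache = torch.cat([cache, d_emb.detach()], dim=0)[-cache_size:]
    return loss, new_cache
```

When the towers swap roles in the next iteration, the cache would be flushed so that embeddings from a now-trainable encoder are never mixed with fresh ones. LEDR then fuses lexical and dense relevance at query time. The sketch below, in the same hedged spirit, combines precomputed BM25 scores with dense cosine similarity multiplicatively; `ledr_scores` is a hypothetical helper, and the exact fusion in the paper may differ. Because it only rescales scores that are computed anyway, it adds milliseconds rather than the cost of a cross-encoder re-ranking pass.

```python
import numpy as np

def ledr_scores(bm25_scores: np.ndarray, query_emb: np.ndarray,
                doc_embs: np.ndarray) -> np.ndarray:
    """Combine precomputed BM25 scores with dense cosine similarity.

    bm25_scores: (n_docs,) lexical relevance of each document to the query.
    query_emb:   (dim,)    query embedding from the query tower.
    doc_embs:    (n_docs, dim) document embeddings from the document tower.
    The multiplicative fusion below is an illustrative assumption,
    not necessarily the paper's exact formula.
    """
    q = query_emb / np.linalg.norm(query_emb)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    dense = d @ q                      # cosine similarity, (n_docs,)
    return bm25_scores * dense         # lexical score rescales dense score

# Toy usage: rank 3 documents for one query, best first.
rng = np.random.default_rng(0)
bm25 = np.array([2.1, 0.0, 5.3])
q, docs = rng.normal(size=8), rng.normal(size=(3, 8))
print(np.argsort(-ledr_scores(bm25, q, docs)))
```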
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Retrieval | MS MARCO (dev) | MRR@10 | 0.3191 | 84 |
| Information Retrieval | BEIR | TREC-COVID | 0.229 | 59 |
| Dense Retrieval | BEIR zero-shot | TREC-COVID | 47.8 | 13 |
| Dense Retrieval | Natural Questions (test) | Recall@10 | 73.77 | 9 |
| Information Retrieval | Natural Questions | Recall@10 | 78.01 | 9 |