RepBERT: Contextualized Text Embeddings for First-Stage Retrieval

About

Although exact term matching between queries and documents is the dominant method for first-stage retrieval, we propose a different approach, called RepBERT, which represents documents and queries with fixed-length contextualized embeddings. The inner products of query and document embeddings are used as relevance scores. On the MS MARCO Passage Ranking task, RepBERT achieves state-of-the-art results among all initial retrieval techniques, and its efficiency is comparable to that of bag-of-words methods.
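The scoring step described above can be illustrated with a minimal sketch. It assumes the query and document embeddings have already been computed by some encoder. The toy 3-dimensional vectors and the helper names `inner_product` and `rank_documents` are hypothetical and are not taken from the RepBERT code.

```python
from typing import List, Sequence, Tuple


def inner_product(u: Sequence[float], v: Sequence[float]) -> float:
    """Relevance score: inner product of two fixed-length embeddings."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same length")
    return sum(a * b for a, b in zip(u, v))


def rank_documents(
    query_emb: Sequence[float],
    doc_embs: Sequence[Sequence[float]],
    top_k: int,
) -> List[Tuple[int, float]]:
    """Return the top_k (doc_index, score) pairs, highest score first."""
    scored = [(i, inner_product(query_emb, d)) for i, d in enumerate(doc_embs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings (made up for illustration; a real encoder would produce these).
doc_embs = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.5, 0.5, 0.0],
]
query_emb = [1.0, 2.0, 0.0]

ranking = rank_documents(query_emb, doc_embs, top_k=2)
print(ranking)
```

Because document embeddings do not depend on the query, they can be computed once offline. Only the query embedding and the inner products are needed at search time. An exhaustive scan like this one is the simplest option; larger collections would typically use a dedicated vector index instead.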

Jingtao Zhan, Jiaxin Mao, Yiqun Liu, Min Zhang, Shaoping Ma • 2020

Related benchmarks

Task	Dataset	Result
Information Retrieval	NQ320k	Hits@1 22.57	54
Document Retrieval	MS MARCO MS300K (test)	MRR@10 38.48	36
Information Retrieval	ClueWeb 500K	nDCG@5 26.24	21
Information Retrieval	Gov 500K	nDCG@5 0.3101	21
Document Retrieval	NQ (test)	Hits@1 50.2	18
Passage retrieval	MS MARCO Passage Ranking 7k queries (dev)	MRR@10 30.4	11

