Context-Aware Sentence/Passage Term Importance Estimation For First Stage Retrieval

About

Term frequency is a common method for identifying the importance of a term in a query or document. But it is a weak signal, especially when the frequency distribution is flat, such as in long queries or short documents where the text is of sentence/passage-length. This paper proposes a Deep Contextualized Term Weighting framework that learns to map BERT's contextualized text representations to context-aware term weights for sentences and passages. When applied to passages, DeepCT-Index produces term weights that can be stored in an ordinary inverted index for passage retrieval. When applied to query text, DeepCT-Query generates a weighted bag-of-words query. Both types of term weight can be used directly by typical first-stage retrieval algorithms. This is novel because most deep neural network based ranking models have higher computational costs, and thus are restricted to later-stage rankers. Experiments on four datasets demonstrate that DeepCT's deep contextualized text understanding greatly improves the accuracy of first-stage retrieval algorithms.
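The indexing idea in the abstract can be illustrated with a minimal sketch. It assumes the per-term weights have already been predicted by a model, so the values below are made up for illustration. It also assumes the weights are scaled by a factor of 100 and rounded to integers so they can stand in for term frequencies in an ordinary inverted index. The function names are hypothetical, and the scorer is a plain weighted sum, not BM25 or any specific first-stage retrieval algorithm.

```python
from collections import defaultdict

def to_pseudo_tf(weights, scale=100):
    """Map real-valued term weights to integer pseudo term frequencies.

    The scale factor is an assumption for this sketch; terms whose scaled
    weight rounds to zero are dropped from the index.
    """
    scaled = {term: int(round(w * scale)) for term, w in weights.items()}
    return {term: tf for term, tf in scaled.items() if tf > 0}

def build_index(passages, scale=100):
    """Build an inverted index: term -> {passage_id: pseudo term frequency}."""
    index = defaultdict(dict)
    for pid, weights in passages.items():
        for term, tf in to_pseudo_tf(weights, scale).items():
            index[term][pid] = tf
    return index

def search(index, query_weights):
    """Score passages with a weighted bag-of-words query (simple dot product)."""
    scores = defaultdict(float)
    for term, qw in query_weights.items():
        for pid, tf in index.get(term, {}).items():
            scores[pid] += qw * tf
    # Highest score first; ties broken by passage id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical predicted weights for two short passages.
passages = {
    "p1": {"stomach": 0.9, "ache": 0.6, "the": 0.01},
    "p2": {"stomach": 0.1, "lining": 0.8, "the": 0.02},
}
index = build_index(passages)

# Hypothetical weighted bag-of-words query.
results = search(index, {"stomach": 1.0, "ache": 0.5})
```

Because the stored values are plain integers, an existing inverted-index engine could consume them in place of raw term frequencies without changes to its scoring code, which is what makes this kind of weighting usable in a first-stage retriever.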

Zhuyun Dai, Jamie Callan • 2019

Related benchmarks

Task	Dataset	Result
Passage retrieval	MS MARCO (dev)	MRR@10 24.3	116
Retrieval	MS MARCO (dev)	MRR@10 0.243	84
Retrieval	TREC DL 2019	NDCG@10 55.4	83
Passage Ranking	MS MARCO (dev)	MRR@10 24.3	73
