Context-Aware Sentence/Passage Term Importance Estimation For First Stage Retrieval
About
Term frequency is a common method for identifying the importance of a term in a query or document. But it is a weak signal, especially when the frequency distribution is flat, such as in long queries or short documents where the text is of sentence/passage-length. This paper proposes a Deep Contextualized Term Weighting framework that learns to map BERT's contextualized text representations to context-aware term weights for sentences and passages. When applied to passages, DeepCT-Index produces term weights that can be stored in an ordinary inverted index for passage retrieval. When applied to query text, DeepCT-Query generates a weighted bag-of-words query. Both types of term weight can be used directly by typical first-stage retrieval algorithms. This is novel because most deep neural network based ranking models have higher computational costs, and thus are restricted to later-stage rankers. Experiments on four datasets demonstrate that DeepCT's deep contextualized text understanding greatly improves the accuracy of first-stage retrieval algorithms.
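A minimal sketch of how context-aware term weights could be used by a first-stage retriever, assuming the model outputs one real-valued weight per term. The weights below are hand-written placeholders rather than DeepCT predictions, and the function names, the scaling factor of 100, and the dot-product scorer are illustrative assumptions, not the paper's implementation. The passage side rounds the weights to integer, TF-like values so they fit an ordinary inverted index; the query side is a weighted bag of words.

```python
from collections import defaultdict

def to_pseudo_tf(weights, scale=100):
    """Map real-valued term weights to integer TF-like counts (scale is an assumption)."""
    pseudo_tf = {}
    for term, weight in weights.items():
        tf = round(max(0.0, weight) * scale)
        if tf > 0:  # terms with zero weight are dropped from the index
            pseudo_tf[term] = tf
    return pseudo_tf

def build_index(passages, scale=100):
    """Build an inverted index: term -> {passage_id: integer weight}."""
    index = defaultdict(dict)
    for pid, weights in passages.items():
        for term, tf in to_pseudo_tf(weights, scale).items():
            index[term][pid] = tf
    return index

def score(index, query_weights):
    """Rank passages by the weighted sum over matching query terms."""
    scores = defaultdict(float)
    for term, q_weight in query_weights.items():
        for pid, tf in index.get(term, {}).items():
            scores[pid] += q_weight * tf
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Placeholder weights standing in for model predictions.
passages = {
    "p1": {"stomach": 0.9, "pain": 0.6, "the": 0.01},
    "p2": {"stomach": 0.1, "ache": 0.2, "the": 0.02},
}
index = build_index(passages)

# Weighted bag-of-words query, again with placeholder weights.
query = {"stomach": 0.8, "pain": 0.5}
ranked = score(index, query)
```

In practice the integer weights would replace raw term frequencies inside a standard scorer such as BM25; the plain weighted sum here only keeps the sketch short.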
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Passage retrieval | MsMARCO (dev) | MRR@10 | 24.3 | 116 |
| Retrieval | MS MARCO (dev) | MRR@10 | 0.243 | 84 |
| Passage Ranking | MS MARCO (dev) | MRR@10 | 24.3 | 73 |
| Retrieval | TREC DL 2019 | NDCG@10 | 55.4 | 71 |