CEDR: Contextualized Embeddings for Document Ranking

About

Although considerable attention has been given to neural ranking architectures recently, far less attention has been paid to the term representations that are used as input to these models. In this work, we investigate how two pretrained contextualized language models (ELMo and BERT) can be utilized for ad-hoc document ranking. Through experiments on TREC benchmarks, we find that several existing neural ranking architectures can benefit from the additional context provided by contextualized language models. Furthermore, we propose a joint approach that incorporates BERT's classification vector into existing neural models and show that it outperforms state-of-the-art ad-hoc ranking baselines. We call this joint approach CEDR (Contextualized Embeddings for Document Ranking). We also address practical challenges in using these models for ranking, including the maximum input length imposed by BERT and runtime performance impacts of contextualized language models.
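The abstract mentions BERT's maximum input length as a practical obstacle when ranking long documents. One common way to work around such a limit is to split a document's tokens into near-equal chunks that each fit, encode each chunk separately, and then combine the per-chunk vectors, for example by averaging. The sketch below illustrates that idea only. The function names, the token budget, and the averaging step are assumptions made for illustration and are not taken from this page.

```python
import math


def split_into_chunks(tokens, max_len):
    """Split tokens into the fewest near-equal chunks of at most max_len tokens.

    max_len is a hypothetical per-chunk budget for document tokens, i.e. what
    is left of the model's limit after the query and any special tokens.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if not tokens:
        return []
    n_chunks = math.ceil(len(tokens) / max_len)
    # Spread the remainder over the first `extra` chunks so sizes differ by at most 1.
    base, extra = divmod(len(tokens), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)
        chunks.append(tokens[start:start + size])
        start += size
    return chunks


def mean_vector(vectors):
    """Element-wise average of equal-length vectors (e.g. one vector per chunk)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]
```

With a budget of 512 tokens, a 1025-token document would be split into three chunks of 342, 342, and 341 tokens rather than two full chunks and a one-token remainder, which keeps the context available to each chunk roughly balanced.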

Sean MacAvaney, Andrew Yates, Arman Cohan, Nazli Goharian • 2019

Related benchmarks

Task	Dataset	Result
Information Retrieval	Robust04	P@20 46.67	72
Ad-hoc Document Ranking	WebTrack 2012-14	nDCG@20 34.97	18
Document Reranking	Robust04 Description	MAP 0.3975	13
Document Reranking	GOV2 Description	MAP 33.54	12
Document Reranking	GOV2 Title	MAP 34.81	12
Document Reranking	Robust04 Title	MAP 37.01	12
Document Reranking	Genomics collection (test)	MAP 0.2486	12
Document Retrieval	Robust TREC 2004 (test)	P@20 46.7	10

