A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques

About

Recent developments in representational learning for information retrieval can be organized in a conceptual framework that establishes two pairs of contrasts: sparse vs. dense representations and unsupervised vs. learned representations. Sparse learned representations can further be decomposed into expansion and term weighting components. This framework allows us to understand the relationship between recently proposed techniques such as DPR, ANCE, DeepCT, DeepImpact, and COIL. Furthermore, gaps revealed by our analysis point to "low hanging fruit" in the form of techniques that have yet to be explored. We present a novel technique dubbed "uniCOIL", a simple extension of COIL that achieves, to our knowledge, the current state of the art in sparse retrieval on the popular MS MARCO passage ranking dataset. Our implementation uses the Anserini IR toolkit, which is built on the Lucene search library, and is thus fully compatible with standard inverted indexes.
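To make the sparse side of this framework concrete, here is a minimal Python sketch. It is not the authors' Anserini/Lucene implementation. It assumes documents and queries are already encoded as sparse term-to-weight maps. The query uses raw term counts as a simplified unsupervised representation (a stand-in for BM25, not BM25 itself). The document weights are hypothetical values of the kind a learned term-weighting model such as DeepImpact or uniCOIL would produce, and the term "search" in `d1` plays the role of a hypothetical expansion term. All function and variable names are illustrative. Retrieval over the inverted index reduces to a sparse dot product over the terms a query and document share.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

SparseVec = Dict[str, float]
Postings = Dict[str, List[Tuple[str, float]]]


def tf_vector(text: str) -> SparseVec:
    """Unsupervised sparse representation: raw term counts.

    A deliberately simplified stand-in for BM25-style weighting.
    """
    vec: Dict[str, float] = defaultdict(float)
    for token in text.lower().split():
        vec[token] += 1.0
    return dict(vec)


def build_inverted_index(docs: Dict[str, SparseVec]) -> Postings:
    """Map each term to a postings list of (doc_id, weight) pairs."""
    index: Postings = defaultdict(list)
    for doc_id, vec in docs.items():
        for term, weight in vec.items():
            index[term].append((doc_id, weight))
    return dict(index)


def search(index: Postings, query: SparseVec, k: int = 10) -> List[Tuple[str, float]]:
    """Score each document by the sparse dot product with the query.

    Only postings for query terms are visited; the score is the sum of
    query weight * document weight over shared terms.
    """
    scores: Dict[str, float] = defaultdict(float)
    for term, q_weight in query.items():
        for doc_id, d_weight in index.get(term, []):
            scores[doc_id] += q_weight * d_weight
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


# Hypothetical learned document weights (illustrative values only).
# "search" in d1 stands in for an expansion term absent from the raw text.
docs: Dict[str, SparseVec] = {
    "d1": {"neural": 2.1, "retrieval": 1.8, "search": 0.9},
    "d2": {"inverted": 1.5, "index": 1.7, "lucene": 2.0},
    "d3": {"retrieval": 0.6, "index": 0.4},
}

index = build_inverted_index(docs)
results = search(index, tf_vector("retrieval index"))
print(results)  # [('d1', 1.8), ('d2', 1.7), ('d3', 1.0)]
```

Because scoring only touches the postings of query terms, any representation with this term-to-weight shape, whether its weights are unsupervised or learned and whether or not it adds expansion terms, can be served from a standard inverted index. That is the property the abstract highlights for uniCOIL. Dense representations such as DPR or ANCE instead require nearest-neighbor search over vectors.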

Jimmy Lin, Xueguang Ma · 2021

Related benchmarks

Task	Dataset	Metric	Result	#
Document Ranking	TREC DL Track 2019 (test)	nDCG@10	64.1	133
Passage retrieval	MS MARCO (dev)	MRR@10	35.2	116
Passage Ranking	MS MARCO (dev)	MRR@10	34.7	73
Information Retrieval	FIQA BEIR (test)	nDCG@10	28.9	44
Information Retrieval	SciFact BEIR (test)	nDCG@10	68.6	36
Document Retrieval	MS MARCO MS300K (test)	MRR@10	42.5	36
Passage Ranking	TREC DL 2019 (test)	nDCG@10	70.3	33
Web Search Retrieval	TREC DL 19	nDCG@10	70.2	22
Web Search Retrieval	TREC DL 20	nDCG@10	67.5	22
Information Retrieval	DBPedia BEIR (test)	nDCG@10	33.8	21

(The table lists 10 of 15 related benchmark results.)
