BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models

About

Existing neural information retrieval (IR) models have often been studied in homogeneous and narrow settings, which has considerably limited insight into their out-of-distribution (OOD) generalization capabilities. To address this, and to help researchers broadly evaluate the effectiveness of their models, we introduce Benchmarking-IR (BEIR), a robust and heterogeneous evaluation benchmark for information retrieval. We leverage a careful selection of 18 publicly available datasets spanning diverse text retrieval tasks and domains. On the BEIR benchmark we evaluate 10 state-of-the-art retrieval systems, covering lexical, sparse, dense, late-interaction and re-ranking architectures. Our results show that BM25 is a robust baseline. Re-ranking and late-interaction models achieve the best zero-shot performance on average, although at high computational cost. In contrast, dense and sparse retrieval models are computationally more efficient but often underperform other approaches, which highlights considerable room for improvement in their generalization capabilities. We hope this framework allows us to better evaluate and understand existing retrieval systems and helps accelerate progress towards more robust and generalizable systems in the future. BEIR is publicly available at https://github.com/UKPLab/beir.
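The abstract singles out BM25 as a robust lexical baseline. Below is a minimal, self-contained sketch of Okapi BM25 ranking over a toy in-memory corpus. It rests on several assumptions: naive lowercase whitespace tokenization, the Lucene-style IDF variant log(1 + (N - df + 0.5) / (df + 0.5)), and the commonly used parameters k1 = 1.2 and b = 0.75. The class, its methods and the toy corpus are hypothetical and illustrative only. This is not the BEIR library's API, and it does not claim to reproduce the exact BM25 configuration behind the reported results.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: naive lowercase whitespace tokenization (no stemming or stopwords).
    return text.lower().split()


class BM25:
    """Hypothetical Okapi BM25 ranker with Lucene-style IDF (illustrative sketch)."""

    def __init__(self, docs: dict[str, str], k1: float = 1.2, b: float = 0.75):
        self.k1, self.b = k1, b
        tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
        self.doc_tf = {doc_id: Counter(toks) for doc_id, toks in tokens.items()}
        self.doc_len = {doc_id: len(toks) for doc_id, toks in tokens.items()}
        self.n_docs = len(docs)
        self.avgdl = sum(self.doc_len.values()) / self.n_docs
        # Document frequency: number of documents that contain each term at least once.
        self.df = Counter()
        for tf in self.doc_tf.values():
            self.df.update(tf.keys())

    def idf(self, term: str) -> float:
        df = self.df.get(term, 0)
        return math.log(1 + (self.n_docs - df + 0.5) / (df + 0.5))

    def score(self, query: str, doc_id: str) -> float:
        tf = self.doc_tf[doc_id]
        # Length normalization: longer-than-average documents are penalized via b.
        norm = self.k1 * (1 - self.b + self.b * self.doc_len[doc_id] / self.avgdl)
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f:
                total += self.idf(term) * f * (self.k1 + 1) / (f + norm)
        return total

    def search(self, query: str, top_k: int = 10) -> list[tuple[str, float]]:
        # Score every document and return the top_k (doc_id, score) pairs.
        scores = {doc_id: self.score(query, doc_id) for doc_id in self.doc_tf}
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Usage on a hypothetical three-document corpus.
corpus = {
    "d1": "bm25 is a lexical retrieval baseline",
    "d2": "dense retrieval uses learned embeddings",
    "d3": "re-ranking models score query document pairs",
}
bm25 = BM25(corpus)
results = bm25.search("lexical baseline", top_k=2)
```

Only `d1` shares terms with the query, so it is ranked first with a positive score. Rarer terms such as "lexical" receive a higher IDF than terms that occur in several documents, such as "retrieval". A zero-shot evaluation on an unseen corpus needs no training at all, which is part of why such a baseline is hard to beat out of domain.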

Nandan Thakur, Nils Reimers, Andreas Rücklé, Abhishek Srivastava, Iryna Gurevych • 2021

Related benchmarks

Task	Dataset	Metric	Result
Information Retrieval	BEIR	SciFact	0.688	174
Information Retrieval	BEIR (test)	TREC-COVID Score	0.757	130
Reranking	MS MARCO (dev)	MRR@10	0.243	71
Zero-shot Information Retrieval	BEIR	NFCorpus NDCG@10 (Zero-shot)	31.9	38
Scientific Document Retrieval	CSFCube (test)	M@10	16.99	26
Scientific Document Retrieval	DORIS-MAE (test)	M@10	11.52	26
Information Retrieval	BEIR v1.0 (test)	FEVER Score	92.8	20
Information Retrieval	MS MARCO in-domain	NDCG@10	0.408	18
Information Retrieval	MS Marco	NDCG@10	40.8	15
Information Retrieval	FiQA 2018 (test)	NDCG@10	0.347	14

Showing 10 of 14 rows
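Most entries above report NDCG@10 or MRR@10. Some report them as fractions and some as percentages, as the two MS MARCO rows (0.408 and 40.8) show. The following is a minimal sketch of both metrics for a single query. It assumes graded relevance judgments (qrels) stored as a dict from document ID to an integer grade, plus a system ranking given as a list of document IDs. The function names and toy data are hypothetical. The NDCG shown uses a linear gain, rel_i / log2(i + 1), which is one common convention; some implementations use an exponential gain 2^rel - 1 instead. It is not necessarily the exact implementation behind any figure in the table.

```python
import math


def ndcg_at_k(ranking: list[str], qrels: dict[str, int], k: int = 10) -> float:
    """Hypothetical helper: NDCG@k with linear gain over 1-based ranks."""
    # DCG of the system ranking; unjudged documents count as non-relevant.
    dcg = sum(qrels.get(doc_id, 0) / math.log2(rank + 1)
              for rank, doc_id in enumerate(ranking[:k], start=1))
    # Ideal DCG: all judged relevant documents sorted by grade, cut at k.
    ideal = sorted((rel for rel in qrels.values() if rel > 0), reverse=True)[:k]
    idcg = sum(rel / math.log2(rank + 1) for rank, rel in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0


def mrr_at_k(ranking: list[str], qrels: dict[str, int], k: int = 10) -> float:
    """Hypothetical helper: reciprocal rank of the first relevant document in the top k."""
    for rank, doc_id in enumerate(ranking[:k], start=1):
        if qrels.get(doc_id, 0) > 0:
            return 1.0 / rank
    return 0.0


# Toy query: d1 is highly relevant (2); d2 and d5 are partially relevant (1).
qrels = {"d1": 2, "d2": 1, "d5": 1}
ranking = ["d3", "d1", "d7", "d2"]
ndcg = ndcg_at_k(ranking, qrels)  # ≈ 0.5406
mrr = mrr_at_k(ranking, qrels)    # 0.5, since the first relevant hit is at rank 2
```

A dataset-level score such as those listed in the table is typically the mean of these per-query values over all test queries.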
