
Making a MIRACL: Multilingual Information Retrieval Across a Continuum of Languages

About

MIRACL (Multilingual Information Retrieval Across a Continuum of Languages) is a multilingual dataset we have built for the WSDM 2023 Cup challenge that focuses on ad hoc retrieval across 18 different languages, which collectively encompass over three billion native speakers around the world. These languages have diverse typologies, originate from many different language families, and are associated with varying amounts of available resources -- including what researchers typically characterize as high-resource as well as low-resource languages. Our dataset is designed to support the creation and evaluation of models for monolingual retrieval, where the queries and the corpora are in the same language. In total, we have gathered over 700k high-quality relevance judgments for around 77k queries over Wikipedia in these 18 languages, where all assessments have been performed by native speakers hired by our team. Our goal is to spur research that will improve retrieval across a continuum of languages, thus enhancing information access capabilities for diverse populations around the world, particularly those that have been traditionally underserved. This overview paper describes the dataset and baselines that we share with the community. The MIRACL website is live at http://miracl.ai/.

Xinyu Zhang, Nandan Thakur, Odunayo Ogundepo, Ehsan Kamalloo, David Alfonso-Hermelo, Xiaoguang Li, Qun Liu, Mehdi Rezagholizadeh, Jimmy Lin • 2022

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Monolingual Information Retrieval | Mr.Tydi Bengali official (test) | MRR@100 | 41.4 | 5 |
| Monolingual Information Retrieval | Mr.Tydi Telugu official (test) | MRR@100 | 0.314 | 5 |
| Information Retrieval | MIRACL Bengali (dev) | NDCG@10 | 0.546 | 4 |
| Information Retrieval | MIRACL Hindi (dev) | NDCG@10 | 47 | 4 |
| Information Retrieval | MIRACL Telugu (dev) | NDCG@10 | 0.462 | 4 |
| Passage Ranking | INDIC-MARCO Assamese Small (dev) | MRR@10 | 0.095 | 3 |
| Passage Ranking | INDIC-MARCO Bengali Small (dev) | MRR@10 | 0.159 | 3 |
| Passage Ranking | INDIC-MARCO Gujarati Small (dev) | MRR@10 | 14.1 | 3 |
| Passage Ranking | INDIC-MARCO Hindi Small (dev) | MRR@10 | 17.1 | 3 |
| Passage Ranking | INDIC-MARCO Kannada Small (dev) | MRR@10 | 0.156 | 3 |
Showing 10 of 16 rows
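
The benchmark rows above report MRR@k and NDCG@k. As a rough illustration of what these metrics measure for a single query, here is a minimal Python sketch. It assumes binary relevance judgments, a log2 rank discount for NDCG, and hypothetical document IDs; it is not the official evaluation code for any of the listed benchmarks, which may differ in details such as graded gains or tie handling.

```python
import math


def mrr_at_k(ranked_ids, relevant_ids, k):
    """Reciprocal rank of the first relevant document in the top k, else 0."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked_ids, relevant_ids, k):
    """NDCG@k with binary gains (assumption) and a log2 rank discount."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked_ids[:k], start=1)
        if doc_id in relevant_ids
    )
    # The ideal ranking places every relevant document at the top.
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical toy run for one query: the system's ranking and the judged-relevant set.
ranked = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}

rr = mrr_at_k(ranked, relevant, k=100)
ndcg = ndcg_at_k(ranked, relevant, k=10)
print(f"MRR@100 = {rr:.3f}, NDCG@10 = {ndcg:.3f}")
```

A benchmark score is typically the mean of these per-query values over all queries in the evaluation set.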
