Multilingual E5 Text Embeddings: A Technical Report

About

This technical report presents the training methodology and evaluation results of the open-source multilingual E5 text embedding models, released in mid-2023. Three embedding models of different sizes (small / base / large) are provided, offering a balance between inference efficiency and embedding quality. The training procedure follows the English E5 model recipe: contrastive pre-training on 1 billion multilingual text pairs, followed by fine-tuning on a combination of labeled datasets. Additionally, we introduce a new instruction-tuned embedding model whose performance is on par with state-of-the-art English-only models of similar sizes. Information regarding the model release can be found at https://github.com/microsoft/unilm/tree/master/e5.
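To make the two-stage recipe concrete, the sketch below illustrates an InfoNCE-style contrastive objective with in-batch negatives, the kind of loss this pre-training stage typically builds on. The function name `info_nce_loss` and the temperature of 0.01 are illustrative assumptions (the temperature follows the English E5 recipe), not details quoted from this report.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(query_emb: torch.Tensor, passage_emb: torch.Tensor,
                  temperature: float = 0.01) -> torch.Tensor:
    """Contrastive loss with in-batch negatives (a sketch).

    Each query's positive passage sits on the diagonal of the similarity
    matrix; every other passage in the batch serves as a negative.
    """
    q = F.normalize(query_emb, dim=-1)
    p = F.normalize(passage_emb, dim=-1)
    logits = q @ p.T / temperature                      # (B, B) cosine similarities
    labels = torch.arange(q.size(0), device=q.device)   # diagonal = positive pairs
    return F.cross_entropy(logits, labels)
```

In-batch negatives make large batches cheap to exploit: a batch of B pairs yields B - 1 negatives per query without any extra encoding passes, which is what makes pre-training on a billion text pairs tractable.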
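For readers who want to try the released checkpoints, here is a minimal retrieval sketch using Hugging Face `transformers`. The model id `intfloat/multilingual-e5-base`, the `query:` / `passage:` input prefixes, and mask-aware average pooling follow the released models' documented usage; treat this as an illustration rather than the report's own evaluation code.

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("intfloat/multilingual-e5-base")
model = AutoModel.from_pretrained("intfloat/multilingual-e5-base")

# Queries and passages are marked with the prefixes the models were trained on.
texts = [
    "query: how much protein should a female eat",
    "passage: The CDC recommends about 46 grams of protein per day for women.",
]
batch = tokenizer(texts, padding=True, truncation=True, max_length=512,
                  return_tensors="pt")

with torch.no_grad():
    out = model(**batch)

# Mask-aware average pooling over token embeddings, then L2 normalization.
mask = batch["attention_mask"].unsqueeze(-1).float()
emb = (out.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
emb = F.normalize(emb, p=2, dim=1)

print((emb[0] @ emb[1]).item())  # cosine similarity of query vs. passage
```

Per the repository's usage notes, the instruction-tuned variant instead expects a short natural-language task description prepended to each query, while passages are embedded without any prefix.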

Liang Wang, Nan Yang, Xiaolong Huang, Linjun Yang, Rangan Majumder, Furu Wei • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Information Retrieval | BEIR (test) | -- | -- | 90 |
| Multilingual Information Retrieval | XQuAD | Completion@10 | 58.99 | 80 |
| Cross-lingual Information Retrieval | Belebele | Comp@10 | 0.6867 | 80 |
| Text Embedding | MTEB English v2 | Mean Score | 65.5 | 68 |
| Information Retrieval | BEIR v1.0.0 (test) | ArguAna | 54.4 | 65 |
| Information Retrieval | BEIR | Average NDCG@10 | 0.533 | 62 |
| Multilingual Retrieval | MIRACL (dev) | Avg Score | 66.6 | 43 |
| Information Retrieval | FIQA BEIR (test) | nDCG@10 | 48.4 | 32 |
| Multilingual Text Embedding | MTEB Multilingual | Mean Score (Task) | 63.2 | 29 |
| Multilingual Retrieval | MTEB Multilingual v2 | nDCG@10 | 57.1 | 28 |
Showing 10 of 58 rows
