Granite Embedding Models

About

We introduce the Granite Embedding models, a family of encoder-based embedding models designed for retrieval tasks, spanning dense-retrieval and sparse-retrieval architectures, with both English and multilingual capabilities. This report provides the technical details of training these highly effective 12-layer embedding models, along with their efficient 6-layer distilled counterparts. Extensive evaluations show that the models, developed with techniques like retrieval-oriented pretraining, contrastive finetuning, knowledge distillation, and model merging, significantly outperform publicly available models of similar sizes on internal IBM retrieval and search tasks and perform equivalently on widely used information retrieval benchmarks, while being trained on high-quality data suitable for enterprise use. We publicly release all our Granite Embedding models under the Apache 2.0 license, allowing both research and commercial use, at https://huggingface.co/collections/ibm-granite.
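To make the dense-retrieval setting mentioned above concrete, the following is a minimal sketch of how a query embedding can be compared against document embeddings. It does not load any Granite model: the three-dimensional vectors, the document ids, and the helper names `cosine` and `rank_documents` are hypothetical, and cosine similarity is assumed here as the scoring function rather than taken from the report.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_documents(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy embeddings standing in for encoder outputs (hypothetical values).
toy_docs = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.1, 0.8, 0.3],
    "doc_c": [0.7, 0.2, 0.1],
}
toy_query = [1.0, 0.0, 0.0]

ranking = rank_documents(toy_query, toy_docs)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

In practice the vectors would come from an embedding model, and the document vectors would usually be precomputed and stored in an index rather than scored one by one.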

Parul Awasthy, Aashka Trivedi, Yulong Li, Mihaela Bornea, David Cox, Abraham Daniels, Martin Franz, Gabe Goodhart, Bhavani Iyer, Vishwajeet Kumar, Luis Lastras, Scott McCarley, Rudra Murthy, Vignesh P, Sara Rosenthal, Salim Roukos, Jaydeep Sen, Sukriti Sharma, Avirup Sil, Kate Soule, Arafat Sultan, Radu Florian • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Text Embedding | MTEB English v2 | Mean Score 62.08 | 107 |
| Text Classification | N24News (test) | Macro F1 55.38 | 52 |
| Multilingual Retrieval | MTEB Multilingual v2 | nDCG@10 52.2 | 40 |
| Multi-hop QA Retrieval | MuSiQue | R@2 37.9 | 36 |
| Retrieval | MTEB eng v2 | nDCG@10 51.5 | 31 |
| Information Retrieval | LongEmbed | NDCG@10 37.7 | 26 |
| Code Retrieval | MTEB Code | nDCG@10 48.5 | 21 |
| Disease prediction | Haodf Lung | Hit Rate @ 1 50.95 | 16 |
| Disease prediction | Haodf Coronary Heart Disease | Hit Rate @ 1 16.73 | 16 |
| Disease prediction | Haodf Pneumonia | Hit@1 12.31 | 16 |
Showing 10 of 20 rows
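
Several rows in the table above report nDCG@10. As a rough illustration of what that metric measures, here is a minimal sketch that computes nDCG@k for one ranked list. It assumes linear gain (the relevance label itself, not 2^rel - 1) and assumes that `relevances` covers every judged document for the query, so the ideal ordering can be obtained by sorting the same list. Benchmark suites may differ on both points, and the table appears to report scores on a 0-100 scale rather than 0-1. The function names are hypothetical.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k positions (linear gain)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Relevance labels in ranked order for one hypothetical query:
# the only relevant document was retrieved at position 2.
print(round(ndcg_at_k([0, 1, 0, 0]), 4))
```

A benchmark-level score is typically the mean of this per-query value across all queries in the dataset.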
