
jina-embeddings-v3: Multilingual Embeddings With Task LoRA

About

We introduce jina-embeddings-v3, a novel text embedding model with 570 million parameters that achieves state-of-the-art performance on multilingual data and long-context retrieval tasks, supporting context lengths of up to 8192 tokens. The model includes a set of task-specific Low-Rank Adaptation (LoRA) adapters to generate high-quality embeddings for query-document retrieval, clustering, classification, and text matching. Evaluation on the MTEB benchmark shows that jina-embeddings-v3 outperforms the latest proprietary embeddings from OpenAI and Cohere on English tasks, while achieving superior performance compared to multilingual-e5-large-instruct across all multilingual tasks. With a default output dimension of 1024, users can flexibly reduce the embedding dimensions to as low as 32 without compromising performance, enabled by Matryoshka Representation Learning.
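The Matryoshka property mentioned above means a 1024-dimensional embedding can simply be truncated to its first k components and re-normalized, with little loss in quality. A minimal sketch of that truncation step, using toy NumPy vectors in place of real model output (the function name `truncate_embedding` is illustrative, not part of the model's API):

```python
import numpy as np

def truncate_embedding(emb, dim):
    """Matryoshka truncation: keep the first `dim` components,
    then L2-normalize so cosine similarity stays well-defined."""
    truncated = emb[..., :dim]
    norms = np.linalg.norm(truncated, axis=-1, keepdims=True)
    return truncated / norms

# Toy stand-ins for full 1024-d embeddings (the model's default size).
rng = np.random.default_rng(0)
full = rng.normal(size=(2, 1024))
full /= np.linalg.norm(full, axis=-1, keepdims=True)

small = truncate_embedding(full, 32)  # reduce to the 32-d floor
print(small.shape)                    # (2, 32)
```

Cosine similarities computed on the truncated vectors approximate those of the full embeddings, which is what makes the dimension/quality trade-off tunable at query time.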

Saba Sturua, Isabelle Mohr, Mohammad Kalim Akram, Michael Günther, Bo Wang, Markus Krimmel, Feng Wang, Georgios Mastrapas, Andreas Koukounas, Nan Wang, Han Xiao • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Multilingual Information Retrieval | XQuAD | Completion@10: 68.74 | 80 |
| Cross-lingual Information Retrieval | Belebele | Comp@10: 0.8978 | 80 |
| Long-context Question Answering | LongBench (test) | HotpotQA: 39.17 | 69 |
| Sentence Embedding Evaluation | MTEB (test) | Classification Score: 82.6 | 55 |
| Faithfulness Evaluation | LongBench | NAR Score: 62.5 | 18 |
| Long-text Question Answering | UltraDomain | F1 (bio): 32.88 | 10 |
| Multilingual Retrieval | WSDM Cup 2026 (test) | nDCG@20: 0.355 | 9 |
| Faithfulness Evaluation | UltraDomain | Bio Score: 77.78 | 9 |
| Conciseness Evaluation | UltraDomain | Bio Score: 66.67 | 9 |
| Query-to-Item Retrieval | Industrial-scale e-commerce dataset | Recall@10: 1.77 | 9 |
