jina-embeddings-v3: Multilingual Embeddings With Task LoRA
About
We introduce jina-embeddings-v3, a novel text embedding model with 570 million parameters that achieves state-of-the-art performance on multilingual data and long-context retrieval tasks, supporting context lengths of up to 8192 tokens. The model includes a set of task-specific Low-Rank Adaptation (LoRA) adapters that generate high-quality embeddings for query-document retrieval, clustering, classification, and text matching. Evaluation on the MTEB benchmark shows that jina-embeddings-v3 outperforms the latest proprietary embeddings from OpenAI and Cohere on English tasks, while achieving superior performance compared to multilingual-e5-large-instruct across all multilingual tasks. With a default output dimension of 1024, users can flexibly reduce the embedding dimensions to as low as 32 without compromising performance, enabled by Matryoshka Representation Learning.
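The Matryoshka-style dimension reduction described above amounts to keeping only the leading components of the embedding and re-normalizing. Below is a minimal numpy sketch of that step; the function name is illustrative, not part of the model's actual API, and it assumes the model's output is an L2-normalized 1024-dimensional vector.

```python
import numpy as np


def truncate_embedding(vec: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components and re-normalize to unit length.

    With Matryoshka Representation Learning, the leading components carry
    most of the semantic signal, so the truncated vector remains usable
    for similarity search at a fraction of the storage cost.
    """
    head = vec[:dim]
    norm = np.linalg.norm(head)
    return head / norm if norm > 0 else head


# Stand-in for a model output: a random unit vector at the default 1024 dims.
rng = np.random.default_rng(0)
full = rng.normal(size=1024)
full /= np.linalg.norm(full)

# Reduce to 32 dims (the lowest setting mentioned above) for cheap storage.
small = truncate_embedding(full, 32)
print(small.shape)  # (32,)
```

Cosine similarity between two truncated vectors is then just their dot product, since both remain unit-length after re-normalization.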
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Multilingual Information Retrieval | XQuAD | Completion@10 | 68.74 | 80 |
| Cross-lingual Information Retrieval | Belebele | Comp@10 | 0.8978 | 80 |
| Long-context Question Answering | LongBench (test) | HotpotQA | 39.17 | 69 |
| Sentence Embedding Evaluation | MTEB (test) | Classification Score | 82.6 | 55 |
| Faithfulness Evaluation | LongBench | NAR Score | 62.5 | 18 |
| Long-text Question Answering | UltraDomain | F1 (bio) | 32.88 | 10 |
| Multilingual Retrieval | WSDM Cup 2026 (test) | nDCG@20 | 0.355 | 9 |
| Faithfulness Evaluation | UltraDomain | Bio Score | 77.78 | 9 |
| Conciseness Evaluation | UltraDomain | Bio Score | 66.67 | 9 |
| Query-to-Item Retrieval | Industrial-scale e-commerce dataset | Recall@10 | 1.77 | 9 |