
jina-embeddings-v3: Multilingual Embeddings With Task LoRA

About

We introduce jina-embeddings-v3, a novel text embedding model with 570 million parameters that achieves state-of-the-art performance on multilingual data and long-context retrieval tasks, supporting context lengths of up to 8192 tokens. The model includes a set of task-specific Low-Rank Adaptation (LoRA) adapters to generate high-quality embeddings for query-document retrieval, clustering, classification, and text matching. Evaluation on the MTEB benchmark shows that jina-embeddings-v3 outperforms the latest proprietary embeddings from OpenAI and Cohere on English tasks, while achieving superior performance compared to multilingual-e5-large-instruct across all multilingual tasks. With a default output dimension of 1024, users can flexibly reduce the embedding dimensions to as low as 32 without compromising performance, enabled by Matryoshka Representation Learning.
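The Matryoshka property mentioned above means a 1024-dimensional embedding can simply be truncated to its first k components and re-normalized, with little loss in quality. A minimal sketch of that truncation step, using toy NumPy vectors in place of real model output (the function name `truncate_embedding` is illustrative, not part of the model's API):

```python
import numpy as np

def truncate_embedding(emb, dim):
    """Matryoshka truncation: keep the first `dim` components,
    then L2-normalize so cosine similarity stays well-defined."""
    truncated = emb[..., :dim]
    norms = np.linalg.norm(truncated, axis=-1, keepdims=True)
    return truncated / norms

# Toy stand-ins for full 1024-d embeddings (the model's default size).
rng = np.random.default_rng(0)
full = rng.normal(size=(2, 1024))
full /= np.linalg.norm(full, axis=-1, keepdims=True)

small = truncate_embedding(full, 32)  # reduce to the 32-d floor
print(small.shape)                    # (2, 32)
```

Cosine similarities computed on the truncated vectors approximate those of the full embeddings, which is what makes the dimension/quality trade-off tunable at query time.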

Saba Sturua, Isabelle Mohr, Mohammad Kalim Akram, Michael Günther, Bo Wang, Markus Krimmel, Feng Wang, Georgios Mastrapas, Andreas Koukounas, Nan Wang, Han Xiao • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Multilingual Information Retrieval | XQuAD | Completion@10: 68.74 | 80 |
| Cross-lingual Information Retrieval | Belebele | Comp@10: 0.8978 | 80 |
| Long-context Question Answering | LongBench (test) | HotpotQA: 39.17 | 69 |
| Sentence Embedding Evaluation | MTEB (test) | Classification Score: 82.6 | 55 |
| Faithfulness Evaluation | LongBench | NAR Score: 62.5 | 18 |
| Long-text Question Answering | UltraDomain | F1 (bio): 32.88 | 10 |
| Multilingual Retrieval | WSDM Cup 2026 (test) | nDCG@20: 0.355 | 9 |
| Faithfulness Evaluation | UltraDomain | Bio Score: 77.78 | 9 |
| Conciseness Evaluation | UltraDomain | Bio Score: 66.67 | 9 |
| Query-to-Item Retrieval | Industrial-scale e-commerce dataset | Recall@10: 1.77 | 9 |
