jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval
About
We introduce jina-embeddings-v4, a 3.8 billion parameter multimodal embedding model that unifies text and image representations through a novel architecture supporting both single-vector and multi-vector embeddings in the late interaction style. The model incorporates task-specific Low-Rank Adaptation (LoRA) adapters to optimize performance across diverse retrieval scenarios, including query-document retrieval, semantic text similarity, and code search. Comprehensive evaluations demonstrate that jina-embeddings-v4 achieves state-of-the-art performance on both single-modal and cross-modal retrieval tasks, with particular strength in processing visually rich content such as tables, charts, diagrams, and mixed-media formats. To facilitate evaluation of this capability, we also introduce Jina-VDR, a novel benchmark specifically designed for visually rich image retrieval.
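To make the distinction between the two output modes concrete, the following is a minimal NumPy sketch of how single-vector and multi-vector (late interaction) embeddings are typically scored. It is not the model's API: the function names `cosine_score` and `maxsim_score` are hypothetical, and the MaxSim scoring rule is the standard late-interaction formulation, assumed here rather than taken from the abstract.

```python
import numpy as np


def cosine_score(query_vec: np.ndarray, doc_vec: np.ndarray) -> float:
    """Single-vector scoring: cosine similarity between two 1-D embeddings."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vec / np.linalg.norm(doc_vec)
    return float(q @ d)


def maxsim_score(query_tokens: np.ndarray, doc_tokens: np.ndarray) -> float:
    """Multi-vector (late interaction) scoring, assuming the usual MaxSim rule.

    query_tokens: (num_query_tokens, dim) array, one embedding per query token.
    doc_tokens:   (num_doc_tokens, dim) array, one embedding per document token
                  or image patch.
    """
    # L2-normalize every token embedding so dot products are cosine similarities.
    q = query_tokens / np.linalg.norm(query_tokens, axis=1, keepdims=True)
    d = doc_tokens / np.linalg.norm(doc_tokens, axis=1, keepdims=True)
    # Pairwise similarities, shape (num_query_tokens, num_doc_tokens).
    sim = q @ d.T
    # For each query token keep its best-matching document token, then sum.
    return float(sim.max(axis=1).sum())
```

The single-vector path stores one embedding per document and is cheap to index, while the multi-vector path keeps one embedding per token or patch and defers the matching to query time, which tends to help on fine-grained content such as tables and charts.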
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Information Retrieval | BEIR | -- | -- | 59 |
| Text Embedding | MTEB English v2 | Mean Score | 65.09 | 50 |
| Multilingual Text Embedding | MTEB Multilingual | Mean Score (Task) | 58.17 | 29 |
| Visual Document Retrieval | ViDoRe V3 | HR | 59.53 | 23 |
| Retrieval | MTEB-E English v2 | MTEB-E Retrieval Score | 56.15 | 16 |
| Multilingual Retrieval | MTEB Multilingual v2 | MTEB-M Score | 66.43 | 11 |
| Retrieval | RTEB Multilingual Public | RTEB | 66.52 | 11 |
| Retrieval | LongEmbed | Long Task Score | 69.88 | 11 |
| Document Retrieval | ViDoRe V2 | NDCG@5 | 0.576 | 10 |
| Document Retrieval | Nayana-IR Cross-Lingual | NDCG@5 | 43.5 | 10 |