Jina Embeddings: A Novel Set of High-Performance Sentence Embedding Models

About

Jina Embeddings constitutes a set of high-performance sentence embedding models adept at translating textual inputs into numerical representations, capturing the semantics of the text. These models excel in applications like dense retrieval and semantic textual similarity. This paper details the development of Jina Embeddings, starting with the creation of high-quality pairwise and triplet datasets. It underlines the crucial role of data cleaning in dataset preparation, offers in-depth insights into the model training process, and concludes with a comprehensive performance evaluation using the Massive Text Embedding Benchmark (MTEB). Furthermore, to increase the model's awareness of grammatical negation, we construct a novel training and evaluation dataset of negated and non-negated statements, which we make publicly available to the community.
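To illustrate how embeddings support dense retrieval and semantic textual similarity, here is a minimal sketch that ranks documents by cosine similarity to a query. The vectors are hand-written toy values, not outputs of any Jina Embeddings model, and the function names `cosine_similarity` and `rank_by_similarity` are hypothetical helpers introduced only for this example. The abstract does not specify the similarity measure; cosine similarity is assumed here because it is a common choice for comparing sentence embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return (doc_index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional "embeddings" (assumed values for illustration only).
query = [1.0, 0.0, 1.0]
docs = [
    [0.9, 0.1, 1.1],    # points in nearly the same direction as the query
    [0.0, 1.0, 0.0],    # orthogonal to the query
    [-1.0, 0.0, -1.0],  # points in the opposite direction
]

ranking = rank_by_similarity(query, docs)
for index, score in ranking:
    print(f"doc {index}: {score:.3f}")
```

In a real retrieval setup the vectors would come from an embedding model, and the top-ranked documents would be returned as the search results.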

Michael Günther, Louis Milliken, Jonathan Geuter, Georgios Mastrapas, Bo Wang, Han Xiao • 2023

Related benchmarks

Task | Dataset | Metric | Result | Rank
Text Embedding | MTEB English v2 | Mean Score | 59.76 | 50
Semantic Textual Similarity (STS) | MTEB English 2023 (test) | BIO | 84.43 | 19
Module-level Code Localization | SWE-Bench Lite | Acc@5 | 63.5 | 16
Function-level Code Localization | SWE-Bench Lite | Acc@5 | 42.34 | 16
Function-level Localization | SWE-Bench-Lite latest (test) | NDCG@5 | 33.28 | 16
Module-level Localization | SWE-Bench-Lite latest (test) | NDCG@5 | 51.02 | 16
File-level Code Localization | SWE-Bench Lite | Acc@1 | 43.43 | 16
File-level Localization | SWE-Bench-Lite latest (test) | NDCG@1 | 43.43 | 16