Nomic Embed Vision: Expanding the Latent Space

About

This technical report describes the training of nomic-embed-vision, a highly performant, open-code, open-weights image embedding model that shares the same latent space as nomic-embed-text. Together, nomic-embed-vision and nomic-embed-text form the first unified latent space to achieve high performance across vision, language, and multimodal tasks.

Zach Nussbaum, Brandon Duderstadt, Andriy Mulyar • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Handwriting Retrieval | Handwriting In-Domain Set | Accuracy@1 | 59.1 | 30 |
| Handwriting Retrieval | Handwriting Spanish synthetic disjoint fonts (Out-of-Domain (OOD)) | Top-1 Accuracy | 38.82 | 30 |
| Document Retrieval | DocHaystack 200 | Recall@1 | 13.76 | 7 |
| Document Retrieval | DocHaystack 100 | Recall@1 | 16.51 | 7 |
| Document Retrieval | DocHaystack-1000 | Recall@1 | 1.83 | 7 |
| Document Retrieval | InfoHaystack 100 | Recall@1 | 34.84 | 7 |
| Document Retrieval | InfoHaystack 200 | Recall@1 | 30.97 | 7 |
| Document Retrieval | InfoHaystack-1000 | Recall@1 | 20.65 | 7 |
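
The retrieval results above are reported as Recall@1 or top-1 accuracy. This page does not describe the evaluation protocol, so the following is only a minimal sketch of how such a score is commonly computed. It assumes each query has exactly one correct document, that queries and documents are embedded in the same latent space, and that ranking is by cosine similarity. The helper names (`cosine`, `recall_at_1`) and the toy 2-D vectors are hypothetical stand-ins, not outputs of nomic-embed-vision or nomic-embed-text.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def recall_at_1(query_vecs, doc_vecs, gold):
    """Percentage of queries whose top-ranked document is the gold one.

    gold[i] is the index in doc_vecs of the single correct document
    for query_vecs[i] (an assumption of this sketch).
    """
    hits = 0
    for q, g in zip(query_vecs, gold):
        # Rank every document by similarity to the query and keep the best.
        best = max(range(len(doc_vecs)), key=lambda i: cosine(q, doc_vecs[i]))
        hits += int(best == g)
    return 100.0 * hits / len(query_vecs)


# Toy embeddings standing in for text queries and image documents
# that live in one shared space (values are made up).
docs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
queries = [[0.9, 0.1], [0.1, 0.9], [0.2, 1.0]]
gold = [0, 1, 2]

score = recall_at_1(queries, docs, gold)
print(f"Recall@1: {score:.2f}")
```

In this toy run the third query is closer to the second document than to its gold document, so two of three queries are retrieved correctly. With a single correct document per query, Recall@1 and top-1 accuracy coincide, which is why both names can describe the same kind of score.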