Nomic Embed Vision: Expanding the Latent Space
About
This technical report describes the training of nomic-embed-vision, a highly performant, open-code, open-weights image embedding model that shares the same latent space as nomic-embed-text. Together, nomic-embed-vision and nomic-embed-text form the first unified latent space to achieve high performance across vision, language, and multimodal tasks.
Zach Nussbaum, Brandon Duderstadt, Andriy Mulyar • 2024
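Because the two models share one latent space, a text embedding can be compared directly against image embeddings with cosine similarity. The sketch below illustrates this cross-modal retrieval step with random stand-in vectors (in practice the vectors would come from nomic-embed-text and nomic-embed-vision; the 768-dimensional size and all variable names here are illustrative assumptions, not part of the report):

```python
import numpy as np

# Stand-in embeddings: in practice these would be produced by
# nomic-embed-text (for the query) and nomic-embed-vision (for the images).
rng = np.random.default_rng(0)

def normalize(v):
    """L2-normalize so that a dot product equals cosine similarity."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

dim = 768  # assumed embedding width for illustration
text_query = normalize(rng.standard_normal(dim))             # stand-in text embedding
image_embeddings = normalize(rng.standard_normal((5, dim)))  # stand-in image embeddings

# Cross-modal retrieval: rank images by cosine similarity to the text query.
scores = image_embeddings @ text_query
best = int(np.argmax(scores))
print(f"best match: image {best} (cosine similarity {scores[best]:.4f})")
```

The key property is that no projection layer is needed between modalities: text and image vectors are directly comparable because they live in the same space.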
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Handwriting Retrieval | Handwriting In-Domain Set | Accuracy@1 | 59.1 | 30 |
| Handwriting Retrieval | Handwriting Spanish synthetic disjoint fonts (OOD) | Top-1 Accuracy | 38.82 | 30 |
| Document Retrieval | DocHaystack-200 | Recall@1 | 13.76 | 7 |
| Document Retrieval | DocHaystack-100 | Recall@1 | 16.51 | 7 |
| Document Retrieval | DocHaystack-1000 | Recall@1 | 1.83 | 7 |
| Document Retrieval | InfoHaystack-100 | Recall@1 | 34.84 | 7 |
| Document Retrieval | InfoHaystack-200 | Recall@1 | 30.97 | 7 |
| Document Retrieval | InfoHaystack-1000 | Recall@1 | 20.65 | 7 |