Nomic Embed Vision: Expanding the Latent Space
About
This technical report describes the training of nomic-embed-vision, a highly performant, open-code, open-weights image embedding model that shares the same latent space as nomic-embed-text. Together, nomic-embed-vision and nomic-embed-text form the first unified latent space to achieve high performance across vision, language, and multimodal tasks.
Zach Nussbaum, Brandon Duderstadt, Andriy Mulyar • 2024
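Because the two models share one latent space, a text embedding can be compared directly against image embeddings with cosine similarity. The sketch below illustrates this cross-modal retrieval step with random stand-in vectors (in practice the vectors would come from nomic-embed-text and nomic-embed-vision; the 768-dimensional size and all variable names here are illustrative assumptions, not part of the report):

```python
import numpy as np

# Stand-in embeddings: in practice these would be produced by
# nomic-embed-text (for the query) and nomic-embed-vision (for the images).
rng = np.random.default_rng(0)

def normalize(v):
    """L2-normalize so that a dot product equals cosine similarity."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

dim = 768  # assumed embedding width for illustration
text_query = normalize(rng.standard_normal(dim))             # stand-in text embedding
image_embeddings = normalize(rng.standard_normal((5, dim)))  # stand-in image embeddings

# Cross-modal retrieval: rank images by cosine similarity to the text query.
scores = image_embeddings @ text_query
best = int(np.argmax(scores))
print(f"best match: image {best} (cosine similarity {scores[best]:.4f})")
```

The key property is that no projection layer is needed between modalities: text and image vectors are directly comparable because they live in the same space.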
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Handwriting Retrieval | Handwriting In-Domain Set | Accuracy@1 | 59.1 | 30 |
| Handwriting Retrieval | Handwriting Spanish synthetic disjoint fonts (OOD) | Top-1 Accuracy | 38.82 | 30 |
| Document Retrieval | DocHaystack-200 | Recall@1 | 13.76 | 7 |
| Document Retrieval | DocHaystack-100 | Recall@1 | 16.51 | 7 |
| Document Retrieval | DocHaystack-1000 | Recall@1 | 1.83 | 7 |
| Document Retrieval | InfoHaystack-100 | Recall@1 | 34.84 | 7 |
| Document Retrieval | InfoHaystack-200 | Recall@1 | 30.97 | 7 |
| Document Retrieval | InfoHaystack-1000 | Recall@1 | 20.65 | 7 |