GENIUS: A Generative Framework for Universal Multimodal Search

About

Generative retrieval is an emerging approach in information retrieval that generates identifiers (IDs) of target data based on a query, providing an efficient alternative to traditional embedding-based retrieval methods. However, existing models are task-specific and fall short of embedding-based retrieval in performance. This paper proposes GENIUS, a universal generative retrieval framework supporting diverse tasks across multiple modalities and domains. At its core, GENIUS introduces modality-decoupled semantic quantization, transforming multimodal data into discrete IDs encoding both modality and semantics. Moreover, to enhance generalization, we propose a query augmentation technique that interpolates between a query and its target, allowing GENIUS to adapt to varied query forms. Evaluated on the M-BEIR benchmark, it surpasses prior generative methods by a clear margin. Unlike embedding-based retrieval, GENIUS maintains consistently high retrieval speed regardless of database size, with competitive performance across multiple benchmarks. With additional re-ranking, GENIUS often achieves results close to those of embedding-based methods while preserving efficiency.
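The abstract names two mechanisms without spelling them out: discrete IDs that carry modality and semantics separately, and a query augmentation that interpolates a query toward its target. The following is a minimal sketch under stated assumptions, not the GENIUS implementation. It assumes residual quantization with random (untrained) codebooks for the semantic codes, a hypothetical `MODALITY_CODE` mapping for the leading modality token, and a mixing weight drawn uniformly from [0, 1]. The helpers `residual_quantize`, `make_id`, and `mix_query` are hypothetical names chosen for illustration.

```python
import numpy as np

# Hypothetical mapping (assumption): the first ID token records only the modality,
# so items with similar semantics in different modalities can share the later codes.
MODALITY_CODE = {"image": 0, "text": 1, "image-text": 2}


def residual_quantize(x: np.ndarray, codebooks: list[np.ndarray]) -> list[int]:
    """Assign one code per level by quantizing what the previous levels left over."""
    codes = []
    residual = x.copy()
    for codebook in codebooks:  # each codebook has shape (num_codes, dim)
        dists = np.linalg.norm(codebook - residual, axis=1)
        idx = int(np.argmin(dists))  # nearest codeword at this level
        codes.append(idx)
        residual = residual - codebook[idx]  # next level encodes the remainder
    return codes


def make_id(embedding: np.ndarray, modality: str, codebooks: list[np.ndarray]) -> list[int]:
    """Discrete ID = [modality token] + semantic codes (modality kept separate)."""
    return [MODALITY_CODE[modality]] + residual_quantize(embedding, codebooks)


def mix_query(query_emb: np.ndarray, target_emb: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Interpolate a query embedding toward its target; uniform weight is an assumption."""
    lam = rng.uniform(0.0, 1.0)
    return lam * query_emb + (1.0 - lam) * target_emb


rng = np.random.default_rng(0)
dim, levels, num_codes = 8, 3, 16
# Random codebooks stand in for learned ones (assumption, for illustration only).
codebooks = [rng.normal(size=(num_codes, dim)) for _ in range(levels)]

target = rng.normal(size=dim)
query = rng.normal(size=dim)

target_id = make_id(target, "image", codebooks)
augmented = mix_query(query, target, rng)
print(target_id)
```

One plausible way to use `mix_query` is to pair each mixed query with the target's ID as a decoding label, so a generator sees inputs ranging from the original query to the target itself; how GENIUS actually schedules or weights this is not stated here.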

Sungyeon Kim, Xinliang Zhu, Xiaofan Lin, Muhammet Bastan, Douglas Gray, Suha Kwak (2025)

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Text-to-Image Retrieval | Flickr30k (test) | Recall@1 | 74.1 | 423 |
| Image-to-Text Retrieval | MSCOCO | -- | -- | 124 |
| Text-to-Image Retrieval | MSCOCO | -- | -- | 118 |
| Text-to-Image Retrieval | MS-COCO | Recall@5 | 78 | 79 |
| Composed Image Retrieval (Image-Text to Image) | CIRR | -- | -- | 75 |
| Text-to-Image Retrieval | MS-COCO (test) | Recall@1 | 46.1 | 66 |
| Image-to-Text Retrieval | MS-COCO | Recall@5 | 91.1 | 65 |
| Image-Text-to-Text Retrieval | InfoSeek | Recall@5 | 20.7 | 20 |
| Multi-modal Retrieval (Text to Text/Image-Text) | WebQA | Recall@5 | 60.6 | 19 |
| Composed Image Retrieval (Image-Text to Image) | FashionIQ | Recall@10 | 19.2 | 19 |

Only 10 of the 21 benchmark rows are shown.
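Most results in the benchmark table are Recall@K scores. As a minimal sketch, assuming the common hit-rate definition (a query counts as a hit if at least one relevant item appears among its top K retrieved items, reported as a percentage), the metric can be computed as follows; individual benchmarks may define it slightly differently, and `recall_at_k` is a hypothetical helper name.

```python
def recall_at_k(rankings: list[list[int]], relevant: list[list[int]], k: int) -> float:
    """Percentage of queries whose top-k ranked items contain at least one relevant item."""
    hits = 0
    for ranking, rel in zip(rankings, relevant):
        if set(ranking[:k]) & set(rel):  # any overlap within the top k counts as a hit
            hits += 1
    return 100.0 * hits / len(rankings)


# Two toy queries: the first finds its target at rank 2, the second at rank 3.
rankings = [[3, 1, 2], [5, 4, 6]]
relevant = [[1], [6]]
print(recall_at_k(rankings, relevant, 2))  # → 50.0
```

Because a single hit anywhere in the top K is enough, Recall@K can only stay the same or grow as K increases.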
