SLQ: Bridging Modalities via Shared Latent Queries for Retrieval with Frozen MLLMs

About

Multimodal Large Language Models (MLLMs) possess intrinsic reasoning and world-knowledge capabilities, yet adapting them for dense retrieval remains challenging. Existing approaches rely on invasive parameter updates, such as full fine-tuning and LoRA, which may disrupt the pre-trained semantic space and impair the structured knowledge essential for reasoning. To address this, we propose SLQ, a parameter-efficient tuning framework that adapts MLLMs for retrieval while keeping the backbone entirely frozen. SLQ introduces a small set of Shared Latent Queries that are appended to both text and image tokens, leveraging the model's native causal attention to aggregate multimodal context into a unified embedding space. Furthermore, to better evaluate retrieval beyond superficial pattern matching, we construct KARR-Bench, a benchmark designed for knowledge-aware reasoning retrieval. Extensive experiments show that SLQ outperforms full fine-tuning and LoRA on COCO and Flickr30K, while achieving competitive performance on MMEB and yielding substantial gains on KARR-Bench, validating that preserving the pre-trained representations via non-invasive adaptation is an effective strategy for MLLM-based retrieval. The code is available at https://github.com/CnFaker/SLQ.

Haoran Lou, Ziyan Liu, Chunxiao Fan, Yuexin Wu, Yue Ming, Hao Wu, Kai Zuo, Yibo Chen, Xu Tang • 2026
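
The mechanism described in the abstract (a small set of shared latent queries appended to the token sequence, with causal attention letting those trailing positions aggregate everything before them) can be illustrated with a toy sketch. This is not the authors' implementation: the single NumPy attention layer stands in for a frozen MLLM backbone, and the sizes, random weights, mean pooling, and the names `latent_queries`, `causal_self_attention`, and `embed` are all assumptions made for illustration. Training of the queries is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16          # hidden size (toy value, assumption)
n_queries = 4   # number of shared latent queries (toy value, assumption)

# Stand-ins for frozen backbone weights: they are never updated in this sketch.
W_q, W_k, W_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))

# The only parameters that would be trained; the same array is used for both modalities.
latent_queries = rng.normal(size=(n_queries, d)) * 0.02


def causal_self_attention(x):
    """Single-head causal attention: position i attends only to positions <= i."""
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    scores = q @ k.T / np.sqrt(d)
    future = np.triu(np.ones_like(scores, dtype=bool), k=1)  # True above the diagonal
    scores = np.where(future, -np.inf, scores)
    scores = scores - scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


def embed(tokens):
    """Append the shared queries, run the frozen layer, pool the query positions."""
    x = np.concatenate([tokens, latent_queries], axis=0)
    out = causal_self_attention(x)
    pooled = out[-n_queries:].mean(axis=0)  # mean pooling is an assumption
    return pooled / np.linalg.norm(pooled)


text_tokens = rng.normal(size=(7, d))    # stand-in for text token states
image_tokens = rng.normal(size=(20, d))  # stand-in for image token states

text_emb = embed(text_tokens)
image_emb = embed(image_tokens)
similarity = float(text_emb @ image_emb)  # cosine similarity of unit vectors
```

Because the queries sit at the end of the sequence, the causal mask lets them see every text or image token while leaving the outputs at the original token positions unchanged, which is one way to read the "non-invasive" claim in the abstract.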

Related benchmarks

Task                                            Dataset              Result                  Rank
Image-to-Text Retrieval                         Flickr30K 1K (test)  R@1 92                  491
Text-to-Image Retrieval                         Flickr30K 1K (test)  R@1 81.8                432
Composed Image Retrieval                        Fashion-IQ           Average Recall@50 43.1  129
Composed Image Retrieval (Image-Text to Image)  CIRR                 --                      128
Image-to-Text Retrieval                         COCO 5K (test)       R@1 69.6                57
Text-to-Image Retrieval                         COCO 5K (test)       R@1 55.4                53
Image Retrieval                                 Flickr30k (1K)       R@1 80.9                21
Multimodal Retrieval                            MMEB v1 (test)       Classification 60.9     18
Image Retrieval                                 COCO I2I             R@1 55.4                7
Text-rendered-as-image Retrieval                Flickr30K I2I        R@1 90.2                7
Showing 10 of 13 rows
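
Most rows above report R@1 (Recall@1). As a rough illustration of how such a number is commonly computed, the sketch below counts a query as a hit when at least one relevant gallery item appears among its top K scores. The function name `recall_at_k`, the toy scores, and the relevance sets are made up for this example; the exact evaluation protocol behind each row of the table is not stated on this page, so treat the definition as an assumption. The function returns a fraction, while leaderboards usually show it as a percentage.

```python
def recall_at_k(similarity, relevant, k=1):
    """Fraction of queries with at least one relevant item in the top k.

    similarity: list of rows, one per query, holding one score per gallery item.
    relevant:   list of sets; relevant[i] holds the gallery indices for query i.
    """
    hits = 0
    for scores, positives in zip(similarity, relevant):
        # Gallery indices sorted from highest to lowest score.
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        if positives & set(ranked[:k]):
            hits += 1
    return hits / len(similarity)


# Toy scores: 3 queries against 4 gallery items (made-up numbers).
sim = [
    [0.9, 0.1, 0.3, 0.2],  # top-1 is item 0, which is relevant
    [0.2, 0.4, 0.8, 0.1],  # top-1 is item 2; the relevant item 1 is ranked second
    [0.1, 0.2, 0.3, 0.7],  # top-1 is item 3, which is relevant
]
rel = [{0}, {1}, {3}]

r1 = recall_at_k(sim, rel, k=1)  # 2 of 3 queries hit
r2 = recall_at_k(sim, rel, k=2)  # all 3 queries hit
```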
