UNION: A Lightweight Target Representation for Efficient Zero-Shot Image-Guided Retrieval with Optional Textual Queries

About

Image-Guided Retrieval with Optional Text (IGROT) is a general retrieval setting in which a query consists of an anchor image, with or without accompanying text, and the goal is to retrieve semantically relevant target images. This formulation unifies two major tasks: Composed Image Retrieval (CIR) and Sketch-Based Image Retrieval (SBIR). In this work, we address IGROT under low-data supervision by introducing UNION, a lightweight and generalisable target representation that fuses the image embedding with a null-text prompt. Unlike traditional approaches that rely on fixed target features, UNION enhances semantic alignment with multimodal queries while requiring no architectural modifications to pretrained vision-language models. With only 5,000 training samples (from LlavaSCo for CIR and Training-Sketchy for SBIR), our method achieves competitive results across benchmarks, including a CIRCO mAP@50 of 38.5 and a Sketchy mAP@200 of 82.7, surpassing many heavily supervised baselines. This demonstrates the robustness and efficiency of UNION in bridging vision and language across diverse query types.

Hoang-Bao Le, Allie Tran, Binh T. Nguyen, Liting Zhou, Cathal Gurrin • 2025
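
The abstract describes the target representation only at a high level: an image embedding fused with a null-text prompt. The sketch below illustrates that idea under loud assumptions. The abstract does not state the fusion operator, so a weighted sum followed by L2 normalisation is used purely as a stand-in, and the names `fuse_target`, `rank_targets`, and the weight `alpha` are hypothetical, not taken from the paper. Embeddings are plain Python lists standing in for the outputs of a pretrained vision-language model.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def fuse_target(image_emb, null_text_emb, alpha=0.5):
    """Hypothetical fusion: weighted sum of the normalised image embedding
    and the normalised null-text embedding, then renormalised.
    ASSUMPTION: the actual UNION fusion operator is not given in the abstract."""
    img = l2_normalize(image_emb)
    txt = l2_normalize(null_text_emb)
    fused = [alpha * a + (1 - alpha) * b for a, b in zip(img, txt)]
    return l2_normalize(fused)

def rank_targets(query_emb, target_embs):
    """Return gallery indices sorted by descending cosine similarity to the query."""
    q = l2_normalize(query_emb)
    scores = [sum(a * b for a, b in zip(q, l2_normalize(t))) for t in target_embs]
    return sorted(range(len(target_embs)), key=lambda i: -scores[i])

# Toy usage with made-up 3-dimensional embeddings.
null_text = [0.0, 0.0, 1.0]
gallery = [fuse_target(e, null_text)
           for e in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [-1.0, 0.0, 0.0])]
ranking = rank_targets(gallery[1], gallery)
```

In this toy setup every gallery item shares the same null-text component, so the ranking is still driven by the image part; how the real method weights or combines the two signals is something only the paper itself can confirm.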

Related benchmarks

Task                                            Dataset     Result                  Rank
Composed Image Retrieval (Image-Text to Image)  CIRR        --                      75
Composed Image Retrieval                        CIRCO       --                      63
Composed Image Retrieval                        Fashion-IQ  Average Recall@10 34.4  40
Sketch-based image retrieval                    Sketchy     mAP@200 82.7            15
Sketch-based image retrieval                    QuickDraw   mAP 33.4                15
Sketch-based image retrieval                    TU-Berlin   mAP 51                  15
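
Several of the results above are reported as mAP or mAP@K. As a rough illustration of how such a score can be computed, here is a minimal sketch that assumes each query has a ranked list of unique gallery IDs and a set of relevant IDs. The normalisation by min(K, number of relevant items) is one common convention and is an assumption here; individual benchmarks may normalise differently, so this is not guaranteed to reproduce the numbers in the table. The function names are illustrative.

```python
def average_precision_at_k(ranked_ids, relevant_ids, k):
    """AP@K for one query: sum of precision at each relevant hit in the top K,
    divided by min(K, number of relevant items) (assumed normalisation)."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked_ids[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / min(k, len(relevant))

def mean_average_precision_at_k(rankings, ground_truths, k):
    """Mean of AP@K over all queries."""
    aps = [average_precision_at_k(r, g, k) for r, g in zip(rankings, ground_truths)]
    return sum(aps) / len(aps) if aps else 0.0

# Toy usage: two queries with made-up gallery IDs.
rankings = [["a", "b", "c", "d"], ["x", "y"]]
ground_truths = [{"a", "c"}, {"y"}]
score = mean_average_precision_at_k(rankings, ground_truths, k=3)
```

For the first toy query the relevant items appear at ranks 1 and 3, giving (1/1 + 2/3) / 2; for the second the single relevant item is at rank 2, giving 1/2.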