
Resolving Ambiguity in Composed Image Retrieval via Calibrated Interaction

About

Composed image retrieval (CIR) searches a corpus using a reference image and text describing how to modify it. Despite rapid progress from triplet-trained compositors to zero-shot and generative methods, essentially all systems share one assumption: that a query maps to a single target, scored by Recall@K against one annotation. We argue this is fundamentally at odds with the task. A query such as "make it more formal" does not name an image but a region of the corpus, and which member the user intends is genuinely underdetermined. This underspecification is the root of the well-known false-negative problem and leaves current models unable to tell a precise query from an ambiguous one. We reframe CIR as calibrated intent resolution under uncertainty: a retriever is wrapped in a conformal prediction layer that returns a candidate set with a coverage guarantee, and the size of that set is a principled measure of ambiguity. When the set is large, an expected-information-gain policy asks the single most useful clarifying question, drawn from interpretable ambiguity axes, and the set contracts. We introduce AmbiCIR, a benchmark and human-validated user simulator that revive the dormant auxiliary and dialogue annotations of CIRR and extend the multiple-positive setting of CIRCO. Across open-domain and fashion benchmarks our method matches single-turn state of the art, confirming that calibrated resolution is cost-free on precise queries, while reaching the intended target in a fraction of the interaction budget required by naive conversational baselines. It is also the first to report valid coverage and calibration for the task.
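The two mechanisms named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes standard split conformal prediction with nonconformity score 1 - similarity, and a clarifying-question policy with noise-free answers, in which case the expected information gain of a question equals the entropy of its answer distribution. The function names (conformal_threshold, candidate_set, best_question) and the input layout are hypothetical.

```python
import math
import numpy as np

def conformal_threshold(cal_scores, alpha):
    """Split-conformal threshold from calibration nonconformity scores.

    cal_scores: nonconformity (assumed here: 1 - similarity) of the true
    target for each calibration query. Returns the k-th smallest score with
    k = ceil((n + 1) * (1 - alpha)), or +inf if k exceeds n.
    """
    scores = np.sort(np.asarray(cal_scores, dtype=float))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    return float(scores[k - 1]) if k <= n else float("inf")

def candidate_set(sims, qhat):
    """Indices of gallery images whose nonconformity 1 - sim is <= qhat."""
    return np.flatnonzero(1.0 - np.asarray(sims, dtype=float) <= qhat)

def entropy_bits(p):
    """Shannon entropy in bits, ignoring zero-probability entries."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def best_question(probs, answers):
    """Index of the question with the largest expected information gain.

    probs:   (n,) posterior over the candidates in the current set.
    answers: (q, n) non-negative integer codes; answers[j][i] is the answer
             candidate i would give to question j (assumed noise-free).
    """
    probs = np.asarray(probs, dtype=float)
    gains = []
    for row in np.asarray(answers):
        # Probability mass of each possible answer; with deterministic
        # answers, EIG = H(answer) because H(answer | target) = 0.
        mass = np.bincount(row, weights=probs)
        gains.append(entropy_bits(mass))
    return int(np.argmax(gains))
```

In this sketch a large candidate set signals an ambiguous query, and best_question would be called on the surviving candidates; after the user answers, candidates inconsistent with the answer are dropped and the posterior is renormalized.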

Amsisan Tran, Baogh Le, Tuan Kiet Pham, Sui Yang Guang • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Composed Image Retrieval | CIRR (test) | Recall@1: 30.1 | 786 |
| Composed Image Retrieval | CIRCO (test) | mAP@10: 24.8 | 360 |
| Composed Image Retrieval | Fashion-IQ (test) | Average Recall@10: 0.385 | 176 |
| Composed Image Retrieval (Image-Text to Image) | CIRR | Recall@1: 91.3 | 128 |
| Composed Image Retrieval | CIRCO | mAP@5: 36 | 96 |
| Composed Image Retrieval (Image-Text to Image) | FashionIQ | Recall@10: 54.6 | 39 |
| Composed Image Retrieval | CIRR Subset (test) | R@1: 58.4 | 33 |