Resolving Ambiguity in Composed Image Retrieval via Calibrated Interaction

About

Composed image retrieval (CIR) searches a corpus with a reference image and a text instruction describing how to modify it. Despite rapid progress from triplet-trained compositors to zero-shot and generative methods, essentially all systems share one assumption: that a query maps to a single target, scored by Recall@K against one annotation. We argue this is fundamentally at odds with the task. A query such as "make it more formal" does not name an image but a region of the corpus, and which member of that region the user intends is genuinely underdetermined. This underspecification is the root of the well-known false-negative problem, and it leaves current models unable to tell a precise query from an ambiguous one. We reframe CIR as calibrated intent resolution under uncertainty: a retriever is wrapped in a conformal prediction layer that returns a candidate set with a coverage guarantee, and the size of that set is a principled measure of ambiguity. When the set is large, an expected-information-gain policy asks the single most useful clarifying question, drawn from interpretable ambiguity axes, and the set contracts. We introduce AmbiCIR, a benchmark and human-validated user simulator that revive the dormant auxiliary and dialogue annotations of CIRR and extend the multiple-positive setting of CIRCO. Across open-domain and fashion benchmarks, our method matches the single-turn state of the art, confirming that calibrated resolution is cost-free on precise queries, while reaching the intended target in a fraction of the interaction budget required by naive conversational baselines. It is also the first to report valid coverage and calibration for the task.
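
To make the two mechanisms in the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes split conformal prediction with the nonconformity score 1 - similarity, and a noise-free user whose answer to a question is fully determined by the intended target. All function names, the score function, and the toy questions are hypothetical and chosen only for illustration.

```python
import math


def conformal_threshold(cal_scores, alpha):
    """Split-conformal quantile of calibration nonconformity scores.

    cal_scores: score of the true target for each held-out calibration query
    (assumed here to be 1 - similarity). alpha: tolerated miss rate.
    """
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))  # finite-sample corrected rank
    if k > n:
        return math.inf  # too few calibration points: keep every candidate
    return sorted(cal_scores)[k - 1]


def prediction_set(similarities, threshold):
    """Indices of corpus images whose nonconformity is within the threshold."""
    return [i for i, s in enumerate(similarities) if 1.0 - s <= threshold]


def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def expected_information_gain(weights, answers):
    """Expected entropy reduction over candidates from asking one question.

    weights: candidate -> unnormalised belief that it is the target.
    answers: candidate -> answer the user would give if it were the target.
    """
    total = sum(weights.values())
    prior = entropy([w / total for w in weights.values()])
    groups = {}
    for cand, w in weights.items():
        groups.setdefault(answers[cand], []).append(w)
    posterior = 0.0
    for ws in groups.values():
        mass = sum(ws)
        posterior += (mass / total) * entropy([w / mass for w in ws])
    return prior - posterior


def best_question(weights, questions):
    """Pick the question (ambiguity axis) with the largest expected gain."""
    return max(questions, key=lambda q: expected_information_gain(weights, questions[q]))
```

With seven calibration scores 0.1 through 0.7 and alpha = 0.25, the threshold is the sixth smallest score, 0.6, so any corpus image with similarity of at least 0.4 enters the candidate set. For four equally likely candidates, a question that splits them two against two yields one bit of expected gain and is preferred over a question that splits them three against one.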

Amsisan Tran, Baogh Le, Tuan Kiet Pham, Sui Yang Guang • 2026

Related benchmarks

Task	Dataset	Result
Composed Image Retrieval	CIRR (test)	Recall@1 30.1	887
Composed Image Retrieval	CIRCO (test)	mAP@10 24.8	432
Composed Image Retrieval	Fashion-IQ (test)	Average Recall@10 0.385	200
Composed Image Retrieval (Image-Text to Image)	CIRR	Recall@1 91.3	168
Composed Image Retrieval	CIRCO	mAP@5 36	122
Composed Image Retrieval	CIRR Subset (test)	R@1 58.4	40
Composed Image Retrieval (Image-Text to Image)	FashionIQ	Recall@10 54.6	39

