XR: Cross-Modal Agents for Composed Image Retrieval

About

Retrieval is being redefined by agentic AI, demanding multimodal reasoning beyond conventional similarity-based paradigms. Composed Image Retrieval (CIR) exemplifies this shift as each query combines a reference image with textual modifications, requiring compositional understanding across modalities. While embedding-based CIR methods have achieved progress, they remain narrow in perspective, capturing limited cross-modal cues and lacking semantic reasoning. To address these limitations, we introduce XR, a training-free multi-agent framework that reframes retrieval as a progressively coordinated reasoning process. It orchestrates three specialized types of agents: imagination agents synthesize target representations through cross-modal generation, similarity agents perform coarse filtering via hybrid matching, and question agents verify factual consistency through targeted reasoning for fine filtering. Through progressive multi-agent coordination, XR iteratively refines retrieval to meet both semantic and visual query constraints, achieving up to a 38% gain over strong training-free and training-based baselines on FashionIQ, CIRR, and CIRCO, while ablations show each agent is essential. Code is available at https://01yzzyu.github.io/xr.github.io/.

Zhongyu Yang, Wei Pang, Yingfang Yuan • 2026
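
The coarse-to-fine flow described in the abstract (imagine a target, filter the gallery by similarity, then verify the survivors with targeted questions) can be illustrated with a toy sketch. This is not the XR implementation: every name here (`Candidate`, `imagine_target`, `coarse_filter`, `fine_filter`, `retrieve`) is hypothetical, the gallery is assumed to come with text captions, string concatenation stands in for cross-modal generation, word overlap stands in for hybrid matching, and plain Python predicates stand in for question agents.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Candidate:
    image_id: str
    caption: str  # assumption: each gallery image has a text description


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of lowercase word sets (toy stand-in for a similarity agent)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def imagine_target(reference_caption: str, modification: str) -> str:
    """Toy 'imagination' step: concatenate the reference description and the edit."""
    return f"{reference_caption} {modification}"


def coarse_filter(target: str, gallery: Sequence[Candidate], top_k: int) -> List[Candidate]:
    """Keep the top_k gallery items most similar to the imagined target."""
    ranked = sorted(gallery, key=lambda c: token_overlap(target, c.caption), reverse=True)
    return ranked[:top_k]


def fine_filter(
    candidates: Sequence[Candidate],
    checks: Sequence[Callable[[Candidate], bool]],
) -> List[Candidate]:
    """Rerank by the number of verification checks passed.

    sorted() is stable, so ties keep their coarse-stage order.
    """
    return sorted(candidates, key=lambda c: sum(check(c) for check in checks), reverse=True)


def retrieve(
    reference_caption: str,
    modification: str,
    gallery: Sequence[Candidate],
    checks: Sequence[Callable[[Candidate], bool]],
    top_k: int = 3,
) -> List[Candidate]:
    """Run the three toy stages in order: imagine, coarse filter, fine filter."""
    target = imagine_target(reference_caption, modification)
    shortlist = coarse_filter(target, gallery, top_k)
    return fine_filter(shortlist, checks)
```

In this sketch the fine stage only reorders the shortlist produced by the coarse stage, so a candidate dropped during coarse filtering cannot be recovered later; whether XR behaves the same way is not stated in the abstract.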

Related benchmarks

Task	Dataset	Result
Composed Image Retrieval	CIRR (test)	Recall@1 43.13	786
Composed Image Retrieval	FashionIQ (val)	Average Recall@10 37.18	601
Composed Image Retrieval	CIRCO (test)	mAP@10 32.88	360
Composed Image Retrieval	Fashion-IQ	--	129
Composed Image Retrieval (Image-Text to Image)	CIRR	Recall@1 43.13	128
Composed Image Retrieval	CIRCO	mAP@5 31.38	96
Composed Image Retrieval	FashionIQ Shirt	Recall@10 38.91	64
Composed Image Retrieval	FashionIQ (Dress)	Recall@10 28.71	39
Composed Image Retrieval	FashionIQ Toptee	Recall@10 43.91	27
Compositional Image Retrieval	FashionIQ (val)	Recall@10 (Shirt) 38.91	23
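
For readers unfamiliar with the metrics in the table, the following is a minimal sketch of how Recall@K and average precision at K are commonly computed for a single query. The function names are illustrative, and the normalization of AP@K by min(K, number of ground truths) is an assumption about the convention used for multi-target benchmarks; the official evaluation scripts of each benchmark may differ in details. The reported figures would be the mean of such per-query values over all queries, expressed as percentages.

```python
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[str], targets: Set[str], k: int) -> float:
    """Return 1.0 if any ground-truth item appears in the top k results, else 0.0."""
    return float(any(item in targets for item in ranked[:k]))


def average_precision_at_k(ranked: Sequence[str], targets: Set[str], k: int) -> float:
    """Average precision over the top k results for one query.

    Precision is accumulated at each rank where a ground-truth item occurs,
    then divided by min(k, number of ground truths) (assumed convention).
    Assumes `ranked` contains no duplicate ids.
    """
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked[:k], start=1):
        if item in targets:
            hits += 1
            precision_sum += hits / rank
    denom = min(k, len(targets))
    return precision_sum / denom if denom else 0.0
```

Averaging `recall_at_k` over queries gives Recall@K, and averaging `average_precision_at_k` over queries gives mAP@K.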
