
OSCAR: Optimization-Steered Agentic Planning for Composed Image Retrieval

About

Composed image retrieval (CIR) requires complex reasoning over heterogeneous visual and textual constraints. Existing approaches largely fall into two paradigms: unified embedding retrieval, which suffers from single-model myopia, and heuristic agentic retrieval, which is limited by suboptimal, trial-and-error orchestration. To address these limitations, we propose OSCAR, an optimization-steered agentic planning framework for composed image retrieval. We are the first to reformulate agentic CIR from a heuristic search process into a principled trajectory optimization problem. Instead of relying on trial-and-error exploration, OSCAR employs a novel offline-online paradigm. In the offline phase, we model CIR, through atomic retrieval selection and composition, as a two-stage mixed-integer programming problem, and derive optimal trajectories that maximize ground-truth coverage on training samples via boolean set operations. These trajectories are stored in a golden library and serve as in-context demonstrations that steer the VLM planner at inference time. Extensive experiments on three public benchmarks and a private industrial benchmark show that OSCAR consistently outperforms state-of-the-art baselines. Notably, it achieves superior performance using only 10% of the training data, demonstrating strong generalization of planning logic rather than dataset-specific memorization.
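The offline idea described above, selecting atomic retrievals and composing them with boolean set operations to cover the ground truth, can be illustrated with a toy sketch. This is not the paper's mixed-integer program: it swaps in a brute-force search over single retrievals and pairwise compositions, and the retrieval names, image IDs, and the tie-break on smaller result sets are all assumptions made for illustration.

```python
from itertools import permutations

# Hypothetical atomic retrievals: each maps to the set of image IDs it returns.
atomic_results = {
    "text:red dress": {1, 2, 3, 4},
    "image:similar to reference": {3, 4, 5, 6},
    "text:long sleeves": {4, 6, 7},
}
ground_truth = {3, 4}  # hypothetical target images for one training sample

# Boolean set operations used to compose two atomic retrievals.
OPS = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "NOT": lambda a, b: a - b,
}


def score(candidates, targets):
    """Rank by ground-truth coverage, then prefer smaller candidate sets (assumed tie-break)."""
    coverage = len(candidates & targets) / len(targets)
    return (coverage, -len(candidates))


def best_composition(results, targets):
    """Brute-force search over single retrievals and pairwise compositions."""
    best_plan, best_set, best_score = None, set(), (-1.0, 0)
    # Single atomic retrievals.
    for name, items in results.items():
        s = score(items, targets)
        if s > best_score:
            best_plan, best_set, best_score = (name,), items, s
    # Ordered pairs, so that the asymmetric NOT is tried in both directions.
    for left, right in permutations(results, 2):
        for op_name, op in OPS.items():
            items = op(results[left], results[right])
            s = score(items, targets)
            if s > best_score:
                best_plan, best_set, best_score = (op_name, left, right), items, s
    return best_plan, best_set


plan, result = best_composition(atomic_results, ground_truth)
print(plan, sorted(result))
```

In this toy setting, a plan found this way would be the kind of record stored as a demonstration for the planner. An exact solver would replace the exhaustive loop once the number of atomic retrievals grows.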

Teng Wang, Rong Shan, Jianghao Lin, Junjie Wu, Tianyi Xu, Jianping Zhang, Wenteng Chen, Changwang Zhang, Zhaoxiang Wang, Weinan Zhang, Jun Wang • 2026

Related benchmarks

Task                                            Dataset                                     Result           Rank
Composed Image Retrieval (Image-Text to Image)  CIRR                                        Recall@1 51.18   75
Composed Image Retrieval                        CIRCO                                       mAP@5 56.54      63
Composed Image Retrieval                        Fashion-IQ                                  --               40
Composed Image Retrieval                        FashionIQ (Dress)                           Recall@10 38.47  20
Composed Image Retrieval                        FashionIQ Toptee                            Recall@10 48.24  20
Composed Image Retrieval                        FashionIQ Shirt                             Recall@10 44.5   20
Text-to-Image Retrieval                         Gallery 1                                   NDCG@10 56.01    10
Text-to-Image Retrieval                         Gallery2                                    NDCG@10 53.12    10
Text-to-Image Retrieval                         Gallery3                                    NDCG@10 58.7     10
Text-to-Image Retrieval                         Private Industrial Photo Galleries Average  NDCG@10 55.94    10
