OSCAR: Optimization-Steered Agentic Planning for Composed Image Retrieval
About
Composed image retrieval (CIR) requires complex reasoning over heterogeneous visual and textual constraints. Existing approaches largely fall into two paradigms: unified embedding retrieval, which suffers from single-model myopia, and heuristic agentic retrieval, which is limited by suboptimal, trial-and-error orchestration. To address both limitations, we propose OSCAR, an optimization-steered agentic planning framework for composed image retrieval. We are the first to reformulate agentic CIR from a heuristic search process into a principled trajectory optimization problem. Instead of relying on trial-and-error exploration, OSCAR employs a novel offline-online paradigm. In the offline phase, we model CIR, via atomic retrieval selection and composition, as a two-stage mixed-integer programming problem and derive optimal trajectories that maximize ground-truth coverage for training samples through Boolean set operations. These trajectories are then stored in a golden library, where they serve as in-context demonstrations that steer the VLM planner at inference time. Extensive experiments on three public benchmarks and a private industrial benchmark show that OSCAR consistently outperforms state-of-the-art baselines. Notably, it achieves superior performance using only 10% of the training data, demonstrating strong generalization of planning logic rather than dataset-specific memorization.
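To make the offline phase concrete, here is a minimal sketch of the idea of selecting atomic retrievals and composing them with Boolean set operations so that the result matches the ground truth as closely as possible. It is not the paper's mixed-integer program: it assumes a brute-force search over small plans as a stand-in, uses F1 overlap as a hypothetical coverage objective, and all names (`best_trajectory`, `atomic_results`, the retriever labels, the toy image IDs) are illustrative assumptions.

```python
from itertools import permutations, product

# Boolean set operations used to compose atomic retrieval results (assumed set).
OPS = {
    "AND": lambda a, b: a & b,   # intersection
    "OR": lambda a, b: a | b,    # union
    "NOT": lambda a, b: a - b,   # difference (exclude b from a)
}

def f1(pred, truth):
    """F1 overlap between a candidate set and the ground-truth set."""
    hit = len(pred & truth)
    if hit == 0:
        return 0.0
    precision = hit / len(pred)
    recall = hit / len(truth)
    return 2 * precision * recall / (precision + recall)

def best_trajectory(atomic_results, truth, max_tools=3):
    """Exhaustively search selection (which retrievals) and composition (which ops).

    Returns (score, plan), where plan is a flat list such as
    [retriever, op, retriever, op, retriever], evaluated left to right.
    """
    best_score, best_plan = 0.0, None
    names = sorted(atomic_results)
    for k in range(1, max_tools + 1):
        # Order matters because set difference is not commutative.
        for chosen in permutations(names, k):
            for ops in product(OPS, repeat=k - 1):
                acc = atomic_results[chosen[0]]
                plan = [chosen[0]]
                for op, name in zip(ops, chosen[1:]):
                    acc = OPS[op](acc, atomic_results[name])
                    plan += [op, name]
                score = f1(acc, truth)
                if score > best_score:
                    best_score, best_plan = score, plan
    return best_score, best_plan

# Toy example: image IDs returned by three hypothetical atomic retrievers.
atomic_results = {
    "image_sim": {1, 2, 3, 4},
    "text_match": {3, 4, 5, 6},
    "negation": {4, 7},
}
truth = {3}

score, plan = best_trajectory(atomic_results, truth)
print(score, plan)
```

In this toy setting, no single retriever or pair recovers the target exactly, but a three-step composition does. A plan found this way could then be stored alongside its query as a demonstration for the planner, which is the role the golden library plays in the description above.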
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Composed Image Retrieval (Image-Text to Image) | CIRR | Recall@1 | 51.18 | 75 |
| Composed Image Retrieval | CIRCO | mAP@5 | 56.54 | 63 |
| Composed Image Retrieval | Fashion-IQ | -- | -- | 40 |
| Composed Image Retrieval | FashionIQ (Dress) | Recall@10 | 38.47 | 20 |
| Composed Image Retrieval | FashionIQ Toptee | Recall@10 | 48.24 | 20 |
| Composed Image Retrieval | FashionIQ Shirt | Recall@10 | 44.5 | 20 |
| Text-to-Image Retrieval | Gallery 1 | NDCG@10 | 56.01 | 10 |
| Text-to-Image Retrieval | Gallery2 | NDCG@10 | 53.12 | 10 |
| Text-to-Image Retrieval | Gallery3 | NDCG@10 | 58.7 | 10 |
| Text-to-Image Retrieval | Private Industrial Photo Galleries Average | NDCG@10 | 55.94 | 10 |