
Target-Guided Composed Image Retrieval

About

Composed image retrieval (CIR) is a new and flexible image retrieval paradigm that retrieves a target image for a multimodal query consisting of a reference image and a modification text. Although existing efforts have achieved compelling success, they overlook two aspects: (1) modeling the conflict relationship between the reference image and the modification text, which would improve multimodal query composition, and (2) modeling the adaptive matching degree, which would improve the ranking of candidate images that may match the given query to different degrees. To address these two limitations, we propose a Target-Guided Composed Image Retrieval network (TG-CIR). First, TG-CIR uses the contrastive language-image pre-training model (CLIP) as the backbone to extract unified global and local attribute features for the reference/target image and the modification text, and introduces an orthogonal regularization to promote independence among the attribute features. Second, TG-CIR designs a target-query relationship-guided multimodal query composition module comprising a target-free student composition branch and a target-based teacher composition branch; the target-query relationship is injected into the teacher branch to guide the conflict relationship modeling of the student branch. Last, in addition to the conventional batch-based classification loss, TG-CIR introduces a batch-based target similarity-guided matching degree regularization to promote the metric learning process. Extensive experiments on three benchmark datasets demonstrate the superiority of the proposed method.
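The three training objectives named in the abstract can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation: the function names, the temperature `tau`, and the exact forms of the orthogonal regularization (squared off-diagonal cosine similarities) and of the matching degree regularization (a KL divergence toward soft labels derived from target-target similarities) are assumptions made for illustration. Only the batch-based classification loss follows the widely used cross-entropy-over-the-batch formulation.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (guarding against a zero norm)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1e-12
    return [x / norm for x in v]


def cosine_matrix(rows_a, rows_b):
    """Pairwise cosine similarities between the rows of two feature lists."""
    a = [l2_normalize(r) for r in rows_a]
    b = [l2_normalize(r) for r in rows_b]
    return [[sum(x * y for x, y in zip(u, v)) for v in b] for u in a]


def log_softmax(row):
    """Numerically stable log-softmax of one row of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def batch_classification_loss(queries, targets, tau=0.1):
    """Cross-entropy over the B x B query-target similarity matrix:
    the i-th composed query should select the i-th target in the batch."""
    sims = cosine_matrix(queries, targets)
    losses = [-log_softmax([s / tau for s in row])[i] for i, row in enumerate(sims)]
    return sum(losses) / len(losses)


def matching_degree_regularization(queries, targets, tau=0.1):
    """ASSUMED form (not confirmed by the source): soft labels come from
    target-target similarities, and the query-target distribution is
    pulled toward them via KL(p || q), averaged over the batch."""
    q_t = cosine_matrix(queries, targets)
    t_t = cosine_matrix(targets, targets)
    total = 0.0
    for q_row, t_row in zip(q_t, t_t):
        log_p = log_softmax([s / tau for s in t_row])  # soft labels
        log_q = log_softmax([s / tau for s in q_row])  # model predictions
        total += sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return total / len(q_t)


def orthogonal_regularization(attributes):
    """ASSUMED form: sum of squared off-diagonal cosine similarities among
    one sample's attribute features (0 when they are mutually orthogonal)."""
    g = cosine_matrix(attributes, attributes)
    k = len(g)
    return sum(g[i][j] ** 2 for i in range(k) for j in range(k) if i != j)


if __name__ == "__main__":
    queries = [[1.0, 0.2], [0.1, 1.0]]  # toy composed-query features
    targets = [[1.0, 0.0], [0.0, 1.0]]  # toy target-image features
    print(batch_classification_loss(queries, targets))
    print(matching_degree_regularization(queries, targets))
    print(orthogonal_regularization([[1.0, 0.0], [0.0, 2.0]]))  # 0.0
```

In this sketch, perfectly aligned query and target batches drive the classification loss toward zero, and the matching degree term vanishes when the query-target similarity pattern reproduces the target-target one. In a real model these terms would be computed on CLIP-derived feature tensors and summed with tuned weights, which the source does not specify.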

Haokun Wen, Xian Zhang, Xuemeng Song, Yinwei Wei, Liqiang Nie • 2023

Related benchmarks

Task | Dataset | Metric | Result | Rank
Composed Image Retrieval | CIRR (test) | Recall@1 | 45.25 | 580
Composed Image Retrieval | FashionIQ (val) | Average Recall@10 | 51.32 | 489
Composed Image Retrieval | Fashion-IQ (test) | Average Recall@10 | 0.5132 | 169
Composed Image Retrieval (Image-Text to Image) | CIRR | Recall@5 | 78.29 | 93
Composed Image Retrieval | Fashion-IQ | Average Recall@50 | 73.09 | 80
Composed Image Retrieval | Shoes | Recall@10 | 63.2 | 27
Composed Image Retrieval | FashionIQ Shirts (test) | Recall@10 | 52.6 | 12
Composed Image Retrieval | FashionIQ Dresses (test) | Recall@10 | 45.22 | 12
Composed Image Retrieval | FashionIQ Tops&Tees (test) | Recall@10 | 56.14 | 12
Composed Image Retrieval | Shoes (val) | Recall@10 | 63.2 | 6
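The Recall@K figures in this table count how often the ground-truth target image appears among the top K retrieved candidates. The short Python sketch below computes the metric on hypothetical toy rankings and checks that the FashionIQ "Average Recall@10" entry (51.32) equals the mean of the three per-category Recall@10 rows (Dresses 45.22, Shirts 52.6, Tops&Tees 56.14); the 0.5132 entry appears to report the same average as a fraction. The function name and candidate IDs are illustrative assumptions, and the averaging over the three categories is inferred from the numbers rather than stated on this page.

```python
def recall_at_k(rankings, targets, k):
    """Recall@K in percent: share of queries whose ground-truth target
    appears among the top-k retrieved candidates."""
    hits = sum(1 for ranked, target in zip(rankings, targets) if target in ranked[:k])
    return 100.0 * hits / len(targets)


# Toy example with hypothetical candidate IDs: two queries, ground truth "b".
rankings = [["a", "b", "c"], ["c", "a", "b"]]
print(recall_at_k(rankings, ["b", "b"], k=2))  # 50.0

# FashionIQ per-category Recall@10 values taken from the table above.
fashioniq_r10 = {"Dresses": 45.22, "Shirts": 52.60, "Tops&Tees": 56.14}
average_r10 = sum(fashioniq_r10.values()) / len(fashioniq_r10)
print(f"{average_r10:.2f}")  # 51.32
```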
