DetailCLIP: Injecting Image Details into CLIP's Feature Space

About

Although CLIP-like Visual Language Models provide a functional joint feature space for images and text, the limited image input size of these models (e.g., 224) means that subtle details are lost from the feature representation when the input is a high-resolution image (e.g., 2240). Our proposed framework addresses this issue by generating a single feature representation for a high-resolution image that retains image details from different scales while sharing the same semantic space as the original CLIP. One application scenario is remote sensing text-image retrieval, where targets (e.g., vehicles and ships) often appear at tiny scales. To achieve this, we develop a feature fusion model that relies on CLIP features extracted with a carefully designed image patching method, dubbed Complete Cover. This method ensures comprehensive coverage of objects across various scales, and the fusion model is weakly supervised by image-agnostic class-prompted queries. We evaluate our framework on real-world and synthetic datasets, demonstrating significant improvements in image retrieval tasks based on class-prompted queries. To further showcase our framework's capability in detail retrieval, we introduce a CLEVR-like synthetic dataset, named CLEVR-DS. This fully annotated dataset offers a controllable object scale, allowing for a more thorough examination of our approach's effectiveness. Our code is publicly available at https://github.com/zilunzhang/DetailCLIP
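To make the idea of covering an image at several scales concrete, here is a minimal sketch of a multi-scale tiling. It is an illustrative assumption, not the Complete Cover scheme itself, whose exact rules are not given on this page. The function name `multiscale_patches` and the choice of 1x1, 2x2, and 4x4 grids are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def multiscale_patches(width: int, height: int, levels: int = 3) -> List[Box]:
    """Tile an image with 1x1, 2x2, 4x4, ... grids (hypothetical scheme).

    Every pixel is covered once per level, so small objects fall inside
    small patches and large objects inside large ones.
    """
    boxes: List[Box] = []
    for level in range(levels):
        cells = 2 ** level  # grid cells per side at this level
        for row in range(cells):
            for col in range(cells):
                left = col * width // cells
                top = row * height // cells
                right = (col + 1) * width // cells
                bottom = (row + 1) * height // cells
                boxes.append((left, top, right, bottom))
    return boxes


boxes = multiscale_patches(2240, 2240, levels=3)
print(len(boxes))  # 1 + 4 + 16 patches
```

In a pipeline of this kind, each box would presumably be cropped, resized to the CLIP input size (e.g., 224), and encoded separately, with the per-patch features then passed to the fusion model described above.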

Zilun Zhang, Cuifeng Shen, Yuan Shen, Xinyu Zhou, Huixin Xiong, Tiancheng Zhao, Jianwei Yin • 2022

Related benchmarks

Task                     Dataset             Metric    Result  Rank
Image-Text Retrieval     COCO                Recall@1  62.63   27
Image-Text Retrieval     CLEVR-DS            Recall@1  33.46   12
Image-Text Retrieval     Unity               Recall@1  55.21   12
Image-Text Retrieval     LVIS                Recall@1  15.29   12
Text-to-Image Retrieval  CLEVR-DS-S (test)   Recall@1  14.66   3
Text-to-Image Retrieval  CLEVR-DS-L (test)   Recall@1  16.33   3
Text-to-Image Retrieval  CLEVR-DS (test)     Recall@1  22.54   3
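
For readers unfamiliar with the metric, the sketch below shows one common way to compute Recall@k for retrieval, reported as a percentage. It assumes a query counts as a hit when any relevant item appears among its top-k results. The benchmarks above may define relevance differently, and the function name `recall_at_k` and the toy scores are hypothetical.

```python
from typing import Sequence, Set


def recall_at_k(
    scores: Sequence[Sequence[float]],
    relevant: Sequence[Set[int]],
    k: int = 1,
) -> float:
    """Percentage of queries with at least one relevant item in the top k.

    scores[q][i] is the similarity between query q and gallery item i;
    relevant[q] is the set of gallery indices that are correct for query q.
    """
    hits = 0
    for query_scores, rel in zip(scores, relevant):
        # Rank gallery items from most to least similar.
        ranked = sorted(
            range(len(query_scores)),
            key=lambda i: query_scores[i],
            reverse=True,
        )
        if any(i in rel for i in ranked[:k]):
            hits += 1
    return 100.0 * hits / len(scores)


# Toy example: 3 text queries scored against 3 images.
scores = [[0.9, 0.1, 0.3], [0.2, 0.4, 0.8], [0.5, 0.6, 0.1]]
relevant = [{0}, {1}, {1}]
print(round(recall_at_k(scores, relevant, k=1), 2))
```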