DetailCLIP: Injecting Image Details into CLIP's Feature Space

About

Although CLIP-like Visual Language Models provide a functional joint feature space for image and text, the limited image input size of CLIP-like models (e.g., 224) means that subtle details are lost in the feature representation when the input is a high-resolution image (e.g., 2240). Our proposed framework addresses this issue by generating a single feature representation for a high-resolution image that retains image details from different scales while sharing the same semantic space as the original CLIP. One application scenario is remote sensing text-image retrieval, where targets (e.g., vehicles and ships) often appear at tiny scales. To achieve this, we develop a feature fusion model that relies on CLIP features extracted with a carefully designed image patching method, dubbed Complete Cover. This method ensures comprehensive coverage of objects across various scales, and the fusion model is weakly supervised by image-agnostic class-prompted queries. We evaluate our framework on real-world and synthetic datasets, demonstrating significant improvements in image retrieval tasks based on class-prompted queries. To further showcase our framework's capability in detail retrieval, we introduce a CLEVR-like synthetic dataset, named CLEVR-DS. This fully annotated dataset offers a controllable object scale, allowing for a more thorough examination of our approach's effectiveness. Our code is publicly available at https://github.com/zilunzhang/DetailCLIP
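
To make the idea of covering a high-resolution image with patches at several scales concrete, here is a minimal sketch. It is not the authors' Complete Cover scheme, whose exact layout is not given on this page; it is a generic overlapping multi-scale grid. The function name `multiscale_patches` and the parameters `base`, `scales`, and `overlap` are hypothetical, chosen for illustration only. In a real pipeline each box would be cropped, resized to the CLIP input size, and encoded before fusion.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def _starts(length: int, size: int, stride: int) -> List[int]:
    """Window start offsets such that windows of `size` cover [0, length)."""
    if size >= length:
        return [0]
    starts = list(range(0, length - size + 1, stride))
    if starts[-1] != length - size:
        # Add a final window flush with the border so nothing is left uncovered.
        starts.append(length - size)
    return starts


def multiscale_patches(
    width: int,
    height: int,
    base: int = 224,            # assumed CLIP input size
    scales: Tuple[int, ...] = (1, 2, 4),  # hypothetical scale factors
    overlap: float = 0.5,       # hypothetical overlap ratio between neighbours
) -> List[Box]:
    """Generic overlapping grid at several scales (an illustration only)."""
    boxes: List[Box] = []
    for s in scales:
        size = base * s
        stride = max(1, int(size * (1.0 - overlap)))
        for top in _starts(height, size, stride):
            for left in _starts(width, size, stride):
                right = min(left + size, width)
                bottom = min(top + size, height)
                boxes.append((left, top, right, bottom))
    return boxes


if __name__ == "__main__":
    patches = multiscale_patches(2240, 2240)
    print(len(patches), "patches; first:", patches[0])
```

Overlapping windows are used here so that a small object lying on a patch boundary at one position still falls fully inside a neighbouring patch; the overlap ratio is an assumption, not a value taken from the paper.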

Zilun Zhang, Cuifeng Shen, Yuan Shen, Xinyu Zhou, Huixin Xiong, Tiancheng Zhao, Jianwei Yin • 2022

Related benchmarks

Task	Dataset	Metric	Result
Image-Text Retrieval	COCO	Recall@1	62.63	27
Image-Text Retrieval	CLEVR-DS	Recall@1	33.46	12
Image-Text Retrieval	Unity	Recall@1	55.21	12
Image-Text Retrieval	LVIS	Recall@1	15.29	12
Text-to-Image Retrieval	CLEVR-DS-S (test)	Recall@1	14.66	3
Text-to-Image Retrieval	CLEVR-DS-L (test)	Recall@1	16.33	3
Text-to-Image Retrieval	CLEVR-DS (test)	Recall@1	22.54	3
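
For readers unfamiliar with the metric, the following is a minimal sketch of how Recall@K is commonly computed for retrieval. This page does not state the exact evaluation protocol behind the numbers above, so the definition here (a query counts as a hit if any relevant image appears in its top K, reported as a percentage) is an assumption, and the name `recall_at_k` is hypothetical.

```python
from typing import Sequence, Set


def recall_at_k(
    scores: Sequence[Sequence[float]],  # scores[i][j]: similarity of query i to image j
    relevant: Sequence[Set[int]],       # relevant[i]: indices of images relevant to query i
    k: int = 1,
) -> float:
    """Percentage of queries with at least one relevant image in the top k."""
    hits = 0
    for row, rel in zip(scores, relevant):
        # Rank image indices by descending similarity.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if rel.intersection(ranked[:k]):
            hits += 1
    return 100.0 * hits / len(scores)


if __name__ == "__main__":
    sim = [
        [0.9, 0.1, 0.2],  # top-1 is image 0
        [0.3, 0.2, 0.8],  # top-1 is image 2
        [0.1, 0.7, 0.6],  # top-1 is image 1
    ]
    rel = [{0}, {1}, {1}]
    print(round(recall_at_k(sim, rel, k=1), 2))
```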

