
Not All Features Matter: Enhancing Few-shot CLIP with Adaptive Prior Refinement

About

The popularity of Contrastive Language-Image Pre-training (CLIP) has propelled its application to diverse downstream vision tasks. To improve its capacity on downstream tasks, few-shot learning has become a widely adopted technique. However, existing methods either exhibit limited performance or suffer from excessive learnable parameters. In this paper, we propose APE, an Adaptive Prior rEfinement method for CLIP's pre-trained knowledge, which achieves superior accuracy with high computational efficiency. Via a prior refinement module, we analyze the inter-class disparity in the downstream data and decouple the domain-specific knowledge from the CLIP-extracted cache model. On top of that, we introduce two model variants, a training-free APE and a training-required APE-T. We explore the trilateral affinities between the test image, prior cache model, and textual representations, and only enable a lightweight category-residual module to be trained. For the average accuracy over 11 benchmarks, both APE and APE-T attain state-of-the-art results and respectively outperform the second-best method by +1.59% and +1.99% under 16 shots with 30× fewer learnable parameters.
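The idea of keeping only the feature channels that differ across classes can be illustrated with a small sketch. This is not the paper's refinement criterion: it assumes, purely for illustration, that "inter-class disparity" is measured as the variance of each channel across per-class mean features. The function names `select_channels` and `refine` and the toy prototypes are hypothetical.

```python
from statistics import pvariance


def select_channels(class_prototypes, k):
    """Return the indices of the k channels that vary most across classes.

    Assumption: disparity is the population variance of a channel over the
    per-class mean features. This is a stand-in, not the APE criterion.
    """
    num_channels = len(class_prototypes[0])
    disparity = [
        pvariance([proto[c] for proto in class_prototypes])
        for c in range(num_channels)
    ]
    # Rank channels from most to least discriminative, keep the top k.
    ranked = sorted(range(num_channels), key=lambda c: disparity[c], reverse=True)
    return sorted(ranked[:k])


def refine(feature, channels):
    """Keep only the selected channels of one feature vector."""
    return [feature[c] for c in channels]


# Hypothetical toy data: 3 classes, 4-dimensional mean features.
prototypes = [
    [0.9, 0.5, 0.1, 0.5],
    [0.1, 0.5, 0.9, 0.5],
    [0.5, 0.5, 0.5, 0.4],
]
selected = select_channels(prototypes, k=2)
refined = [refine(p, selected) for p in prototypes]
```

In this toy case channels 1 and 3 are nearly constant across classes, so they carry little class information and are dropped. The refined prototypes could then feed a cache-style similarity lookup.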

Xiangyang Zhu, Renrui Zhang, Bowei He, Aojun Zhou, Dong Wang, Bin Zhao, Peng Gao • 2023

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Image Classification | ImageNet 1k (test) | Top-1 Accuracy | 68.74 | 848 |
| Image Classification | ImageNet V2 | -- | -- | 611 |
| Image Classification | Oxford-IIIT Pet | Accuracy | 93.46 | 219 |
| Image Classification | ImageNet V2 (test) | Top-1 Accuracy | 59.58 | 216 |
| Image Classification | ImageNet-Sketch (test) | Top-1 Acc | 0.4328 | 153 |
| Image Classification | Average 11 datasets | -- | -- | 83 |
| Image Classification | ImageNet | Top-1 Accuracy | 74.13 | 80 |
| Image Classification | ImageNet V2 (Target) | Accuracy | 55.94 | 48 |
| Few-shot Image Classification | Average 11 datasets (test) | Average Accuracy (Few-shot) | 77.18 | 47 |
| Image Classification | ImageNet (source) | Accuracy | 63.42 | 37 |
Showing 10 of 14 rows
