
Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling

About

Contrastive Vision-Language Pre-training, known as CLIP, has provided a new paradigm for learning visual representations by using large-scale contrastive image-text pairs. It shows impressive performance on zero-shot knowledge transfer to downstream tasks. To further enhance CLIP's few-shot capability, CLIP-Adapter proposed fine-tuning a lightweight residual feature adapter, which significantly improves few-shot classification performance. However, such a process still needs extra training and computational resources. In this paper, we propose Training-Free CLIP-Adapter (Tip-Adapter), which not only inherits CLIP's training-free advantage but also performs comparably to or even better than CLIP-Adapter. Tip-Adapter does not require any backpropagation to train the adapter; instead, it creates the weights from a key-value cache model constructed from the few-shot training set. In this non-parametric manner, Tip-Adapter acquires well-performing adapter weights without any training, which is both efficient and effective. Moreover, the performance of Tip-Adapter can be further boosted by fine-tuning this properly initialized adapter for only a few epochs, with very fast convergence. We conduct extensive experiments on few-shot classification on ImageNet and 10 other datasets to demonstrate the superiority of the proposed Tip-Adapter. The code will be released at https://github.com/gaopengcuhk/Tip-Adapter.
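
The key-value cache idea in the abstract can be illustrated with a minimal, dependency-free sketch. The abstract only says that the adapter weights come from a cache built on the few-shot training set; everything else here is an assumption made for illustration: the function names (`build_cache`, `cache_logits`), the exponential affinity `exp(-beta * (1 - similarity))`, the hyperparameters `alpha` and `beta`, and the toy 2-D feature vectors standing in for CLIP image and text embeddings. It is not the authors' released implementation.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def build_cache(features, labels, num_classes):
    """Build the cache without any training (hypothetical helper).

    Keys are the normalized few-shot image features; values are the
    matching one-hot labels.
    """
    keys = [l2_normalize(f) for f in features]
    values = [[1.0 if c == y else 0.0 for c in range(num_classes)] for y in labels]
    return keys, values


def cache_logits(query, keys, values, text_weights, alpha=1.0, beta=5.5):
    """Combine zero-shot logits with cache-retrieved logits (assumed form).

    alpha weights the cache term; beta sharpens the affinity. Both are
    illustrative defaults, not values taken from the text above.
    """
    q = l2_normalize(query)
    num_classes = len(text_weights)
    # Zero-shot term: cosine similarity to each class text embedding.
    zero_shot = [dot(q, l2_normalize(w)) for w in text_weights]
    # Cache term: affinity-weighted sum of the stored one-hot labels.
    cache = [0.0] * num_classes
    for key, value in zip(keys, values):
        affinity = math.exp(-beta * (1.0 - dot(q, key)))
        for c in range(num_classes):
            cache[c] += affinity * value[c]
    return [z + alpha * c for z, c in zip(zero_shot, cache)]


# Toy usage: two classes, one shot each, 2-D stand-in features.
keys, values = build_cache([[1.0, 0.0], [0.0, 1.0]], [0, 1], num_classes=2)
text_weights = [[1.0, 0.2], [0.2, 1.0]]
logits = cache_logits([0.9, 0.1], keys, values, text_weights)
predicted = max(range(len(logits)), key=logits.__getitem__)
print(predicted)
```

Setting `alpha=0.0` reduces the sketch to plain zero-shot classification, which makes the contribution of the cache term easy to isolate. The fine-tuned variant mentioned in the abstract would, under the same assumptions, treat the cached keys as learnable parameters initialized from these values.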

Renrui Zhang, Rongyao Fang, Wei Zhang, Peng Gao, Kunchang Li, Jifeng Dai, Yu Qiao, Hongsheng Li • 2021

Related benchmarks

Task | Dataset | Result | Rank
Image Classification | ImageNet-1k (val) | -- | 1453
Person Re-Identification | Duke MTMC-reID (test) | Rank-1 82.6 | 1018
Image Classification | ImageNet 1k (test) | Top-1 Accuracy 68.43 | 798
Image Classification | ImageNet A | Top-1 Acc 49.89 | 553
Image Classification | ImageNet V2 | Top-1 Acc 61.88 | 487
Image Classification | ImageNet-R | Top-1 Acc 77.65 | 474
Image Classification | ImageNet-Sketch | Top-1 Accuracy 48.24 | 360
Image Classification | ImageNet (test) | Top-1 Accuracy 65.51 | 291
Vehicle Re-identification | VeRi-776 (test) | Rank-1 85.4 | 232
Image Classification | FGVC-Aircraft (test) | Accuracy 67.4 | 231
Showing 10 of 26 rows
