A Hard-to-Beat Baseline for Training-free CLIP-based Adaptation

About

Contrastive Language-Image Pretraining (CLIP) has gained popularity for its remarkable zero-shot capacity. Recent research has focused on developing efficient fine-tuning methods, such as prompt learning and adapters, to enhance CLIP's performance in downstream tasks. However, these methods still require additional training time and computational resources, which is undesirable for devices with limited resources. In this paper, we revisit a classical algorithm, Gaussian Discriminant Analysis (GDA), and apply it to the downstream classification of CLIP. Typically, GDA assumes that features of each class follow Gaussian distributions with identical covariance. By leveraging Bayes' formula, the classifier can be expressed in terms of the class means and covariance, which can be estimated from the data without the need for training. To integrate knowledge from both visual and textual modalities, we ensemble it with the original zero-shot classifier within CLIP. Extensive results on 17 datasets validate that our method surpasses or achieves comparable results with state-of-the-art methods on few-shot classification, imbalanced learning, and out-of-distribution generalization. In addition, we extend our method to base-to-new generalization and unsupervised learning, once again demonstrating its superiority over competing approaches. Our code is publicly available at https://github.com/mrflogs/ICLR24.
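
The closed-form classifier described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' released implementation: the function names `fit_gda` and `ensemble_logits`, the `ridge` regularizer on the covariance, the mixing weight `alpha`, and the logit `scale` of 100 are all assumptions made for illustration. Under a shared covariance Σ, class means μ_k, and class priors π_k, Bayes' formula gives linear logits with weights Σ⁻¹μ_k and biases log π_k − ½ μ_kᵀΣ⁻¹μ_k, which is what the sketch computes. It assumes image features and text embeddings have already been extracted from CLIP and that every class has at least one labeled sample.

```python
import numpy as np


def fit_gda(features, labels, num_classes, ridge=1e-3):
    """Estimate a linear GDA classifier (W, b) in closed form, with no training loop.

    features: (n, dim) array of image features.
    labels:   (n,) integer array with values in [0, num_classes).
    ridge:    hypothetical regularizer that keeps the covariance invertible.
    """
    dim = features.shape[1]
    # Per-class means, shape (num_classes, dim).
    means = np.stack([features[labels == k].mean(axis=0) for k in range(num_classes)])
    # Shared covariance: pool the samples after centering each on its class mean.
    centered = features - means[labels]
    cov = centered.T @ centered / len(features) + ridge * np.eye(dim)
    precision = np.linalg.inv(cov)
    # Empirical class priors.
    priors = np.bincount(labels, minlength=num_classes) / len(labels)
    # Linear discriminant: w_k = precision @ mu_k, b_k = log(pi_k) - 0.5 * mu_k^T precision mu_k.
    W = precision @ means.T                                    # (dim, num_classes)
    b = np.log(priors) - 0.5 * np.einsum("kd,dk->k", means, W)  # (num_classes,)
    return W, b


def ensemble_logits(features, text_weights, W, b, alpha=1.0, scale=100.0):
    """Add the GDA logits to zero-shot logits computed from text embeddings.

    text_weights: (num_classes, dim) array of class text embeddings.
    alpha, scale: hypothetical mixing weight and logit scale.
    """
    zero_shot = scale * features @ text_weights.T
    return zero_shot + alpha * (features @ W + b)
```

Predictions follow from `ensemble_logits(...).argmax(axis=1)`. In practice `alpha` would be chosen on held-out data, and a more careful covariance estimator may be preferable when there are fewer labeled samples than feature dimensions.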

Zhengbo Wang, Jian Liang, Lijun Sheng, Ran He, Zilei Wang, Tieniu Tan • 2024

Related benchmarks

Task                                Dataset                              Result           Rank
Image Classification                Stanford Cars                        Accuracy 74.8    635
Image Classification                EuroSAT                              Accuracy 85.5    569
Image Classification                Flowers102                           Accuracy 95.8    558
Image Classification                Food101                              Accuracy 79      457
Image Classification                SUN397                               Accuracy 70.7    441
Fine-grained visual classification  FGVC-Aircraft (test)                 Top-1 Acc 18.69  312
Image Classification                Oxford-IIIT Pets                     Accuracy 89.1    306
Image Classification                Caltech101                           Accuracy 92.2    228
Image Classification                FGVC Aircraft                        --               203
Image Classification                DTD (Describable Textures Dataset)   Accuracy 66.1    57
Showing 10 of 17 rows
