LPT: Less-overfitting Prompt Tuning for Vision-Language Model

About

Vision-language models (VLMs) have demonstrated exceptional generalization capabilities on downstream tasks. Prompt learning has gradually become an effective and efficient way to transfer VLMs to downstream tasks, surpassing traditional fine-tuning methods. However, during the transfer process, these models are prone to severe overfitting, leading to a significant decline in generalization ability. To address this issue, we propose LPT, a framework designed for vision-language models. We use CLIP to filter out fine-grained foreground information that may lead to overfitting, thereby guiding the prompts with basic visual concepts. To further mitigate overfitting, we develop a Structural Preservation (SP) constraint at the feature level, which aligns the model's overall feature space structure with that of the frozen CLIP, endowing the feature space with overall plasticity and enabling effective reshaping of the feature space during optimization. Moreover, we employ a Hierarchical Logit (HL) constraint at the output layer to constrain the overall class information in the output, complementing the role of SP at the output end. Extensive experiments across various benchmarks (base-to-novel generalization, cross-dataset transfer, and domain generalization) demonstrate that our approach significantly improves generalization capability and effectively alleviates overfitting compared to state-of-the-art methods.
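The abstract does not give the formula for the SP constraint. As a rough illustration only, one common way to align the structure of a feature space with a frozen reference is to compare pairwise similarity matrices. The sketch below assumes that form; the function names `cosine_similarity_matrix` and `structure_preservation_loss`, the use of cosine similarity, and the mean-squared penalty are all hypothetical choices, not the authors' definition.

```python
import math


def cosine_similarity_matrix(features):
    """Return the pairwise cosine similarities between the row vectors."""
    norms = [math.sqrt(sum(x * x for x in row)) for row in features]
    n = len(features)
    return [
        [
            sum(a * b for a, b in zip(features[i], features[j]))
            / (norms[i] * norms[j])
            for j in range(n)
        ]
        for i in range(n)
    ]


def structure_preservation_loss(tuned_features, frozen_features):
    """Mean squared difference between the two pairwise-similarity matrices.

    Assumption: "structure" is modelled as pairwise cosine similarity.
    This is an illustrative stand-in, not the paper's SP constraint.
    """
    tuned_sim = cosine_similarity_matrix(tuned_features)
    frozen_sim = cosine_similarity_matrix(frozen_features)
    n = len(tuned_sim)
    total = sum(
        (tuned_sim[i][j] - frozen_sim[i][j]) ** 2
        for i in range(n)
        for j in range(n)
    )
    return total / (n * n)


# Toy features for two samples from a hypothetical frozen encoder.
frozen = [[1.0, 0.0], [0.0, 1.0]]
# A rotated copy keeps every pairwise relation, so the penalty vanishes.
rotated = [[0.0, 1.0], [-1.0, 0.0]]
# Collapsing both samples onto one direction destroys the structure.
collapsed = [[1.0, 0.0], [1.0, 0.0]]

loss_rotated = structure_preservation_loss(rotated, frozen)
loss_collapsed = structure_preservation_loss(collapsed, frozen)
```

Under this assumed form, the penalty constrains only the relations between samples, so the tuned features remain free to move as a whole while their relative arrangement stays close to that of the frozen model.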

Chenhao Ding, Xinyuan Gao, Songlin Dong, Jizhou Han, Qiang Wang, Zhengdong Zhou, Yuhang He, Yihong Gong • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Classification | ImageNet source to 10 fine-grained target datasets (test) | Caltech101 Accuracy | 95 | 37 |
| Image Classification | Food101 novel classes | Accuracy | 0.9167 | 36 |
| Image Classification | 11 image recognition datasets (Base classes) | Average Accuracy | 85.1 | 30 |
| Image Classification | DTD (Novel) | Top-1 Acc | 65.4 | 21 |
| Image Classification | Flowers102 (Novel) | Top-1 Accuracy | 77.9 | 15 |
| Image Classification | SUN397 (Novel) | Top-1 Acc | 79.5 | 15 |
| Image Classification | UCF101 (Novel) | Top-1 Acc | 80.5 | 15 |
| Image Classification | OxfordPets (Novel) | Top-1 Accuracy | 97.87 | 15 |
| Image Classification | Caltech101 (Novel) | Top-1 Acc | 94.3 | 15 |
| Image Classification | FGVC Aircraft Novel | Accuracy | 38.8 | 11 |

Showing 10 of 13 rows
