
AttriPrompt: Dynamic Prompt Composition Learning for CLIP

About

The evolution of prompt learning methodologies has driven exploration of deeper prompt designs to enhance model performance. However, current deep text prompting approaches suffer from two critical limitations: (1) over-reliance on contrastive learning objectives that prioritize high-level semantic alignment and neglect fine-grained feature optimization; (2) static prompts shared across all input categories, which prevent content-aware adaptation. To address these limitations, we propose AttriPrompt, a novel framework that enhances and refines textual semantic representations by leveraging the intermediate-layer features of CLIP's vision encoder. We design an Attribute Retrieval module that first clusters the visual features from each layer. The aggregated visual features then retrieve semantically similar prompts from a prompt pool, and the retrieved prompts are concatenated to the input of every layer in the text encoder. Leveraging the hierarchical visual information embedded in the prompted text features, we introduce Dual-stream Contrastive Learning to realize fine-grained alignment. Furthermore, we introduce a Self-Regularization mechanism that applies explicit regularization constraints between the prompted and non-prompted text features to prevent overfitting on limited training data. Extensive experiments across three benchmarks demonstrate AttriPrompt's superiority over state-of-the-art methods, with up to 7.37% improvement in the base-to-novel setting. The method's strength in cross-domain knowledge transfer makes vision-language pre-trained models more viable for real-world deployment.
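The retrieval and regularization steps described in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: mean pooling stands in for the clustering step, cosine similarity with top-k selection is an assumed retrieval rule, and the mean-absolute-difference penalty is an assumed form of the self-regularization constraint. All function names, the toy vectors, and the pool contents are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def aggregate(tokens):
    """Mean-pool token vectors (assumption: a simple stand-in for clustering)."""
    n = len(tokens)
    return [sum(col) / n for col in zip(*tokens)]


def retrieve_prompts(tokens, pool_keys, pool_prompts, k=2):
    """Return the k prompts whose keys best match the pooled visual feature."""
    query = aggregate(tokens)
    scores = [cosine(query, key) for key in pool_keys]
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return [pool_prompts[i] for i in top]


def self_regularization(prompted, unprompted):
    """Mean absolute difference between prompted and non-prompted text
    features (assumption: the paper's exact constraint may differ)."""
    return sum(abs(p - q) for p, q in zip(prompted, unprompted)) / len(prompted)


# Toy data: two visual tokens from one layer and a three-entry prompt pool.
layer_tokens = [[1.0, 0.0], [0.8, 0.2]]
pool_keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pool_prompts = ["prompt_a", "prompt_b", "prompt_c"]

selected = retrieve_prompts(layer_tokens, pool_keys, pool_prompts, k=2)
penalty = self_regularization([1.0, 2.0], [1.0, 1.0])
```

In a full model, the retrieval would run once per vision-encoder layer and the selected prompts would be concatenated to the matching text-encoder layer's input, as the abstract describes. The sketch covers a single layer only.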

Qiqi Zhan, Shiwei Li, Qingjie Liu, Yunhong Wang • 2025

Related benchmarks

Task                  Dataset       Metric           Result  Rank
Image Classification  Flowers102    Accuracy         71.57   478
Image Classification  DTD           Accuracy         48.37   419
Image Classification  UCF101        Top-1 Acc        69.17   404
Image Classification  Food101       Accuracy         86.47   309
Image Classification  Aircraft      Accuracy         24.37   302
Image Classification  StanfordCars  Accuracy         65.63   266
Image Classification  SUN397        Accuracy         68      246
Image Classification  Caltech101    Accuracy         94.23   162
Image Classification  SUN397        Accuracy (Base)  82.77   131
Image Classification  OxfordPets    Base Accuracy    96.13   117
Showing 10 of 21 rows
