AttriPrompt: Dynamic Prompt Composition Learning for CLIP

About

The evolution of prompt learning methodologies has driven exploration of deeper prompt designs to enhance model performance. However, current deep text prompting approaches suffer from two critical limitations: (1) over-reliance on contrastive learning objectives that prioritize high-level semantic alignment and neglect fine-grained feature optimization; and (2) static prompts shared across all input categories, which prevent content-aware adaptation. To address these limitations, we propose AttriPrompt, a novel framework that enhances and refines textual semantic representations by leveraging the intermediate-layer features of CLIP's vision encoder. We design an Attribute Retrieval module that first clusters visual features from each layer. The aggregated visual features then retrieve semantically similar prompts from a prompt pool, and the retrieved prompts are concatenated to the input of every layer in the text encoder. Leveraging the hierarchical visual information embedded in the prompted text features, we introduce Dual-stream Contrastive Learning to realize fine-grained alignment. Furthermore, we introduce a Self-Regularization mechanism that applies explicit regularization constraints between the prompted and non-prompted text features to prevent overfitting on limited training data. Extensive experiments across three benchmarks demonstrate AttriPrompt's superiority over state-of-the-art methods, with up to 7.37% improvement in the base-to-novel setting. The observed strength of our method in cross-domain knowledge transfer positions vision-language pre-trained models as more viable solutions for real-world deployment.

Qiqi Zhan, Shiwei Li, Qingjie Liu, Yunhong Wang • 2025
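
The retrieval step described in the abstract (aggregate per-layer visual features, then pick the most similar prompts from a pool) can be illustrated with a minimal sketch. This is not the authors' implementation: mean pooling stands in for the clustering step, cosine similarity is an assumed matching score, and the names `mean_pool`, `retrieve_prompts`, `pool_keys`, `pool_prompts`, and `top_k` are all hypothetical. The toy vectors are made up for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_pool(features):
    """Average a list of feature vectors into one query vector.

    Assumption: a simple stand-in for the paper's clustering/aggregation step.
    """
    n = len(features)
    return [sum(column) / n for column in zip(*features)]


def retrieve_prompts(query, pool_keys, pool_prompts, top_k=2):
    """Return the top_k prompts whose keys are most similar to the query."""
    ranked = sorted(
        range(len(pool_keys)),
        key=lambda i: cosine(query, pool_keys[i]),
        reverse=True,
    )
    return [pool_prompts[i] for i in ranked[:top_k]]


# Toy data (hypothetical): two visual feature vectors from one encoder layer
# and a pool of four prompts, each with a key vector used for matching.
layer_features = [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]
pool_keys = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0], [0.0, 0.0, 1.0]]
pool_prompts = ["p0", "p1", "p2", "p3"]

query = mean_pool(layer_features)
selected = retrieve_prompts(query, pool_keys, pool_prompts, top_k=2)
print(selected)
```

In the framework as described, the selected prompts would then be concatenated to the input of the corresponding text-encoder layer; that step depends on the encoder internals and is left out of this sketch.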

Related benchmarks

Task	Dataset	Metric	Result
Image Classification	Flowers102	Accuracy	71.57	558
Image Classification	UCF101	Top-1 Acc	69.17	529
Image Classification	DTD	Accuracy	48.37	487
Image Classification	Food101	Accuracy	86.47	457
Image Classification	SUN397	Accuracy	68	450
Image Classification	StanfordCars	Accuracy	65.63	384
Image Classification	Aircraft	Accuracy	24.37	340
Image Classification	OxfordPets	Accuracy	90.73	298
Image Classification	Caltech101	Accuracy	94.23	228
Image Classification	EuroSAT	Accuracy	53.17	226

Showing 10 of 21 rows
