
Visual prompting reimagined: The power of the Activation Prompts

About

Visual prompting (VP) has emerged as a popular method for repurposing pretrained vision models to downstream tasks. Unlike conventional fine-tuning, VP introduces a universal perturbation directly into the input data to enable task-specific adaptation rather than modifying model parameters. However, a noticeable performance gap remains between VP and conventional fine-tuning methods, leaving both the theory and practice of input-level prompting underexplored. To close this gap, we introduce a generalized concept, termed activation prompt (AP), which extends input-level VP by allowing universal perturbations to be applied to activation maps within the intermediate layers of the model. Using AP both to revisit the problem of VP and as an analytical tool, we demonstrate the intrinsic limitations of VP in performance and efficiency, revealing why input-level prompting can be less effective than AP, which exhibits a model-dependent layer preference. We show that AP is closely related to normalization tuning in convolutional neural networks and vision transformers, although each model type has a distinct layer preference for prompting, and we theoretically explain this preference by analyzing global features across layers. Through extensive experiments on 29 datasets and various model architectures, we provide a comprehensive performance analysis of AP against VP and parameter-efficient fine-tuning baselines. Our results demonstrate AP's superiority in both accuracy and efficiency, considering time, parameters, memory usage, and throughput.
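To make the VP/AP distinction concrete, here is a minimal toy sketch (our own illustration, not the paper's code) of a frozen two-layer network where only a universal prompt is trainable: VP adds the perturbation to the input, while AP adds it to an intermediate activation map. Names like `FrozenNet` and the layer sizes are hypothetical.

```python
import random

DIM = 4  # toy feature dimension

def relu(v):
    return [max(0.0, a) for a in v]

def matvec(W, v):
    return [sum(w * a for w, a in zip(row, v)) for row in W]

class FrozenNet:
    """Pretrained (frozen) weights; only the prompt delta would be trained."""
    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.W1 = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(DIM)]
        self.W2 = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(DIM)]

    def forward(self, x, vp_delta=None, ap_delta=None):
        if vp_delta is not None:                 # VP: perturb the raw input
            x = [a + d for a, d in zip(x, vp_delta)]
        h = relu(matvec(self.W1, x))             # intermediate activation
        if ap_delta is not None:                 # AP: perturb the activation
            h = [a + d for a, d in zip(h, ap_delta)]
        return matvec(self.W2, h)

net = FrozenNet()
x = [1.0, 0.5, -0.5, 2.0]
zero = [0.0] * DIM
# With zero prompts, both variants reduce to the plain frozen forward pass.
assert net.forward(x, vp_delta=zero) == net.forward(x, ap_delta=zero)
# A nonzero activation prompt shifts the output without touching W1 or W2.
assert net.forward(x) != net.forward(x, ap_delta=[1.0] * DIM)
```

In a real model, the AP delta would be injected at a chosen intermediate layer (e.g. via a forward hook in PyTorch), which is where the model-dependent layer preference discussed in the abstract comes into play.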

Yihua Zhang, Hongkang Li, Yuguang Yao, Aochuan Chen, Shuai Zhang, Pin-Yu Chen, Meng Wang, Sijia Liu • 2026

Related benchmarks

| Task                 | Dataset           | Result                        | Rank |
|----------------------|-------------------|-------------------------------|------|
| Image Classification | EuroSAT (test)    | Accuracy 96.45                | 141  |
| Image Classification | Oxford Pets (test)| Accuracy 83.82                | 125  |
| Image Classification | Flowers102 (test) | Accuracy 85.52                | 119  |
| Image Classification | Waterbirds (test) | --                            | 112  |
| Image Classification | UCF-101 (test)    | Accuracy 76.42                | 106  |
| Image Classification | VTAB              | Overall Accuracy 90.25        | 103  |
| Image Classification | Food101 (test)    | Accuracy 82.43                | 91   |
| Image Classification | FGVC              | Accuracy 85.3                 | 68   |
| Image Classification | FGVC              | CUB Accuracy 86.74            | 38   |
| Image Classification | DTD (test)        | Accuracy (DTD test) 69.42     | 28   |
Showing 10 of 14 rows
