Robust Adaptation of Foundation Models with Black-Box Visual Prompting

About

With the surge of large-scale pre-trained models (PTMs), parameter-efficient transfer learning (PETL) of large models has garnered significant attention. While promising, PETL methods commonly rely on two optimistic assumptions: 1) full access to the parameters of a PTM, and 2) sufficient memory capacity to cache all intermediate activations for gradient computation. However, in most real-world applications, PTMs are served as black-box APIs or proprietary software without full parameter accessibility. Moreover, the large memory requirement of modern PTMs is hard to meet. This work proposes black-box visual prompting (BlackVIP), which efficiently adapts PTMs without knowledge of their architectures or parameters. BlackVIP has two components: 1) the Coordinator and 2) simultaneous perturbation stochastic approximation with gradient correction (SPSA-GC). The Coordinator designs input-dependent visual prompts, which allow the target PTM to adapt in the wild. SPSA-GC efficiently estimates the gradient of the PTM to update the Coordinator. In addition, we introduce a variant, BlackVIP-SE, which significantly reduces the runtime and computational cost of BlackVIP. Extensive experiments on 19 datasets demonstrate that BlackVIPs enable robust adaptation to diverse domains and tasks with minimal memory requirements. We further provide a theoretical analysis of the generalization of visual prompting methods by presenting their connection to the certified robustness of randomized smoothing, and we provide empirical support for the improved robustness.

Changdae Oh, Gyeongdeok Seo, Geunyoung Jung, Zhi-Qi Cheng, Hosik Choi, Jiyoung Jung, Kyungwoo Song • 2024
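
The abstract names SPSA-GC but does not spell out its update rule. As a rough illustration only, the sketch below shows standard two-query SPSA gradient estimation on a black-box loss, combined with a Nesterov-style look-ahead momentum step as one plausible reading of "gradient correction"; that reading is an assumption, not something the abstract confirms. The function names, the toy quadratic loss, and all hyperparameters are hypothetical. In BlackVIP the optimized parameters would be those of the Coordinator, and the loss would come from querying the PTM.

```python
import random


def spsa_gradient(loss, params, c=0.01, rng=random):
    """Two-query SPSA estimate of the gradient of a black-box loss."""
    # Rademacher perturbation: each coordinate is +1 or -1 with equal probability.
    delta = [rng.choice((-1.0, 1.0)) for _ in params]
    plus = [p + c * d for p, d in zip(params, delta)]
    minus = [p - c * d for p, d in zip(params, delta)]
    # Only two loss evaluations are needed, regardless of the dimension.
    diff = (loss(plus) - loss(minus)) / (2.0 * c)
    # Dividing by delta_i equals multiplying by it, since delta_i is +1 or -1.
    return [diff * d for d in delta]


def spsa_lookahead_minimize(loss, params, steps=500, lr=0.02, beta=0.5,
                            c=0.01, seed=0):
    """Minimize `loss` with SPSA plus look-ahead momentum (assumed correction)."""
    rng = random.Random(seed)
    params = list(params)
    momentum = [0.0] * len(params)
    for _ in range(steps):
        # Estimate the gradient at the look-ahead point, not the current one.
        lookahead = [p + beta * m for p, m in zip(params, momentum)]
        grad = spsa_gradient(loss, lookahead, c=c, rng=rng)
        momentum = [beta * m - lr * g for m, g in zip(momentum, grad)]
        params = [p + m for p, m in zip(params, momentum)]
    return params


def quadratic_loss(x):
    """Toy stand-in for a black-box objective; minimized at x_i = 3."""
    return sum((xi - 3.0) ** 2 for xi in x)


result = spsa_lookahead_minimize(quadratic_loss, [0.0, 0.0, 0.0, 0.0])
print(result)
```

The point of the sketch is the query cost: each step calls the loss exactly twice, so no access to model parameters or intermediate activations is needed, which matches the black-box setting the abstract describes.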

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Image Classification | Stanford Cars | -- | 635 |
| Image Classification | EuroSAT | -- | 569 |
| Image Classification | Food-101 | -- | 542 |
| Image Classification | UCF101 | Top-1 Acc 69.1 | 455 |
| Image Classification | SUN397 | -- | 441 |
| Image Classification | SVHN (test) | Accuracy 61.8 | 401 |
| Image Classification | RESISC45 | -- | 349 |
| Image Classification | Oxford-IIIT Pets (test) | Mean Accuracy 91.4 | 172 |
| Image Classification | FGVC Aircraft | Top-1 Acc 25.4 | 92 |
| Image Classification | Oxford 102 Flowers | Top-1 Accuracy 71.5 | 74 |
Showing 10 of 22 rows
