LoR-VP: Low-Rank Visual Prompting for Efficient Vision Model Adaptation
About
Visual prompting has gained popularity as a method for adapting pre-trained models to specific tasks, particularly in the realm of parameter-efficient tuning. However, existing visual prompting techniques often pad the prompt parameters around the image, limiting the interaction between the visual prompts and the original image to a small set of patches while neglecting the inductive bias present in shared information across different patches. In this study, we conduct a thorough preliminary investigation to identify and address these limitations. We propose a novel visual prompt design, introducing Low-Rank matrix multiplication for Visual Prompting (LoR-VP), which enables shared and patch-specific information across rows and columns of image pixels. Extensive experiments across seven network architectures and four datasets demonstrate significant improvements in both performance and efficiency compared to state-of-the-art visual prompting methods, achieving up to 6 times faster training, using 18 times fewer visual prompt parameters, and delivering a 3.1% improvement in performance. The code is available at https://github.com/jincan333/LoR-VP.
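The low-rank idea described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `low_rank_prompt`, the per-channel factor shapes, the rank `r = 4`, the zero initialization of `B`, and the additive combination with the image are all assumptions made for illustration. The prompt is formed as a product of two thin matrices, so every pixel receives a prompt value that mixes row-specific factors (from `B`) with column-specific factors (from `A`).

```python
import numpy as np

def low_rank_prompt(image, B, A):
    """Add a low-rank prompt B @ A to an image of shape (C, H, W).

    B has shape (C, H, r) and A has shape (C, r, W); these shapes are
    an illustrative assumption, not taken from the paper's code.
    """
    # Batched matrix product over the channel axis yields a (C, H, W) prompt
    # whose per-channel rank is at most r.
    prompt = np.matmul(B, A)
    return image + prompt

rng = np.random.default_rng(0)
C, H, W, r = 3, 224, 224, 4            # hypothetical sizes
image = rng.random((C, H, W))

# Assumed initialization: B starts at zero so the prompt is initially a no-op.
B = np.zeros((C, H, r))
A = rng.standard_normal((C, r, W))

prompted = low_rank_prompt(image, B, A)

# Parameter count of the two factors versus a dense per-pixel prompt.
n_low_rank = B.size + A.size
n_dense = C * H * W
```

With these assumed sizes the two factors hold far fewer values than a dense per-pixel prompt of the same resolution, which is the kind of saving a low-rank parameterization is meant to give. The exact parameter counts reported in the paper depend on its own settings and are not reproduced here.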
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Image Classification | CIFAR-10 | Accuracy: 97.52 | 875 |
| Image Classification | ImageNet V2 | -- | 749 |
| Image Classification | Tiny ImageNet (test) | Accuracy: 89.78 | 722 |
| Image Classification | ImageNet A | Top-1 Acc: 19.96 | 698 |
| Image Classification | CIFAR-100 (test) | -- | 395 |
| Image Classification | CIFAR-100 | Accuracy: 88.06 | 357 |
| Image Classification | Tiny-ImageNet | Accuracy (%): 85.77 | 131 |
| Image-to-Text Retrieval | Flickr30k (val) | Recall@1: 90.3 | 70 |
| Text-to-Image Retrieval | Flickr30k (val) | R@1: 73.46 | 70 |
| Text-to-Image Retrieval | COCO 2017 | Recall@5: 68.17 | 43 |