Read-only Prompt Optimization for Vision-Language Few-shot Learning
About
In recent years, prompt tuning has proven effective in adapting pre-trained vision-language models to downstream tasks. These methods adapt the pre-trained models by introducing learnable prompts while keeping the pre-trained weights frozen. However, learnable prompts can affect the internal representation within the self-attention module, which may increase performance variance and hurt generalization, especially in data-deficient settings. To address these issues, we propose a novel approach, Read-only Prompt Optimization (RPO). RPO leverages masked attention to prevent the internal representation shift in the pre-trained model. Further, to facilitate the optimization of RPO, the read-only prompts are initialized based on special tokens of the pre-trained model. Our extensive experiments demonstrate that RPO outperforms CLIP and CoCoOp in base-to-new generalization and domain generalization while displaying better robustness. The proposed method also achieves better generalization in extremely data-deficient settings, while improving parameter efficiency and reducing computational overhead. Code is available at https://github.com/mlvlab/RPO.
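The read-only idea can be illustrated with a minimal sketch. It is not the authors' implementation (see the repository linked above for that). It assumes the prompts are appended after the original tokens and that every prompt column is masked for every query, so prompts read from the original tokens while the original tokens never read from the prompts. The function names `read_only_mask` and `masked_attention`, the toy vectors, and the use of identity query/key/value projections are all hypothetical choices made for brevity.

```python
import math

def read_only_mask(n_orig, n_prompt):
    """Additive attention mask (rows = queries, columns = keys).

    Assumption: prompt columns are blocked for every query, so no token
    can attend to a prompt, while prompts can still attend to originals.
    """
    total = n_orig + n_prompt
    neg_inf = float("-inf")
    return [[0.0 if j < n_orig else neg_inf for j in range(total)]
            for _ in range(total)]

def masked_attention(q, k, v, mask):
    """Single-head scaled dot-product attention on lists of vectors."""
    d = len(q[0])
    out = []
    for i, qi in enumerate(q):
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) + mask[i][j]
                  for j, kj in enumerate(k)]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]  # exp(-inf) == 0.0
        z = sum(weights)
        out.append([sum(w / z * vj[c] for w, vj in zip(weights, v))
                    for c in range(len(v[0]))])
    return out

# Toy data: three "pre-trained" tokens and two learnable prompts (made up).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
prompts = [[0.5, -0.5], [2.0, 3.0]]
x = tokens + prompts

# Frozen model without any prompts.
baseline = masked_attention(tokens, tokens, tokens, read_only_mask(3, 0))
# Same model with read-only prompts appended.
with_prompts = masked_attention(x, x, x, read_only_mask(3, 2))
# For contrast: ordinary prompts that every token may attend to.
unmasked = masked_attention(x, x, x, [[0.0] * 5 for _ in range(5)])

same = all(abs(a - b) < 1e-12
           for ra, rb in zip(baseline, with_prompts[:3])
           for a, b in zip(ra, rb))
shifted = any(abs(a - b) > 1e-6
              for ra, rb in zip(baseline, unmasked[:3])
              for a, b in zip(ra, rb))
print("original tokens unchanged with read-only mask:", same)
print("original tokens shifted without mask:", shifted)
```

In this toy setting the outputs at the original token positions match the prompt-free baseline under the mask and differ without it, which is the representation shift the abstract refers to. The prompt positions still produce their own outputs, which a downstream head could consume.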
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Image Classification | Flowers102 | -- | 478 |
| Image Classification | ImageNet | Top-1 Accuracy: 71.67 | 324 |
| Image Classification | Food101 | -- | 309 |
| Image Classification | StanfordCars | -- | 266 |
| Image Classification | FGVC-Aircraft (test) | Accuracy: 37.33 | 231 |
| Image Classification | FGVCAircraft | -- | 225 |
| Image Classification | SUN397 | Accuracy (Base): 80.6 | 131 |
| Image Classification | Caltech101 | Base Accuracy: 97.97 | 129 |
| Image Classification | Caltech101 (test) | -- | 121 |
| Image Classification | OxfordPets | Base Accuracy: 94.63 | 117 |