
Spotlighter: Revisiting Prompt Tuning from a Representative Mining View

About

CLIP's success has demonstrated that prompt tuning can achieve robust cross-modal semantic alignment for tasks ranging from open-domain recognition to fine-grained classification. However, redundant or weakly relevant feature components introduce noise and incur unnecessary computational cost. In this work, we propose Spotlighter, a lightweight token-selection framework that improves both accuracy and efficiency in prompt tuning. Spotlighter scores each visual token's activation from both a sample-wise and a semantic-wise perspective and retains only the top-scoring tokens for downstream prediction. A class-specific semantic memory bank of learned prototypes refines this selection, ensuring semantic representativeness and compensating for the discarded features. To further prioritize informative signals, we introduce a two-level ranking mechanism that dynamically weights token-prototype interactions. Across 11 few-shot benchmarks, Spotlighter outperforms CLIP by up to 11.19% in harmonic mean accuracy and achieves up to 0.8K additional FPS with only 21 extra parameters. These results establish Spotlighter as an effective and scalable baseline for prompt tuning. Code will be available at https://github.com/greatest-gourmet/Spotlighter.
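The abstract describes the pipeline only at a high level, so the following is a minimal Python sketch under loudly stated assumptions, not the authors' implementation. It assumes the sample-wise score is cosine similarity to the image's global feature, the semantic-wise score is the best cosine similarity to any class text embedding, and the two are mixed by a hypothetical weight `alpha`. It further assumes the prototype memory bank is updated by an exponential moving average, and it models the two-level ranking as rank-based token weights (level 1) combined with a softmax over class prototypes (level 2). All names (`score_tokens`, `select_top_k`, `PrototypeBank`, `class_scores`) and parameters (`alpha`, `momentum`, `temperature`) are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def score_tokens(tokens: List[Vector], global_feat: Vector,
                 class_embeds: List[Vector], alpha: float = 0.5) -> List[float]:
    """Assumed combined score: alpha * sample-wise + (1 - alpha) * semantic-wise."""
    scores = []
    for t in tokens:
        sample_wise = cosine(t, global_feat)                      # relevance to this image
        semantic_wise = max(cosine(t, c) for c in class_embeds)   # relevance to any class
        scores.append(alpha * sample_wise + (1 - alpha) * semantic_wise)
    return scores


def select_top_k(tokens: List[Vector], scores: List[float], k: int) -> List[int]:
    """Indices of the k highest-scoring tokens, best first; the rest are discarded."""
    order = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    return order[:k]


class PrototypeBank:
    """Hypothetical class-specific memory: one EMA-updated prototype per class."""

    def __init__(self, momentum: float = 0.9):
        self.momentum = momentum
        self.protos: Dict[int, Vector] = {}

    def update(self, label: int, kept: List[Vector]) -> None:
        if not kept:
            return
        # Mean of the retained tokens is this sample's contribution to its class.
        mean = [sum(col) / len(kept) for col in zip(*kept)]
        old = self.protos.get(label)
        if old is None:
            self.protos[label] = mean
        else:
            m = self.momentum
            self.protos[label] = [m * o + (1 - m) * x for o, x in zip(old, mean)]

    def class_scores(self, kept: List[Vector],
                     temperature: float = 0.1) -> Dict[int, float]:
        """Rank-weighted token-prototype interactions; `kept` must be in ranked order."""
        labels = sorted(self.protos)
        totals = {c: 0.0 for c in labels}
        for rank, tok in enumerate(kept):
            token_weight = 1.0 / (rank + 1)  # level 1: higher-ranked tokens count more
            sims = [cosine(tok, self.protos[c]) / temperature for c in labels]
            top = max(sims)
            exps = [math.exp(s - top) for s in sims]  # numerically stable softmax
            z = sum(exps)
            for c, e in zip(labels, exps):
                totals[c] += token_weight * e / z  # level 2: softmax over prototypes
        return totals
```

In use, one would score a sample's tokens, keep the top `k` in ranked order, call `update` with the sample's label during training, and call `class_scores` on the kept tokens at inference. Discarded tokens never enter the interaction loop, which is where a speed-up of the kind the abstract reports would plausibly come from.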

Yutong Gao, Maoyuan Shao, Xinyang Huang, Chuang Zhu, Lijuan Sun, Yu Weng, Xuan Liu, Guoshun Nan• 2025

Related benchmarks

Task | Dataset | Result | Rank
--------------------- | ------------ | --------------------- | ----
Image Classification | Flowers102 | -- | 558
Image Classification | Food101 | -- | 457
Image Classification | StanfordCars | -- | 312
Image Classification | FGVCAircraft | -- | 261
Image Classification | Caltech101 | Base Accuracy 98.86 | 148
Image Classification | EuroSAT | Base Accuracy 93.17 | 104
Image Classification | UCF101 | Base Classes Acc 89.72 | 100
Image Classification | DTD | Base Score 83.94 | 96
Image Classification | ImageNet | Base Score 77.62 | 96
Image Classification | Oxford Pets | Base Accuracy 96.48 | 60

Showing 10 of 12 rows.
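The table lists base-class scores only. Assuming the base-to-novel protocol that is standard for CLIP prompt tuning, the harmonic mean accuracy cited in the abstract combines base accuracy B and novel accuracy N as H = 2BN / (B + N). A small sketch of that formula follows; the function name is illustrative and the numbers used with it are hypothetical, not figures from the paper.

```python
def harmonic_mean(base_acc: float, novel_acc: float) -> float:
    """Harmonic mean of base- and novel-class accuracy, both in percent."""
    if base_acc + novel_acc == 0:
        return 0.0  # avoid division by zero when both accuracies are 0
    return 2 * base_acc * novel_acc / (base_acc + novel_acc)
```

With a hypothetical 80% base and 60% novel accuracy, `harmonic_mean(80.0, 60.0)` is about 68.57, below the arithmetic mean of 70. The harmonic mean penalizes imbalance, so a method cannot raise it by improving base classes at the expense of novel ones.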
