R-TPT: Improving Adversarial Robustness of Vision-Language Models through Test-Time Prompt Tuning

About

Vision-language models (VLMs), such as CLIP, have gained significant popularity as foundation models, with numerous fine-tuning methods developed to enhance performance on downstream tasks. However, due to their inherent vulnerability and the common practice of selecting from a limited set of open-source models, VLMs face a higher risk of adversarial attacks than traditional vision models. Existing defense techniques typically rely on adversarial fine-tuning during training, which requires labeled data and lacks flexibility for downstream tasks. To address these limitations, we propose robust test-time prompt tuning (R-TPT), which mitigates the impact of adversarial attacks during the inference stage. We first reformulate the classic marginal entropy objective by eliminating the term that introduces conflicts under adversarial conditions, retaining only the pointwise entropy minimization. Furthermore, we introduce a plug-and-play reliability-based weighted ensembling strategy, which aggregates useful information from reliable augmented views to strengthen the defense. R-TPT enhances defense against adversarial attacks without requiring labeled training data while offering high flexibility for inference tasks. Extensive experiments on widely used benchmarks with various attacks demonstrate the effectiveness of R-TPT. The code is available at https://github.com/TomSheng21/R-TPT.

Lijun Sheng, Jian Liang, Zilei Wang, Ran He • 2025
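
The two ingredients named in the abstract can be illustrated with a small sketch. It is not taken from the R-TPT repository. It relies on the standard identity that the marginal entropy over augmented views equals the mean pointwise entropy plus the mean KL divergence from each view to the averaged prediction; reading the KL part as the "term that introduces conflicts" is an interpretation of the abstract, not a statement from it. The reliability score is also an assumption: the abstract does not say how reliability is measured, so low prediction entropy is used here as a stand-in. All function names and the toy `views` are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of one probability vector."""
    return -sum(x * math.log(x) for x in p if x > 0)


def mean_distribution(views):
    """Class-wise average of the per-view probability vectors."""
    n = len(views)
    return [sum(col) / n for col in zip(*views)]


def marginal_entropy(views):
    """Entropy of the averaged prediction (the classic test-time objective)."""
    return entropy(mean_distribution(views))


def pointwise_entropy(views):
    """Average of the per-view entropies (the part the abstract says is kept)."""
    return sum(entropy(p) for p in views) / len(views)


def kl(p, q):
    """KL divergence KL(p || q) in nats; assumes q > 0 wherever p > 0."""
    return sum(x * math.log(x / y) for x, y in zip(p, q) if x > 0)


def reliability_weighted_ensemble(views, temperature=1.0):
    """Weighted average of view predictions.

    ASSUMPTION: reliability is approximated by negative entropy, turned into
    weights with a softmax. The paper's actual reliability score may differ.
    """
    scores = [-entropy(p) / temperature for p in views]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    num_classes = len(views[0])
    ensemble = [
        sum(w * p[c] for w, p in zip(weights, views)) for c in range(num_classes)
    ]
    return ensemble, weights


# Toy class probabilities for three augmented views of one test image.
views = [
    [0.90, 0.05, 0.05],  # confident view
    [0.80, 0.15, 0.05],  # fairly confident view
    [0.34, 0.33, 0.33],  # near-uniform, unreliable view
]

avg = mean_distribution(views)
mean_kl = sum(kl(p, avg) for p in views) / len(views)
ensemble, weights = reliability_weighted_ensemble(views)

print("marginal entropy :", marginal_entropy(views))
print("pointwise entropy:", pointwise_entropy(views))
print("mean KL to average:", mean_kl)
print("view weights     :", weights)
print("ensembled probs  :", ensemble)
```

In an actual test-time prompt tuning loop the pointwise entropy would be minimized with respect to the prompt parameters by gradient descent; the sketch only evaluates the quantities on fixed probabilities to show how they relate.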

Related benchmarks

Task                               Dataset         Metric            Result   Rank
Image Classification               ImageNet-R      Top-1 Acc         76.93    529
Fine-grained Image Classification  Stanford Cars   Accuracy          67       284
Image Classification               FGVCAircraft    Accuracy          19.14    261
Image Classification               StanfordCars    Robust Accuracy   59.8     91
Image Classification               CIFAR10         Accuracy          82.19    91
Fine grained classification        EuroSAT         Accuracy          44.3     81
Image Classification               Caltech256      Accuracy (Clean)  77.67    69
Zero-shot Classification           CIFAR100        --                --       65
Zero-shot Classification           CIFAR10         Top-1 Clean Acc   81.6     62
Fine grained classification        Aircraft        Top-1 Acc         24.03    62
Showing 10 of 99 rows
