Pre-trained Model Guided Fine-Tuning for Zero-Shot Adversarial Robustness
About
Large-scale pre-trained vision-language models such as CLIP have demonstrated impressive performance across various tasks and exhibit remarkable zero-shot generalization, but they are also vulnerable to imperceptible adversarial examples. Existing works typically employ adversarial training (fine-tuning) as a defense against adversarial examples. However, applying it directly to CLIP may cause overfitting, compromising the model's capacity for generalization. In this paper, we propose the Pre-trained Model Guided Adversarial Fine-Tuning (PMG-AFT) method, which leverages supervision from the original pre-trained model through a carefully designed auxiliary branch to enhance the model's zero-shot adversarial robustness. Specifically, PMG-AFT minimizes the distance between the features of adversarial examples in the target model and those in the pre-trained model, aiming to preserve the generalization features already captured by the pre-trained model. Extensive experiments on 15 zero-shot datasets demonstrate that PMG-AFT significantly outperforms the state-of-the-art method, improving top-1 robust accuracy by an average of 4.99%. Furthermore, our approach consistently improves clean accuracy, by an average of 8.72%. Our code is available at https://github.com/serendipity1122/Pre-trained-Model-Guided-Fine-Tuning-for-Zero-Shot-Adversarial-Robustness.
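The guidance idea above can be illustrated with a minimal, framework-free sketch of a combined training objective. This is not the authors' implementation (see the linked repository for that); it assumes that the "distance" is a KL divergence between the softmax outputs of the target and pre-trained models on the same adversarial example, and that it is added to the usual cross-entropy with a hypothetical weight `alpha`. The logits stand in for CLIP's image-text similarity scores over class prompts; all function names here are illustrative.

```python
import math
from collections.abc import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    # -log softmax(logits)[label], computed via a stable log-sum-exp.
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return lse - logits[label]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    # KL(p || q); terms with p_i == 0 contribute nothing.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def pmg_aft_style_loss(
    target_adv_logits: Sequence[float],
    pretrained_adv_logits: Sequence[float],
    label: int,
    alpha: float = 1.0,  # hypothetical weight on the guidance term
) -> float:
    # Robustness term: classify the adversarial example correctly.
    ce = cross_entropy(target_adv_logits, label)
    # Guidance term (assumed KL): keep the fine-tuned model's output on the
    # adversarial example close to the frozen pre-trained model's output.
    guide = kl_divergence(softmax(target_adv_logits), softmax(pretrained_adv_logits))
    return ce + alpha * guide
```

When the fine-tuned and pre-trained models agree on an adversarial input, the guidance term is zero and the loss reduces to plain adversarial cross-entropy; as the fine-tuned model drifts from the pre-trained one, the extra penalty pulls it back, which is the mechanism the abstract credits with preserving zero-shot generalization.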
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Classification | ImageNet A | Top-1 Acc | 12.3 | 654 |
| Image Classification | ImageNet V2 | Top-1 Acc | 54.8 | 611 |
| Image Classification | SUN397 | Accuracy | 55.31 | 441 |
| Image Classification | FGVCAircraft | Accuracy | 15.09 | 261 |
| Image Classification | ImageNet-R | -- | -- | 217 |
| Image Classification | OxfordPets | Accuracy | 84.11 | 160 |
| Image Classification | CIFAR10 | Top-1 Accuracy | 83.24 | 112 |
| Image Classification | CIFAR100 | Accuracy | 43.94 | 102 |
| Image Classification | ImageNet-S | Top-1 Acc | 38.4 | 92 |
| Image Classification | StanfordCars | Robust Accuracy | 29 | 91 |