CLIP is Strong Enough to Fight Back: Test-time Counterattacks towards Zero-shot Adversarial Robustness of CLIP

About

Despite its prevalent use in image-text matching tasks in a zero-shot manner, CLIP has been shown to be highly vulnerable to adversarial perturbations added onto images. Recent studies propose to finetune the vision encoder of CLIP with adversarial samples generated on the fly, and show improved robustness against adversarial attacks on a spectrum of downstream datasets, a property termed zero-shot robustness. In this paper, we show that malicious perturbations that seek to maximise the classification loss lead to 'falsely stable' images, and propose to leverage the pre-trained vision encoder of CLIP to counterattack such adversarial images during inference to achieve robustness. Our paradigm is simple and training-free, providing the first method to defend CLIP from adversarial attacks at test time, which is orthogonal to existing methods aiming to boost zero-shot adversarial robustness of CLIP. We conduct experiments across 16 classification datasets, and demonstrate stable and consistent gains compared to test-time defence methods adapted from existing adversarial robustness studies that do not rely on external networks, without noticeably impairing performance on clean images. We also show that our paradigm can be employed on CLIP models that have been adversarially finetuned to further enhance their robustness at test time. Our code is available at https://github.com/Sxing2/CLIP-Test-time-Counterattacks.
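
The test-time counterattack idea can be illustrated with a small sketch. This is not the authors' implementation: it assumes the counterattack is a few sign-gradient ascent steps that push the image embedding away from the embedding of the incoming image, within a small L-infinity budget. The linear map `W` is a toy stand-in for CLIP's frozen vision encoder, and the function name `counterattack` and the parameters `eps`, `step`, and `n_steps` are hypothetical choices made for illustration only.

```python
import numpy as np


def counterattack(W, x, eps=0.05, step=0.01, n_steps=5, seed=0):
    """Return delta with max|delta| <= eps that moves the embedding W @ (x + delta)
    away from the embedding W @ x.

    W is a toy linear "encoder" (an assumption standing in for a frozen vision
    encoder); eps, step and n_steps are illustrative values, not from the paper.
    """
    rng = np.random.default_rng(seed)
    z0 = W @ x  # embedding of the incoming (possibly adversarial) image

    # Start from random noise: at delta = 0 the gradient of the objective is zero.
    delta = rng.uniform(-eps, eps, size=x.shape)

    for _ in range(n_steps):
        diff = W @ (x + delta) - z0          # embedding displacement
        grad = 2.0 * W.T @ diff              # gradient of ||diff||^2 w.r.t. delta
        delta = delta + step * np.sign(grad)  # sign-gradient ascent step
        delta = np.clip(delta, -eps, eps)     # project back into the budget
    return delta


# Toy usage: a random 16-dimensional "image" and an 8-dimensional embedding.
rng = np.random.default_rng(1)
W = rng.normal(size=(8, 16))
x = rng.uniform(0.0, 1.0, size=16)

delta = counterattack(W, x)
shift = float(np.linalg.norm(W @ (x + delta) - W @ x))
print("max |delta|:", float(np.max(np.abs(delta))))
print("embedding shift:", shift)
```

With a real model, the analytic gradient would be replaced by automatic differentiation through the encoder, and the counterattacked image `x + delta` would then be classified against the text embeddings as usual. The sketch also omits clipping `x + delta` to the valid pixel range.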

Songlong Xing, Zhengyu Zhao, Nicu Sebe • 2025

Related benchmarks

Task	Dataset	Result
Image Classification	FGVCAircraft	Accuracy: 18	289
Image Classification	CIFAR10	Accuracy: 81.18	143
Fine grained classification	EuroSAT	Accuracy: 64.1	109
Image Classification	StanfordCars	Robust Accuracy: 33.01	100
Fine grained classification	UCF101	Accuracy: 75	81
Fine grained classification	Stanford Cars	Accuracy: 55	74
Image Classification	OxfordPets	Robust Accuracy: 57.87	71
Image Classification	Caltech256	Accuracy (Clean): 79.66	69
Visual Question Answering	VQA	--	66
Zero-shot Classification	CIFAR100	--	65

Showing 10 of 120 rows
