AGC: Adaptive Geodesic Correction for Adversarial Robustness on Vision-Language Models

About

Vision-language models like CLIP have demonstrated remarkable zero-shot transfer capabilities. However, their susceptibility to imperceptible adversarial perturbations remains a critical security concern. While test-time defenses offer a pragmatic solution for deployed models, existing approaches typically rely on gradient-based optimization during inference, incurring significant computational overhead. In this paper, we revisit the role of data augmentation in CLIP robustness and observe that augmentations are not equally effective: specific augmentations consistently provide robust geometric cues that align with correct class semantics in the hyperspherical feature space. Based on this, we propose Adaptive Geodesic Correction (AGC), a training-free defense mechanism that requires no parameter updates. AGC identifies a reliable augmentation as a geometric anchor and corrects the input feature towards it, using an adaptive step size to balance robustness against clean accuracy preservation. AGC achieves superior performance across eight fine-grained datasets and three CLIP backbones, improving average robust accuracy by 44.4% over the state-of-the-art baseline while delivering a 10× reduction in inference latency. Our findings reveal a fundamental geometric property of CLIP features, offering a highly efficient and effective paradigm for robust multimodal deployment.
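The correction step described above can be pictured as a move along the great-circle arc between two unit-norm features. The following is a minimal sketch under stated assumptions, not the authors' implementation: the names `geodesic_step` and `adaptive_step`, the `t_max` parameter, and the step-size rule itself are hypothetical, since the abstract does not say how the anchor is chosen or how the step size is computed.

```python
import numpy as np


def normalize(v):
    """Project a vector onto the unit hypersphere."""
    return v / np.linalg.norm(v)


def geodesic_step(z, anchor, t):
    """Move z a fraction t (0..1) along the great-circle arc toward anchor.

    This is standard spherical linear interpolation (slerp); using it as the
    correction rule is an assumption made for illustration.
    """
    z, anchor = normalize(z), normalize(anchor)
    cos_theta = np.clip(np.dot(z, anchor), -1.0, 1.0)
    theta = np.arccos(cos_theta)
    sin_theta = np.sin(theta)
    # Nearly identical or antipodal vectors: the arc is degenerate, leave z as is.
    if sin_theta < 1e-6:
        return z
    out = (np.sin((1.0 - t) * theta) * z + np.sin(t * theta) * anchor) / sin_theta
    return normalize(out)


def adaptive_step(z, anchor, t_max=0.5):
    """Hypothetical step-size rule: step further when z is far from the anchor.

    Maps cosine similarity 1 -> 0 and -1 -> t_max. Not taken from the paper.
    """
    cos_theta = np.clip(np.dot(normalize(z), normalize(anchor)), -1.0, 1.0)
    return t_max * (1.0 - cos_theta) / 2.0


# Toy 2-D features standing in for CLIP image embeddings.
z = np.array([1.0, 0.0])        # feature of the (possibly perturbed) input
anchor = np.array([0.0, 1.0])   # feature of the augmentation chosen as anchor
t = adaptive_step(z, anchor)
corrected = geodesic_step(z, anchor, t)
```

Because the output is renormalized, the corrected feature stays on the unit hypersphere and can be compared to text embeddings with cosine similarity as usual. No gradients are needed, which is consistent with the training-free, low-latency framing in the abstract.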

Zhiwei Li, Jiacheng Xue, Weining Wang, Ajian Liu, Xingyu Gao, Zhenan Sun, Qi Li • 2026

Related benchmarks

Task	Dataset	Result
Fine grained classification	EuroSAT	Accuracy 52	138
Fine grained classification	UCF101	Accuracy 73.1	98
Fine grained classification	Stanford Cars	Accuracy 67.3	96
Fine grained classification	Caltech101	Accuracy 94.4	76
Fine grained classification	Pets	Accuracy 93.4	58
Fine-grained Image Classification	Oxford-IIIT Pets	Accuracy 87	55
Fine grained classification	DTD	Clean Accuracy 52.2	54
Fine grained classification	FGVC Aircraft	Accuracy 24.2	51
Fine-grained Image Classification	Oxford Flowers 102	Accuracy 67.8	45
Fine-grained Image Classification	8 Fine-grained Dataset Suite Average	Robustness 93.4	42

Showing 10 of 24 rows
