AGC: Adaptive Geodesic Correction for Adversarial Robustness on Vision-Language Models

About

Vision-language models like CLIP have demonstrated remarkable zero-shot transfer capabilities. However, their susceptibility to imperceptible adversarial perturbations remains a critical security concern. While test-time defenses offer a pragmatic solution for deployed models, existing approaches typically rely on gradient-based optimization during inference, incurring significant computational overhead. In this paper, we revisit the role of data augmentation in CLIP robustness and observe that augmentations are not equally effective: specific augmentations consistently provide robust geometric cues that align with correct class semantics in the hyperspherical feature space. Based on this, we propose Adaptive Geodesic Correction (AGC), a training-free defense mechanism that requires no parameter updates. AGC identifies a reliable augmentation as a geometric anchor and corrects the input feature towards it, utilizing an adaptive step size to balance robustness against clean accuracy preservation. AGC achieves superior performance across eight fine-grained datasets and three CLIP backbones, improving average robust accuracy by 44.4% over the state-of-the-art baseline while delivering a 10× reduction in inference latency. Our findings reveal a fundamental geometric property of CLIP features, offering a highly efficient and effective paradigm for robust multimodal deployment.

Zhiwei Li, Jiacheng Xue, Weining Wang, Ajian Liu, Xingyu Gao, Zhenan Sun, Qi Li • 2026
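
The abstract describes correcting an input feature toward an anchor feature on the hypersphere. As a rough illustration of what a geodesic correction step could look like, the sketch below moves a unit-normalized feature along the great-circle arc toward an anchor using spherical linear interpolation. This is an assumption-laden sketch, not the authors' implementation: the function name `geodesic_step`, the fixed step parameter `t`, and the use of NumPy are all hypothetical, and the paper's rules for choosing the anchor augmentation and for adapting the step size are not given on this page.

```python
import numpy as np

def normalize(v):
    """Project a vector onto the unit hypersphere."""
    return v / np.linalg.norm(v)

def geodesic_step(feature, anchor, t):
    """Move `feature` a fraction `t` (0..1) along the great-circle arc
    toward `anchor`. Hypothetical helper: `t` stands in for the adaptive
    step size, whose actual rule is not described in the source."""
    f = normalize(feature)
    a = normalize(anchor)
    cos_theta = np.clip(np.dot(f, a), -1.0, 1.0)
    theta = np.arccos(cos_theta)
    sin_theta = np.sin(theta)
    # Identical or antipodal vectors: the arc is trivial or not unique,
    # so leave the feature unchanged.
    if sin_theta < 1e-8:
        return f
    out = (np.sin((1.0 - t) * theta) * f + np.sin(t * theta) * a) / sin_theta
    return normalize(out)

# Example: a half step between two orthogonal directions.
input_feature = np.array([1.0, 0.0])
anchor_feature = np.array([0.0, 2.0])  # normalized internally
corrected = geodesic_step(input_feature, anchor_feature, 0.5)
```

With `t = 0` the feature is returned unchanged, and with `t = 1` it lands on the anchor, so a small `t` would favor preserving the original (clean) feature while a larger `t` would lean on the anchor.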

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Fine grained classification | EuroSAT | Accuracy 52 | 109 |
| Fine grained classification | UCF101 | Accuracy 73.1 | 81 |
| Fine grained classification | Stanford Cars | Accuracy 67.3 | 74 |
| Fine grained classification | Caltech101 | Accuracy 94.4 | 60 |
| Fine grained classification | Pets | Accuracy 93.4 | 53 |
| Fine-grained Image Classification | Oxford-IIIT Pets | Accuracy 87 | 43 |
| Fine grained classification | DTD | Clean Accuracy 52.2 | 41 |
| Fine grained classification | FGVC Aircraft | Accuracy 24.2 | 39 |
| Fine grained classification | Cars | Accuracy 77.3 | 37 |
| Fine grained classification | Describable Textures Dataset (DTD) | Accuracy 43.4 | 37 |
Showing 10 of 24 rows
