
TAME: Test-Time Adversarial Prompt Tuning via Mixture-of-Experts for Vision-Language Models

About

Large-scale pre-trained Vision-Language models (VLMs), such as CLIP, exhibit strong zero-shot generalization, yet remain highly vulnerable to imperceptible adversarial perturbations, raising serious safety concerns for open-world deployment. To enhance robustness without requiring downstream task-specific retraining, we propose TAME, a novel test-time defense. Building upon our prior Test-Time Adversarial Prompt Tuning (TAPT), TAME introduces an architectural reformulation by replacing TAPT's single adaptive prompt with an input-conditioned Mixture-of-Experts (MoE) framework, enabling more expressive and adaptive defense. Specifically, TAME maintains a bank of learnable expert prompts and employs an input-dependent routing mechanism to aggregate a customized prompt mixture for each unlabeled test sample at inference time. This test-time defense mechanism is driven by three unsupervised objectives: (1) multi-view prediction entropy minimization, (2) layer-wise alignment of visual token statistics to precomputed clean and adversarial reference distributions, and (3) MoE regularization for balanced expert utilization and prompt diversity. We evaluated TAME on 11 benchmark datasets, including ImageNet and 10 additional zero-shot datasets. The results show that TAME improves the zero-shot adversarial robustness of the original CLIP by at least 49.1% under AutoAttack while largely preserving generalization on clean samples. TAME also consistently outperforms existing adversarial prompt tuning methods across multiple prompt designs, yielding an average robustness gain of at least 30.2%.
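The routing and two of the unsupervised objectives described above can be illustrated with a small sketch. This is not the authors' implementation: the function names (`mix_prompts`, `marginal_entropy`, `load_balance_penalty`), the softmax router, and the squared-deviation balance penalty are all assumptions chosen for illustration, and prompts are reduced to plain lists of floats rather than CLIP token embeddings. The alignment objective (2) is omitted because it depends on reference statistics the abstract does not specify.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mix_prompts(expert_prompts, router_logits):
    """Aggregate expert prompts into one prompt using softmax routing weights.

    expert_prompts: list of K prompt vectors (each a list of floats).
    router_logits:  list of K input-dependent scores (assumed to come from
                    some router applied to the test sample).
    """
    weights = softmax(router_logits)
    dim = len(expert_prompts[0])
    mixed = [
        sum(w * prompt[d] for w, prompt in zip(weights, expert_prompts))
        for d in range(dim)
    ]
    return mixed, weights


def marginal_entropy(view_probs):
    """Entropy of the class distribution averaged over augmented views.

    view_probs: list of per-view class probability lists.
    Minimizing this encourages confident, view-consistent predictions.
    """
    n_views = len(view_probs)
    n_classes = len(view_probs[0])
    avg = [sum(v[c] for v in view_probs) / n_views for c in range(n_classes)]
    return -sum(p * math.log(p) for p in avg if p > 0.0)


def load_balance_penalty(weights):
    """Hypothetical balance regularizer: squared deviation from uniform routing."""
    k = len(weights)
    return sum((w - 1.0 / k) ** 2 for w in weights)


# Usage: two experts with equal router scores yield an even mixture.
experts = [[1.0, 0.0], [0.0, 1.0]]
mixed, weights = mix_prompts(experts, [0.0, 0.0])
loss = marginal_entropy([[0.9, 0.1], [0.8, 0.2]]) + load_balance_penalty(weights)
```

In an actual test-time defense, a loss of this kind would be minimized with respect to the prompt and router parameters for each unlabeled test sample; the sketch only computes the quantities and performs no optimization.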

Xin Wang, Yixu Wang, Jiaming Zhang, Ruofan Wang, Jiaqi Yu, Kai Chen, Jingjing Chen, Xingjun Ma, Yu-Gang Jiang • 2026

Related benchmarks

Task | Dataset | Result | Rank
Zero-shot Image Classification | DTD | Robust Accuracy (PGD-100, eps=1/255): 36.8 | 25
Zero-shot Classification | EuroSAT | -- | 21
Zero-shot Image Classification | ImageNet (val) | Adversarial Accuracy: 61.7 | 17
Zero-shot Image Classification | ImageNet | PGD Accuracy: 57 | 15
Zero-shot Image Classification | Caltech101 | PGD Robust Accuracy: 82.9 | 11
Zero-shot Classification | SUN397 | -- | 10
Zero-shot Image Classification | Food101 | Accuracy (Robust, PGD-100, eps=1/255): 60.8 | 9
Zero-shot Image Classification | Caltech101 (test) | Adversarial Accuracy: 84.7 | 4
Zero-shot Image Classification | EuroSAT (test) | Adversarial Accuracy (%): 36.8 | 2
Zero-shot Image Classification | Pets (test) | Adversarial Accuracy: 78.5 | 2
Showing 10 of 21 rows
