A Provable Energy-Guided Test-Time Defense Boosting Adversarial Robustness of Large Vision-Language Models
About
Despite rapid progress in multimodal models and Large Vision-Language Models (LVLMs), they remain highly susceptible to adversarial perturbations, raising serious concerns about their reliability in real-world use. While adversarial training has become the leading paradigm for building models that are robust to adversarial attacks, Test-Time Transformations (TTT) have emerged as a promising strategy to boost robustness at inference. In light of this, we propose Energy-Guided Test-Time Transformation (ET3), a lightweight, training-free defense that enhances robustness by minimizing the energy of the input samples. Our method is grounded in theory: we prove that the transformation yields correct classification under reasonable assumptions. Extensive experiments demonstrate that ET3 provides a strong defense for classifiers and for zero-shot classification with CLIP, and also boosts the robustness of LVLMs on tasks such as Image Captioning and Visual Question Answering. Code is available at github.com/OmnAI-Lab/Energy-Guided-Test-Time-Defense.
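To make the core idea concrete, here is a minimal, hypothetical sketch of an energy-guided test-time transformation: before classification, the input is iteratively updated by gradient descent to lower its energy. The names (`energy`, `et3_purify`) and the toy quadratic energy `E(x) = Σ(x_i − μ_i)²` are illustrative assumptions, not the paper's actual energy model, which would be learned over images; the sketch only shows the purification loop pattern.

```python
# Hypothetical sketch of energy-guided test-time purification.
# ASSUMPTION: a toy quadratic energy E(x) = sum((x_i - mu_i)^2), whose
# low-energy region around mu stands in for the clean-data manifold.
# The real ET3 defense would use a learned energy function over images.

def energy(x, mu):
    """Toy energy: distance of input x from the clean mode mu."""
    return sum((xi - mi) ** 2 for xi, mi in zip(x, mu))

def energy_grad(x, mu):
    """Analytic gradient of the toy energy: dE/dx_i = 2 * (x_i - mu_i)."""
    return [2.0 * (xi - mi) for xi, mi in zip(x, mu)]

def et3_purify(x, mu, steps=50, lr=0.1):
    """Iteratively lower the input's energy before classification."""
    x = list(x)
    for _ in range(steps):
        g = energy_grad(x, mu)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x

# An adversarially shifted input drifts back toward the low-energy region.
mu = [0.0, 1.0, 2.0]
x_adv = [0.9, 1.8, 1.1]
x_purified = et3_purify(x_adv, mu)
print(energy(x_purified, mu) < energy(x_adv, mu))  # True: energy decreased
```

In the full method, the purified input (rather than the raw, possibly perturbed one) is then passed to the downstream classifier or LVLM, which is what makes the defense training-free.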
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Fine-grained classification | EuroSAT | Accuracy | 13.37 | 81 |
| Fine-grained classification | UCF101 | Accuracy | 37.35 | 53 |
| Fine-grained classification | Caltech101 | Accuracy | 79.27 | 39 |
| Fine-grained classification | DTD | Clean Accuracy | 26.42 | 34 |
| Fine-grained classification | Pets | Accuracy | 66.86 | 32 |
| Zero-shot Image Classification | 14 Robustness Benchmark Datasets (ImageNet, CalTech, Cars, CIFAR10, CIFAR100, DTD, EuroSAT, FGVC, Flowers, ImageNet-R, ImageNet-S, PCAM, OxfordPets, STL-10) (test) | ImageNet Accuracy | 80.11 | 16 |
| Zero-shot Image Classification | ImageNet 1k (test) | Accuracy (Zero-shot) | 79.82 | 16 |
| Fine-grained classification | Cars | Accuracy | 10.32 | 16 |
| Fine-grained classification | Aircraft | Accuracy | 5.85 | 16 |
| Image Captioning | COCO Clean (test) | CIDEr | 115.5 | 10 |