A Provable Energy-Guided Test-Time Defense Boosting Adversarial Robustness of Large Vision-Language Models

About

Despite rapid progress in multimodal models and Large Vision-Language Models (LVLMs), these models remain highly susceptible to adversarial perturbations, raising serious concerns about their reliability in real-world use. While adversarial training has become the leading paradigm for building models that are robust to adversarial attacks, Test-Time Transformations (TTT) have emerged as a promising strategy to boost robustness at inference. In light of this, we propose Energy-Guided Test-Time Transformation (ET3), a lightweight, training-free defense that enhances robustness by minimizing the energy of the input samples. Our method is grounded in a theory that proves our transformation succeeds in classification under reasonable assumptions. We present extensive experiments demonstrating that ET3 provides a strong defense for classifiers, for zero-shot classification with CLIP, and for boosting the robustness of LVLMs in tasks such as Image Captioning and Visual Question Answering. Code is available at github.com/OmnAI-Lab/Energy-Guided-Test-Time-Defense.

Mujtaba Hussain Mirza, Antonio D'Orazio, Odelia Melamed, Iacopo Masi • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Fine grained classification | EuroSAT | Accuracy: 13.37 | 81 |
| Fine grained classification | UCF101 | Accuracy: 37.35 | 53 |
| Fine grained classification | Caltech101 | Accuracy: 79.27 | 39 |
| Fine grained classification | DTD | Clean Accuracy: 26.42 | 34 |
| Fine grained classification | Pets | Accuracy: 66.86 | 32 |
| Zero-shot Image Classification | 14 Robustness Benchmark Datasets (ImageNet, CalTech, Cars, CIFAR10, CIFAR100, DTD, EuroSAT, FGVC, Flowers, ImageNet-R, ImageNet-S, PCAM, OxfordPets, STL-10) (test) | ImageNet Accuracy: 80.11 | 16 |
| Zero-shot Image Classification | ImageNet 1k (test) | Accuracy (Zero-shot): 79.82 | 16 |
| Fine grained classification | Cars | Accuracy: 10.32 | 16 |
| Fine grained classification | Aircraft | Accuracy: 5.85 | 16 |
| Image Captioning | COCO Clean (test) | CIDEr: 115.5 | 10 |
Showing 10 of 21 rows
