Doubly-Universal Adversarial Perturbations: Deceiving Vision-Language Models Across Both Images and Text with a Single Perturbation

About

Large Vision-Language Models (VLMs) have demonstrated remarkable performance across multimodal tasks by integrating vision encoders with large language models (LLMs). However, these models remain vulnerable to adversarial attacks. Among such attacks, Universal Adversarial Perturbations (UAPs) are especially powerful, as a single optimized perturbation can mislead the model across various input images. In this work, we introduce a novel UAP specifically designed for VLMs: the Doubly-Universal Adversarial Perturbation (Doubly-UAP), capable of universally deceiving VLMs across both image and text inputs. To disrupt the vision encoder's fundamental process, we analyze the core components of the attention mechanism. After identifying value vectors in the middle-to-late layers as the most vulnerable, we optimize Doubly-UAP in a label-free manner with a frozen model. Although it is developed with the LLM treated as a black box, Doubly-UAP achieves high attack success rates on VLMs, consistently outperforming baseline methods across vision-language tasks. Extensive ablation studies and analyses further demonstrate the robustness of Doubly-UAP and provide insights into how it influences internal attention mechanisms.
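To make the idea of label-free optimization against a frozen model concrete, here is a minimal toy sketch in plain Python. It is not the paper's implementation: the abstract does not give the loss, the layers used, or the perturbation budget, so everything here is an assumption chosen for illustration. The "encoder" is a stand-in function tanh(W x) playing the role of value vectors, the objective is the mean squared feature shift, the update is signed gradient ascent with an L-infinity clamp, and all names (features, loss_and_grad, optimize_uap, eps, step) are hypothetical.

```python
import math
import random

def features(W, x):
    """Toy stand-in for a frozen encoder's value vectors: tanh(W @ x)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]

def loss_and_grad(W, images, clean_feats, delta):
    """Mean squared feature shift caused by delta, and its gradient w.r.t. delta."""
    d_in = len(delta)
    grad = [0.0] * d_in
    total = 0.0
    for x, v_clean in zip(images, clean_feats):
        v_adv = features(W, [xi + di for xi, di in zip(x, delta)])
        for row, va, vc in zip(W, v_adv, v_clean):
            diff = va - vc
            total += diff * diff
            # d/dh of (tanh(h) - vc)^2 is 2 * diff * (1 - tanh(h)^2)
            g_h = 2.0 * diff * (1.0 - va * va)
            for i in range(d_in):
                grad[i] += g_h * row[i]
    n = len(images)
    return total / n, [g / n for g in grad]

def optimize_uap(W, images, eps=0.1, step=0.01, iters=40, seed=0):
    """Optimize one shared perturbation for all images (assumed toy setup)."""
    rng = random.Random(seed)
    # The model is frozen and no labels are used: only clean features are needed.
    clean_feats = [features(W, x) for x in images]
    # Random start: at delta = 0 the gradient of this loss is exactly zero.
    delta = [rng.uniform(-eps, eps) for _ in range(len(images[0]))]
    history = []
    for _ in range(iters):
        loss, grad = loss_and_grad(W, images, clean_feats, delta)
        history.append(loss)
        # Signed gradient ascent, then clamp back into the L-infinity ball.
        delta = [max(-eps, min(eps, d + step * math.copysign(1.0, g)))
                 for d, g in zip(delta, grad)]
    history.append(loss_and_grad(W, images, clean_feats, delta)[0])
    return delta, history

# Synthetic data: a fixed random projection and 20 random "images" of 8 pixels.
rng = random.Random(42)
W = [[rng.uniform(-0.5, 0.5) for _ in range(8)] for _ in range(6)]
images = [[rng.random() for _ in range(8)] for _ in range(20)]
delta, history = optimize_uap(W, images)
print(f"mean feature shift: {history[0]:.4f} -> {history[-1]:.4f}")
```

The same delta is added to every image, which is what makes the perturbation universal across images. A real attack would replace the toy features function with intermediate activations of the actual vision encoder and would also keep perturbed pixels within the valid image range, which this sketch omits.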

Hee-Seon Kim, Minbeom Kim, Changick Kim • 2024

Related benchmarks

Task                       Dataset      Result                           Rank
Adversarial Attack         MVBench      ASR 63.63                        37
Adversarial Attack         Mantis-Eval  Attack Success Rate 56.87        37
Adversarial Attack         NLVR2        Attack Success Rate 31.43        37
Adversarial Attack         BLINK        Attack Success Rate (ASR) 63.74  37
Adversarial Attack         Q-Bench      Attack Success Rate 44.76        37
Visual Question Answering  MM-Vet       --                               27
Visual Question Answering  Mantis-Eval  ASR 46.45                        12
Visual Question Answering  LLaVA-Bench  VQA ASR 38.74                    12
