
AnyAttack: Towards Large-scale Self-supervised Adversarial Attacks on Vision-language Models

About

Due to their multimodal capabilities, Vision-Language Models (VLMs) have found numerous impactful applications in real-world scenarios. However, recent studies have revealed that VLMs are vulnerable to image-based adversarial attacks. Traditional targeted adversarial attacks require specific targets and labels, limiting their real-world impact. We present AnyAttack, a self-supervised framework that transcends the limitations of conventional attacks through a novel foundation model approach. By pre-training on the massive LAION-400M dataset without label supervision, AnyAttack achieves unprecedented flexibility, enabling any image to be transformed into an attack vector targeting any desired output across different VLMs. This approach fundamentally changes the threat landscape, making adversarial capabilities accessible at an unprecedented scale. Our extensive validation across five open-source VLMs (CLIP, BLIP, BLIP2, InstructBLIP, and MiniGPT-4) demonstrates AnyAttack's effectiveness across diverse multimodal tasks. Most concerning, AnyAttack seamlessly transfers to commercial systems including Google Gemini, Claude Sonnet, Microsoft Copilot and OpenAI GPT, revealing a systemic vulnerability requiring immediate attention.

Jiaming Zhang, Junhong Ye, Xingjun Ma, Yige Li, Yunfan Yang, Yunhao Chen, Jitao Sang, Dit-Yan Yeung • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Targeted Attack on Image Captioning | Frontier MLLM Evaluation Set | Attack Success Rate (ASR): 60 | 72 |
| Targeted Adversarial Attack | 1,000-pair Targeted Attack Evaluation Set closed-source standard MLLMs 1.0 | ASR: 9 | 48 |
| Image-to-Text Adversarial Attack | Evaluation set | ASR: 56 | 48 |
| Targeted Adversarial Attack | Evaluation set (test) | Attack Success Rate (ASR): 0.00e+0 | 48 |
| Adversarial Attack | Medical Imaging Dataset 1,000 images 1.0 (test) | MTR: 66 | 36 |
| Untargeted Adversarial Attack | ImageNet | ASR (Average): 37.8 | 36 |
| VQA | VQA hard criterion | ASR: 2 | 32 |
| Untargeted Adversarial Attack | Flickr30K 1,000 images (test) | ASR: 55.42 | 30 |
| Untargeted Adversarial Attack | Flickr30K | ASR: 35.2 | 30 |
| Targeted Adversarial Attack | ImageNet | ASR (Average): 1.3 | 30 |
Showing 10 of 88 rows
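
Most results above are reported as an attack success rate (ASR). As a rough illustration only, the sketch below computes ASR as the percentage of attack attempts judged successful. The function name, the boolean per-attempt input, and the sample outcomes are hypothetical; this page does not state how each benchmark decides that an attack succeeded, and some rows may use a different scale.

```python
from typing import Sequence


def attack_success_rate(successes: Sequence[bool]) -> float:
    """Return the percentage of attack attempts flagged as successful.

    Assumption: each entry is True when the benchmark's own (unspecified)
    success criterion was met for that adversarial image.
    """
    if not successes:
        raise ValueError("need at least one attack attempt")
    # True counts as 1 and False as 0, so sum() gives the success count.
    return 100.0 * sum(successes) / len(successes)


# Hypothetical outcomes for five adversarial images (illustrative only).
outcomes = [True, False, True, True, False]
print(attack_success_rate(outcomes))  # 3 of 5 succeeded -> 60.0
```

Under this reading, a higher ASR means the attack fooled the model more often, so the numbers describe how vulnerable a model is rather than how accurate it is.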