
How Robust is Google's Bard to Adversarial Image Attacks?

About

Multimodal Large Language Models (MLLMs) that integrate text and other modalities (especially vision) have achieved unprecedented performance in various multimodal tasks. However, because the adversarial robustness problem of vision models remains unsolved, introducing vision inputs can expose MLLMs to more severe safety and security risks. In this work, we study the adversarial robustness of Google's Bard, a chatbot competing with ChatGPT that recently released its multimodal capability, to better understand the vulnerabilities of commercial MLLMs. By attacking white-box surrogate vision encoders or MLLMs, the generated adversarial examples can mislead Bard into outputting wrong image descriptions with a 22% success rate based solely on transferability. We show that the adversarial examples can also attack other MLLMs, e.g., a 26% attack success rate against Bing Chat and an 86% attack success rate against ERNIE bot. Moreover, we identify two defense mechanisms of Bard: face detection and toxicity detection of images. We design corresponding attacks to evade these defenses, demonstrating that the current defenses of Bard are also vulnerable. We hope this work can deepen our understanding of the robustness of MLLMs and facilitate future research on defenses. Our code is available at https://github.com/thu-ml/Attack-Bard. Update: GPT-4V became available in October 2023. We further evaluate its robustness under the same set of adversarial examples, achieving a 45% attack success rate.

Yinpeng Dong, Huanran Chen, Jiawei Chen, Zhengwei Fang, Xiao Yang, Yichi Zhang, Yu Tian, Hang Su, Jun Zhu • 2023
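
The abstract describes attacking a white-box surrogate vision encoder and relying on transferability to the black-box target. As a rough illustration of that idea only, here is a minimal sketch of a sign-gradient (PGD-style) attack that pushes an image's embedding away from its clean embedding under an L-infinity budget. It is not the authors' implementation (that is in the linked repository): the toy linear encoder `W`, the function names `matvec` and `embedding_attack`, and the defaults `eps`, `alpha`, and `steps` are all assumptions made for illustration.

```python
import random

def matvec(W, x):
    """Toy surrogate encoder (an assumption): a linear map f(x) = W @ x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def embedding_attack(W, x, eps=8 / 255, alpha=1 / 255, steps=20, seed=0):
    """Push f(adv) away from f(x) while keeping |adv - x| <= eps per pixel."""
    rng = random.Random(seed)
    # Random start: the gradient of the embedding distance is zero at adv == x.
    adv = [min(1.0, max(0.0, xi + rng.uniform(-eps, eps))) for xi in x]
    for _ in range(steps):
        # For a linear encoder, f(adv) - f(x) = W @ (adv - x).
        diff = matvec(W, [a - xi for a, xi in zip(adv, x)])
        # Gradient of ||W @ (adv - x)||^2 with respect to adv is 2 * W^T @ diff.
        grad = [2 * sum(W[i][j] * diff[i] for i in range(len(W)))
                for j in range(len(x))]
        for j, g in enumerate(grad):
            step = alpha if g > 0 else -alpha if g < 0 else 0.0
            # Project onto the eps-ball around x, then onto the pixel range [0, 1].
            a = min(x[j] + eps, max(x[j] - eps, adv[j] + step))
            adv[j] = min(1.0, max(0.0, a))
    return adv
```

With a real surrogate, `W` would be replaced by a differentiable vision encoder and the gradient would come from automatic differentiation. Whether the resulting perturbation misleads a commercial model depends entirely on transferability, which is what the reported success rates measure.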

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Image-to-Text Adversarial Attack | Evaluation set | ASR: 66 | 48 |
| Targeted Adversarial Attack | 1,000-pair Targeted Attack Evaluation Set closed-source standard MLLMs 1.0 | ASR: 7 | 48 |
| Targeted Adversarial Attack | Evaluation set (test) | Attack Success Rate (ASR): 0.00e+0 | 48 |
| Adversarial Attack | LVLM Evaluation Set | ASR: 97.8 | 40 |
| Adversarial Attack | Medical Imaging Dataset 1,000 images 1.0 (test) | MTR: 62 | 36 |
| Image Captioning Robustness | Image Captioning Dataset | CLIP Score (RN-50): 53.2 | 30 |
| Transferable Adversarial Attack | GLM 4.6V | ASR: 1 | 16 |
| Targeted Attack | Gemini 1.5-pro 2.5-flash (test) | ASR: 0.00e+0 | 16 |
| Transferable Adversarial Attack | Llama 11B-V 3.2 | Attack Success Rate (ASR): 0.00e+0 | 16 |
| Transferable Adversarial Attack | Kimi K2.5 | ASR (%): 0.00e+0 | 16 |

Showing 10 of 53 rows
