
Jailbreak Large Vision-Language Models Through Multi-Modal Linkage

About

With the significant advancement of Large Vision-Language Models (VLMs), concerns about their potential misuse and abuse have grown rapidly. Previous studies have highlighted VLMs' vulnerability to jailbreak attacks, where carefully crafted inputs can lead the model to produce content that violates ethical and legal standards. However, existing methods struggle against state-of-the-art VLMs like GPT-4o, due to the over-exposure of harmful content and lack of stealthy malicious guidance. In this work, we propose a novel jailbreak attack framework: Multi-Modal Linkage (MML) Attack. Drawing inspiration from cryptography, MML utilizes an encryption-decryption process across text and image modalities to mitigate over-exposure of malicious information. To align the model's output with malicious intent covertly, MML employs a technique called "evil alignment", framing the attack within a video game production scenario. Comprehensive experiments demonstrate MML's effectiveness. Specifically, MML jailbreaks GPT-4o with attack success rates of 97.80% on SafeBench, 98.81% on MM-SafeBench and 99.07% on HADES-Dataset. Our code is available at https://github.com/wangyu-ovo/MML.
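The abstract gives only a high-level view of MML's encryption-decryption idea: the harmful query is encoded so it never appears verbatim in the prompt, and the model is asked to decode it inside a benign cover story. The sketch below is a hypothetical, text-only illustration of that pattern (simple word reversal as the "cipher"); the paper's actual scheme spans both text and image modalities and is not reproduced here.

```python
# Toy illustration of an encryption-decryption jailbreak pattern.
# This is NOT the paper's algorithm: MML encodes content across text AND
# image modalities; word reversal here is a stand-in "cipher".

def encrypt_word_reversal(text: str) -> str:
    """Encode a query by reversing each word, so the raw prompt string
    no longer contains the original phrasing verbatim."""
    return " ".join(word[::-1] for word in text.split())

def cover_story_prompt(encoded: str) -> str:
    """Wrap the encoded query in a benign framing. The paper uses a
    video-game production scenario ("evil alignment"); this wording
    is a hypothetical example."""
    return (
        "You are a game designer writing quest dialogue. "
        "First reverse each word of the player's note below, "
        f"then continue the scene.\nPlayer note: {encoded}"
    )

encoded = encrypt_word_reversal("open the vault")
# encoded == "nepo eht tluav"
prompt = cover_story_prompt(encoded)
```

The point of the pattern is that safety filters scanning the prompt surface see only the encoded form, while the model itself performs the decoding step as part of the "game" task.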

Yu Wang, Xiaofei Zhou, Yichen Wang, Geyuan Zhang, Tianxing He • 2024

Related benchmarks

Task | Dataset | Result | Rank
Jailbreak Attack | HarmBench | -- | 487
Jailbreak Attack | SafeBench | ASR 22.4 | 128
Jailbreak Attack | Malicious goals dataset (test) | ASR 0.00e+0 | 99
Jailbreak Attack | JailbreakBench | ASR 0.00e+0 | 76
Jailbreak Safety Evaluation | MM-Safety Bench (test) | Average ASR 14.92 | 56
Multimodal Jailbreaking | HADES-Dataset | ASR (%) 99.07 | 20
Jailbreaking | GPT-4o | ASR 0.978 | 19
Jailbreak Attack | Claude 3.5 | ASR 60.4 | 19
Safety Evaluation | SafeBench | IA Score 92 | 15
Jailbreak Attack | Qwen-VL | ASR (%) 1.8 | 11

(10 of 24 rows shown)
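Note that the Result column mixes reporting conventions: some rows give ASR as a percentage (e.g. 99.07), others as a fraction (0.978 ≈ 97.8%). ASR itself is simply the share of attack attempts the evaluator judges successful; a minimal sketch of the computation, with an illustrative (not source-reported) example count:

```python
def attack_success_rate(outcomes: list[bool]) -> float:
    """ASR in percent: fraction of attack attempts judged successful
    by the evaluator, times 100."""
    return 100.0 * sum(outcomes) / len(outcomes)

# Illustrative only: 44 successes out of 45 prompts -> ~97.78% ASR.
asr = attack_success_rate([True] * 44 + [False])
```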

Other info

Code: https://github.com/wangyu-ovo/MML