Multi-Turn Adaptive Prompting Attack on Large Vision-Language Models
About
Multi-turn jailbreak attacks have proven effective against text-only large language models (LLMs), where malicious content is introduced gradually to bypass safety alignment. However, extending such attacks effectively to large vision-language models (LVLMs) remains underexplored. In this paper, we find that naively incorporating visual inputs can make multi-turn jailbreaks easier to defend against; for example, overly malicious visual content readily triggers the defense mechanisms of safety-aligned LVLMs, resulting in more conservative responses. Based on this finding, we propose the multi-turn adaptive prompting attack (MAPA), which 1) at each turn, alternates between text and vision attack actions to elicit the most malicious response; and 2) across turns, adjusts the attack trajectory through iterative back-and-forth refinement to gradually amplify response maliciousness. This two-level design enables MAPA to consistently outperform state-of-the-art methods, improving attack success rates by 15-30% on recent benchmarks against LLaVA-v1.6-Mistral-7B, Qwen2.5-VL-7B-Instruct, Llama-3.2-Vision-11B-Instruct, and GPT-4o-mini. Our code is available at: https://github.com/thomaschoi143/MAPA.
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Jailbreak Attack | HarmBench | -- | 557 |
| Jailbreak Attack | AdvBench | AASR 98.96 | 271 |
| Jailbreak Attack | JailbreakBench | ASR 93.33 | 242 |
| Jailbreak Attack | RedTeam 2K | ASR 94.79 | 52 |
| Jailbreak Attack | Jailbreak Evaluation GPT-4o-mini | ASR 93.33 | 13 |
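
The results above are reported as attack success rates (ASR). As a minimal sketch of how such a figure is commonly computed, assume ASR is the percentage of evaluated prompts whose final response a judge labels as a successful jailbreak. The function name `attack_success_rate` and the boolean-label input are illustrative assumptions, not the paper's evaluation code, and the judging protocol behind the table is not specified here.

```python
from typing import Sequence


def attack_success_rate(judgements: Sequence[bool]) -> float:
    """Return the attack success rate as a percentage.

    `judgements` holds one boolean per evaluated prompt: True if the judge
    labelled the response a successful jailbreak (assumed labelling scheme).
    """
    if not judgements:
        raise ValueError("at least one judgement is required")
    # sum() counts the True entries because bool is a subclass of int.
    return 100.0 * sum(judgements) / len(judgements)


# Hypothetical labels for four prompts, three judged successful.
example_labels = [True, True, True, False]
example_asr = attack_success_rate(example_labels)
```

Under this assumed definition, comparing two methods means running both on the same prompt set with the same judge, since the percentage depends on both.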