Multi-Turn Adaptive Prompting Attack on Large Vision-Language Models

About

Multi-turn jailbreak attacks have proven effective against text-only large language models (LLMs), where malicious content is gradually introduced to bypass safety alignment. However, extending such attacks effectively to large vision-language models (LVLMs) remains underexplored. In this paper, we find that naively incorporating visual inputs can make multi-turn jailbreaks easier to defend against; for example, overly malicious visual content easily triggers the defense mechanisms of safety-aligned LVLMs, resulting in more conservative responses. Based on this finding, we propose the multi-turn adaptive prompting attack (MAPA), which 1) at each turn, alternates text-vision attack actions to elicit the most malicious response; and 2) across turns, adjusts the attack trajectory through iterative back-and-forth refinement to gradually amplify response maliciousness. This two-level design enables MAPA to consistently outperform state-of-the-art methods, improving attack success rates by 15-30% on recent benchmarks against LLaVA-v1.6-Mistral-7B, Qwen2.5-VL-7B-Instruct, Llama-3.2-Vision-11B-Instruct, and GPT-4o-mini. Our code is available at: https://github.com/thomaschoi143/MAPA.

In Chong Choi, Jiacheng Zhang, Feng Liu, Yiliao Song • 2026

Related benchmarks

Task	Dataset	Result
Jailbreak Attack	HarmBench	--	557
Jailbreak Attack	AdvBench	AASR 98.96	271
Jailbreak Attack	JailbreakBench	ASR 93.33	242
Jailbreak Attack	RedTeam 2K	ASR 94.79	52
Jailbreak Attack	Jailbreak Evaluation GPT-4o-mini	ASR 93.33	13
