Grounding-Driven Attack: Improving Encoder-based Adversarial Transferability against Large Vision-Language Models

About

Large vision-language models (LVLMs) have achieved impressive performance across multimodal tasks, but their reliance on visual inputs exposes them to adversarial threats. Encoder-based attacks provide an efficient alternative to end-to-end optimization by crafting perturbations through the vision encoder alone. However, existing encoder-based attacks often assume that the surrogate encoder is identical or similar to the victim LVLM's vision encoder. In this work, we present a systematic study of their transferability in more realistic black-box deployments with heterogeneous LVLM architectures. We find that model-specific visual evidence is inconsistent across models, whereas text-conditioned grounding regions are more closely tied to caption-relevant evidence and provide a more stable transfer target. Existing attacks, however, remain weakly aligned with these regions and disrupt them insufficiently. Motivated by these findings, we propose Grounding-Driven Attack (GDA), which aligns perturbation optimization with text-grounded evidence. GDA combines two components: Grounding-Aware Perturbation Allocation, which concentrates the perturbation budget on grounded evidence regions, and Grounding-Centric Evidence Disruption, which intensifies the global and local disruption of those regions. Experiments across diverse victim models and tasks show that GDA consistently outperforms existing encoder-based attacks in black-box transfer. These results highlight the central role of text-grounded evidence in adversarial transferability and motivate grounding-aware robustness evaluation and defense design.

Xinwei Zhang, Li Bai, Tianwei Zhang, Youqian Zhang, Qingqing Ye, Yingnan Zhao, Ruochen Du, Haibo Hu • 2026
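
The idea of concentrating the perturbation budget on grounded evidence regions can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the allocation rule, so the function name `masked_sign_step`, the `outside_scale` parameter, and the per-pixel budget formula below are all hypothetical. The sketch assumes a grounding mask and a loss gradient are already available, and it uses a plain sign-gradient step under an L-infinity constraint.

```python
import numpy as np

def masked_sign_step(delta, grad, mask, eps=8 / 255, step=1 / 255, outside_scale=0.25):
    """One sign-gradient ascent step with a mask-dependent per-pixel budget.

    delta: current perturbation, shape (H, W, C)
    grad:  gradient of the attack loss w.r.t. the image, same shape as delta
    mask:  grounding mask in [0, 1], shape (H, W); 1 marks grounded evidence
    outside_scale: hypothetical fraction of eps allowed outside the mask
    """
    # Full budget eps inside the mask, a reduced share of eps outside it.
    budget = eps * (outside_scale + (1.0 - outside_scale) * mask)[..., None]
    # Take the ascent step, then clip to the per-pixel L-infinity budget.
    delta = delta + step * np.sign(grad)
    return np.clip(delta, -budget, budget)
```

Under these assumptions, repeated steps saturate at `eps` inside the grounded region and at `outside_scale * eps` elsewhere, so most of the distortion lands where the text-grounded evidence is. A real attack would recompute `grad` from a surrogate vision encoder at every step.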

Related benchmarks

Task	Dataset	Result
Adversarial Attack	LVLM Evaluation Set	ASR 64	40
Adversarial Attack	GPT-4o	ASR 16.6	14
Targeted Adversarial Attack	GPT-4o	ASR860	12
Adversarial Attack	Gemini 2.0	ASR 13.2	11
Adversarial Attack Imperceptibility	Adversarial Attack (Evaluation Set)	SSIM 0.9161	9
Image Classification	CIFAR-10 (test)	CIFAR-10 Classification Score 99.6	9
Image Classification	CIFAR-10 BLIP-2	CLIP Similarity (RN-50) 0.2256	9
Adversarial Attack	llava	CLIP Similarity (RN-50) 0.2282	9
Adversarial Attack	Qwen VL 2.5	CLIP Similarity (RN-50) 0.2481	9
Image Classification	CIFAR-10 InternVL3	CLIP Similarity (RN-50) 0.2474	9

Showing 10 of 21 rows
