
Grounding-Driven Attack: Improving Encoder-based Adversarial Transferability against Large Vision-Language Models

About

Large vision-language models (LVLMs) have achieved impressive performance across multimodal tasks, but their reliance on visual inputs exposes them to adversarial threats. Encoder-based attacks provide an efficient alternative to end-to-end optimization by crafting perturbations through the vision encoder alone. However, existing encoder-based attacks often assume that the surrogate encoder is identical or similar to the victim LVLM's vision encoder. In this work, we present a systematic study of their transferability in more realistic black-box deployments with heterogeneous LVLM architectures. We find that model-specific visual evidence is inconsistent across models, whereas text-conditioned grounding regions are more closely tied to caption-relevant evidence and provide a more stable transfer target. However, existing attacks remain weakly aligned with and insufficiently disrupt these regions. Motivated by these findings, we propose Grounding-Driven Attack (GDA), which aligns perturbation optimization with text-grounded evidence. GDA combines Grounding-Aware Perturbation Allocation to concentrate perturbation budget on grounded evidence regions with Grounding-Centric Evidence Disruption to intensify their global and local disruption. Experiments across diverse victim models and tasks show that GDA consistently outperforms existing encoder-based attacks in black-box transfer. These results highlight the central role of text-grounded evidence in adversarial transferability and motivate grounding-aware robustness evaluation and defense design.
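As a rough illustration of what "concentrating perturbation budget on grounded evidence regions" could look like in practice, the sketch below scales a per-pixel L-infinity budget by a grounding mask and takes one signed-gradient step under that budget. This is not the authors' implementation: the function names, the `floor` parameter, the flat-list image representation, and the use of a simple signed-gradient update are all assumptions made for illustration. The gradient itself would come from some loss on a surrogate vision encoder, which is not shown here.

```python
def per_pixel_budget(mask, eps, floor=0.25):
    """Scale the L-infinity budget eps by a grounding mask with values in [0, 1].

    Pixels outside the grounded region keep floor * eps; pixels fully inside
    it receive the whole eps. The floor value is an illustrative assumption.
    """
    return [eps * (floor + (1.0 - floor) * m) for m in mask]


def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def masked_sign_step(image, delta, grad, budget, step):
    """One signed-gradient ascent step, clipped to the per-pixel budget.

    image, delta, grad, and budget are flat lists of equal length, with
    pixel values assumed to lie in [0, 1].
    """
    new_delta = []
    for x, d, g, b in zip(image, delta, grad, budget):
        d = d + step * sign(g)
        d = max(-b, min(b, d))               # stay within the per-pixel budget
        d = max(0.0, min(1.0, x + d)) - x    # keep the perturbed pixel in [0, 1]
        new_delta.append(d)
    return new_delta


# Hypothetical two-pixel example: the second pixel lies in the grounded region.
eps = 8 / 255
budget = per_pixel_budget([0.0, 1.0], eps)
delta = masked_sign_step([0.5, 0.5], [0.0, 0.0], [1.0, -2.0], budget, step=1.0)
```

With these toy inputs, the pixel outside the grounded region can move by at most a quarter of `eps`, while the grounded pixel can use the full budget. A real attack would iterate this step with gradients recomputed at each iteration.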

Xinwei Zhang, Li Bai, Tianwei Zhang, Youqian Zhang, Qingqing Ye, Yingnan Zhao, Ruochen Du, Haibo Hu • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Adversarial Attack | LVLM Evaluation Set | ASR 64 | 40 |
| Adversarial Attack | GPT-4o | ASR 16.6 | 14 |
| Targeted Adversarial Attack | GPT-4o | ASR 860 | 12 |
| Adversarial Attack | Gemini 2.0 | ASR 13.2 | 11 |
| Adversarial Attack Imperceptibility | Adversarial Attack (Evaluation Set) | SSIM 0.9161 | 9 |
| Image Classification | CIFAR-10 (test) | CIFAR-10 Classification Score 99.6 | 9 |
| Image Classification | CIFAR-10 BLIP-2 | CLIP Similarity (RN-50) 0.2256 | 9 |
| Adversarial Attack | llava | CLIP Similarity (RN-50) 0.2282 | 9 |
| Adversarial Attack | Qwen VL 2.5 | CLIP Similarity (RN-50) 0.2481 | 9 |
| Image Classification | CIFAR-10 InternVL3 | CLIP Similarity (RN-50) 0.2474 | 9 |

Showing 10 of 21 rows
