Pushing the Frontier of Black-Box LVLM Attacks via Fine-Grained Detail Targeting
About
Black-box adversarial attacks on Large Vision-Language Models (LVLMs) are challenging due to missing gradients and complex multimodal boundaries. While prior state-of-the-art transfer-based approaches like M-Attack perform well using local crop-level matching between source and target images, we find this induces high-variance, nearly orthogonal gradients across iterations, violating coherent local alignment and destabilizing optimization. We attribute this to (i) ViT translation sensitivity that yields spike-like gradients and (ii) structural asymmetry between source and target crops. We reformulate local matching as an asymmetric expectation over source transformations and target semantics, and build a gradient-denoising upgrade to M-Attack. On the source side, Multi-Crop Alignment (MCA) averages gradients from multiple independently sampled local views per iteration to reduce variance. On the target side, Auxiliary Target Alignment (ATA) replaces aggressive target augmentation with a small auxiliary set from a semantically correlated distribution, producing a smoother, lower-variance target manifold. We further reinterpret momentum as Patch Momentum, replaying historical crop gradients; combined with a refined patch-size ensemble (PE+), this strengthens transferable directions. Together these modules form M-Attack-V2, a simple, modular enhancement over M-Attack that substantially improves transfer-based black-box attacks on frontier LVLMs: boosting success rates on Claude-4.0 from 8% to 30%, Gemini-2.5-Pro from 83% to 97%, and GPT-5 from 98% to 100%, outperforming prior black-box LVLM attacks. Code and data are publicly available at: https://github.com/vila-lab/M-Attack-V2.
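The variance-reduction idea behind Multi-Crop Alignment can be illustrated with a toy sketch. The code below is not the authors' implementation: it assumes a 1-D "image" (a list of floats), a hypothetical linear surrogate encoder `weights`, and a squared-error matching loss in place of the real embedding similarity. It only shows that averaging gradients from several independently sampled crops in one iteration gives a lower-variance update direction than using a single crop.

```python
import random
from statistics import fmean, pvariance


def crop_grad(signal, weights, target, start):
    """Gradient of 0.5 * (weights . crop - target)**2 w.r.t. the full signal.

    `weights` is a hypothetical linear encoder applied to one crop window.
    """
    size = len(weights)
    crop = signal[start:start + size]
    residual = sum(w * x for w, x in zip(weights, crop)) - target
    grad = [0.0] * len(signal)
    for i, w in enumerate(weights):
        grad[start + i] = residual * w  # entries outside the crop stay zero
    return grad


def multi_crop_grad(signal, weights, target, num_crops, rng):
    """Average the per-crop gradients of `num_crops` independently sampled crops."""
    size = len(weights)
    total = [0.0] * len(signal)
    for _ in range(num_crops):
        start = rng.randrange(len(signal) - size + 1)
        grad = crop_grad(signal, weights, target, start)
        total = [a + b for a, b in zip(total, grad)]
    return [a / num_crops for a in total]


def grad_variance(num_crops, trials=300, seed=0):
    """Mean per-coordinate variance of the averaged gradient across trials."""
    rng = random.Random(seed)
    signal = [rng.random() for _ in range(32)]
    weights = [rng.gauss(0, 1) for _ in range(8)]
    target = rng.gauss(0, 1)
    grads = [
        multi_crop_grad(signal, weights, target, num_crops, rng)
        for _ in range(trials)
    ]
    return fmean(pvariance(column) for column in zip(*grads))


if __name__ == "__main__":
    for k in (1, 4, 8):
        print(k, grad_variance(k))
```

Because the crops are sampled independently, the variance of the averaged gradient is expected to shrink roughly in proportion to the number of crops. In a real attack the encoder would be a surrogate vision model and the crops would be 2-D views of the source image.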
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Image-to-Text Adversarial Attack | Evaluation set | ASR | 95.4 | 48 |
| Targeted Adversarial Attack | Evaluation set (test) | Attack Success Rate (ASR) | 48.6 | 48 |
| Targeted Adversarial Attack | 1,000-pair Targeted Attack Evaluation Set closed-source standard MLLMs 1.0 | ASR | 92.2 | 48 |
| Adversarial Attack | 100 source-target pairs standard (test) | Attack Success Rate (ASR) | 70.7 | 18 |
| Targeted Adversarial Transferability | Targeted Adversarial Transferability Evaluation Suite (test) | Qwen2.5-VL-7B ASR | 72.3 | 18 |
| Targeted Attack | Gemini 1.5-pro 2.5-flash (test) | ASR | 65.9 | 16 |
| Transferable Adversarial Attack | GLM 4.6V | ASR | 77.9 | 16 |
| Transferable Adversarial Attack | Llama 11B-V 3.2 | Attack Success Rate (ASR) | 52.3 | 16 |
| Transferable Adversarial Attack | Kimi K2.5 | ASR (%) | 56.3 | 16 |
| Black-Box LVLM Attack | PatternNet | KMRa | 88 | 15 |