Frequency-Domain Regularized Adversarial Alignment for Transferable Attacks against Closed-Source MLLMs
About
Multimodal large language models (MLLMs) remain vulnerable to transfer-based targeted attacks, in which perturbations optimized on open-source surrogate encoders generalize to closed-source MLLMs. A key challenge in improving adversarial transferability is capturing the intrinsic visual focus shared across models, so that perturbations align with transferable semantic cues rather than surrogate-specific behaviors. Existing methods, however, suffer from spatial-domain feature redundancy and surrogate-specific gradient signals, both of which hinder cross-model transferability. In this paper, we propose FRA-Attack, which addresses both challenges from a unified frequency-domain regularization perspective. For feature alignment, a high-pass DCT objective on patch features suppresses redundant global structures and concentrates the loss on the high-frequency band that carries the MLLMs' intrinsic visual focus. For gradient optimization, we introduce Frequency-domain Gradient Regularization (FGR), a low-pass regularizer that modulates the surrogate gradient using only the geometric frequency coordinate. Because no surrogate-derived statistic is involved, FGR is model-agnostic by construction: it removes surrogate-specific high-frequency artifacts while preserving transferable low-frequency directions. Together, the two components form a unified frequency-domain treatment of transferability. Extensive experiments on 15 flagship MLLMs across 7 vendors show that FRA-Attack achieves superior cross-model transferability, with state-of-the-art performance on GPT-5.4, Claude-Opus-4.6 and Gemini-3-flash.
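The two frequency-domain operations named in the abstract can be illustrated with plain DCT filtering. The sketch below is not the authors' implementation. The function names, the Gaussian shape of the low-pass weighting, the `sigma` and `cutoff` values, and the choice to apply the high-pass DCT along the patch axis are all assumptions made for illustration. It only shows what "a weighting that depends solely on the frequency coordinate" and "a high-pass DCT on patch features" could look like in NumPy.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Return the orthonormal DCT-II basis as an (n, n) matrix."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    i = np.arange(n)[None, :]  # sample index (columns)
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    basis[0, :] = np.sqrt(1.0 / n)  # the DC row uses a different scale
    return basis


def low_pass_gradient(grad: np.ndarray, sigma: float = 0.25) -> np.ndarray:
    """Attenuate high-frequency DCT coefficients of a 2-D gradient.

    Assumption: a Gaussian weighting over normalized frequency coordinates.
    The weights depend only on (u, v), never on the gradient's values.
    """
    h, w = grad.shape
    dh, dw = dct_matrix(h), dct_matrix(w)
    coeffs = dh @ grad @ dw.T                 # forward 2-D DCT
    u = np.arange(h)[:, None] / h             # normalized vertical frequency
    v = np.arange(w)[None, :] / w             # normalized horizontal frequency
    weights = np.exp(-(u**2 + v**2) / (2.0 * sigma**2))
    return dh.T @ (coeffs * weights) @ dw     # inverse 2-D DCT


def high_pass_features(feats: np.ndarray, cutoff: float = 0.25) -> np.ndarray:
    """Zero the lowest DCT frequencies of patch features.

    Assumption: feats has shape (num_patches, dim) and the DCT runs
    along the patch axis; `cutoff` is a hypothetical band fraction.
    """
    n = feats.shape[0]
    d = dct_matrix(n)
    coeffs = d @ feats
    k0 = max(1, int(round(cutoff * n)))       # always drop at least the DC term
    coeffs[:k0] = 0.0
    return d.T @ coeffs
```

Under these assumptions, a constant gradient passes through `low_pass_gradient` unchanged, since only its DC coefficient is non-zero and that coefficient has weight 1. A checkerboard pattern is strongly attenuated. Conversely, `high_pass_features` maps features that are identical across patches to zero, which is the sense in which a high-pass objective discards redundant global structure.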
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image-to-Text Adversarial Attack | Evaluation set | ASR | 97.4 | 48 |
| Targeted Adversarial Attack | Evaluation set (test) | Attack Success Rate (ASR) | 58.5 | 48 |
| Targeted Adversarial Attack | 1,000-pair Targeted Attack Evaluation Set closed-source standard MLLMs 1.0 | ASR | 94.2 | 48 |
| Adversarial Attack | 100 source-target pairs standard (test) | Attack Success Rate (ASR) | 83 | 18 |
| Targeted Attack | Gemini 1.5-pro 2.5-flash (test) | ASR | 67.4 | 16 |
| Transferable Adversarial Attack | GLM 4.6V | ASR | 87.1 | 16 |
| Transferable Adversarial Attack | Llama 11B-V 3.2 | Attack Success Rate (ASR) | 57.3 | 16 |
| Transferable Adversarial Attack | Kimi K2.5 | ASR (%) | 69 | 16 |
| Adversarial Attack | 1,000-pair benchmark (test) | Attack Success Rate (ASR) | 59.9 | 12 |
| Adversarial Attack | 1,000-pair main panel | ASR | 92.7 | 12 |