Boosting Adversarial Transferability with Low-Cost Optimization via Maximin Expected Flatness
About
Transfer-based attacks craft adversarial examples on white-box surrogate models and directly deploy them against black-box target models, offering model-agnostic and query-free threat scenarios. Flatness-enhanced methods have recently emerged to improve transferability by enhancing the loss surface flatness of adversarial examples, but their divergent flatness definitions and heuristic attack designs suffer from unexamined optimization limitations and lack a theoretical foundation, which constrains their effectiveness and efficiency. This work exposes the severely imbalanced exploitation-exploration dynamics in flatness optimization, establishes the first theoretical foundation for flatness-based transferability, and proposes a principled framework to overcome these optimization pitfalls. Specifically, we systematically unify the fragmented flatness definitions of existing methods, revealing that their optimization is imbalanced: they either over-explore sensitivity peaks or over-exploit local plateaus. To resolve these issues, we rigorously formalize average-case flatness and transferability gaps, proving that enhancing zeroth-order average-case flatness minimizes cross-model discrepancies. Building on this theory, we design a Maximin Expected Flatness (MEF) attack that enhances zeroth-order average-case flatness while balancing flatness exploration and exploitation. Extensive evaluations across 22 models and 24 current transfer-based attacks demonstrate MEF's superiority: it surpasses the state-of-the-art PGN attack by 4% in attack success rate at half the computational cost and achieves an 8% higher success rate under the same computational budget. When combined with input augmentation, MEF attains 15% additional gains against defense-equipped models, establishing new robustness benchmarks. Our code is available at https://github.com/SignedQiu/MEFAttack.
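The abstract names zeroth-order average-case flatness without giving its formula, so the following is only a rough illustration of the idea, not the paper's definition or the MEF attack itself. It assumes flatness can be measured as the expected absolute change in loss under uniform random perturbations within a small radius, and estimates that quantity by Monte Carlo sampling. The function `expected_flatness_gap` and the two toy losses are hypothetical names introduced for this sketch.

```python
import random


def expected_flatness_gap(loss, x, radius, n_samples=1000, seed=0):
    """Monte Carlo estimate of E_d |loss(x + d) - loss(x)|.

    Assumption for this sketch: d is drawn uniformly from the box
    [-radius, radius]^n. A smaller value means the loss surface is
    flatter, on average, around x.
    """
    rng = random.Random(seed)
    base = loss(x)
    total = 0.0
    for _ in range(n_samples):
        # Perturb every coordinate independently within the box.
        neighbor = [xi + rng.uniform(-radius, radius) for xi in x]
        total += abs(loss(neighbor) - base)
    return total / n_samples


# Toy stand-ins for a model loss: same minimum, different curvature.
def sharp_loss(x):
    return sum(50.0 * xi * xi for xi in x)


def flat_loss(x):
    return sum(0.5 * xi * xi for xi in x)


point = [0.0, 0.0]
sharp_gap = expected_flatness_gap(sharp_loss, point, radius=0.1)
flat_gap = expected_flatness_gap(flat_loss, point, radius=0.1)
```

Because it uses only loss values and no gradients, the estimate is zeroth-order; because it averages over the neighborhood rather than taking the worst perturbation, it is average-case. With the same seed, `flat_gap` comes out smaller than `sharp_gap`, reflecting the lower curvature of `flat_loss`.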
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Adversarial Attack | ImageNet (val) | ASR (General) | 100 | 222 |
| Adversarial Attack | ImageNet | Attack Success Rate | 86.2 | 178 |
| Adversarial Attack Transferability | ImageNet-1k (val) | Attack Success Rate | 0.949 | 13 |
| Black-box Adversarial Attack | ImageNet (test) | Success Rate (Res34) | 100 | 13 |
| OCR VQA | TextVQA (test) | Pre Accuracy | 61.9 | 10 |
| Text-based Visual Question Answering | TextVQA (test) | Pre Accuracy | 56.9 | 10 |
| Untargeted Adversarial Attack | Chinese Traffic Sign Recognition Database (CTSRD) (test) | ASR (ResNet34) | 100 | 5 |