Vulnerabilities in AI-generated Image Detection: The Challenge of Adversarial Attacks

About

Recent advances in image synthesis, particularly the advent of GANs and diffusion models, have amplified public concern about the spread of disinformation. To address this concern, numerous AI-generated image (AIGI) detectors have been proposed and have achieved promising performance in identifying fake images. However, a systematic understanding of the adversarial robustness of AIGI detectors is still lacking. In this paper, we examine the vulnerability of state-of-the-art AIGI detectors to adversarial attacks under white-box and black-box settings, which has rarely been investigated so far. To this end, we propose a new method to attack AIGI detectors. First, inspired by the obvious difference between real and fake images in the frequency domain, we add perturbations in the frequency domain to push the image away from its original frequency distribution. Second, we explore the full posterior distribution of the surrogate model to narrow the gap between heterogeneous AIGI detectors, e.g., when transferring adversarial examples across CNNs and ViTs. This is achieved by introducing a novel post-train Bayesian strategy that turns a single surrogate into a Bayesian one, capable of simulating diverse victim models using one pre-trained surrogate, without the need for re-training. We name our method Frequency-based Post-train Bayesian Attack (FPBA). Through FPBA, we demonstrate that adversarial attacks pose a real threat to AIGI detectors. FPBA delivers successful black-box attacks across various detectors, generators, and defense methods, and even evades cross-generator and compressed-image detection, which are crucial real-world detection scenarios. Our code is available at https://github.com/onotoa/fpba.

Yunfeng Diao, Naixin Zhai, Changtao Miao, Zitong Yu, Xingxing Wei, Xun Yang, Meng Wang • 2024
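
The abstract's first step, perturbing an image in the frequency domain rather than in pixel space, can be illustrated with a minimal sketch. This is not the authors' FPBA implementation (see their repository for that). It assumes a 2-D real FFT as the transform and a caller-supplied `direction` array standing in for the gradient of a detector's loss; the function name `perturb_in_frequency` and all parameter names are hypothetical.

```python
import numpy as np


def perturb_in_frequency(image, direction, epsilon):
    """Shift a grayscale image's spectrum by epsilon * direction.

    image:     2-D float array with values in [0, 1]
    direction: complex array shaped like np.fft.rfft2(image); in an attack
               this would come from the detector's loss gradient (assumed
               here, not computed)
    epsilon:   step size in the frequency domain
    """
    spectrum = np.fft.rfft2(image)
    shifted = spectrum + epsilon * direction
    # irfft2 returns a real image; `s` restores the original height and width.
    adversarial = np.fft.irfft2(shifted, s=image.shape)
    # Keep the result a valid image.
    return np.clip(adversarial, 0.0, 1.0)


# Toy usage with random data in place of a real image and a real gradient.
rng = np.random.default_rng(0)
image = rng.uniform(0.2, 0.8, size=(8, 8))
spectrum_shape = np.fft.rfft2(image).shape
direction = rng.normal(size=spectrum_shape) + 1j * rng.normal(size=spectrum_shape)

adversarial = perturb_in_frequency(image, direction, epsilon=0.5)
```

Using `rfft2`/`irfft2` keeps the inverse transform real-valued without having to enforce Hermitian symmetry on the perturbation by hand. A full attack would iterate this step and bound the resulting change to the image, details the abstract does not specify.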

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Adversarial Attack | Synthetic LSUN ProGAN | CNNSpot Performance | 98.9 | 12 |
| Adversarial Attack | GenImage SD | CNNSpot Performance | 100 | 12 |
| Adversarial Attack | FFHQ StyleGAN synthetic (test) | CNNSpot | 100 | 12 |
| Adversarial Attack | LSUN Synthetic | ASR (CNNSpot) | 0.989 | 7 |
| AIGI Detection | GenImage Stable Diffusion AEROBLADE detector (test) | AP | 43.1 | 5 |
| AIGI Detection | GenImage Stable Diffusion DIRE detector (test) | AP | 40.8 | 5 |
| Deepfake Detection Attack | LSUN ProGAN (test) | ASR (CNNSpot) | 100 | 4 |
| Deepfake Detection Attack | GenImage SD (test) | ASR (CNNSpot) | 1 | 4 |
| Transfer-based Attack | LSUN ProGAN | GramNet | 87.8 | 4 |
| Adversarial Attack on AIGI Detectors | Midjourney | ASR | 97.6 | 3 |
Showing 10 of 15 rows
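
Several rows report ASR, which presumably stands for attack success rate; the page does not define it. As a rough sketch of one common convention, the function below counts only inputs the detector classified correctly before the attack and returns the fraction whose prediction is wrong afterwards. The function name and the toy labels are hypothetical, and the benchmark's exact definition may differ.

```python
def attack_success_rate(clean_preds, adv_preds, labels):
    """Fraction of correctly detected inputs that are misclassified after the attack."""
    # Only inputs the detector got right before the attack can be "broken" by it.
    attacked = [(adv, y) for clean, adv, y in zip(clean_preds, adv_preds, labels)
                if clean == y]
    if not attacked:
        raise ValueError("no correctly classified inputs to attack")
    flipped = sum(1 for adv, y in attacked if adv != y)
    return flipped / len(attacked)


# Toy data: 1 = fake, 0 = real.
labels      = [1, 1, 1, 1, 0]
clean_preds = [1, 1, 1, 0, 0]  # detector is right on 4 of 5 clean inputs
adv_preds   = [0, 0, 1, 0, 0]  # two of those four are wrong after the attack

asr = attack_success_rate(clean_preds, adv_preds, labels)  # 2 / 4 = 0.5
```

The function returns a fraction in [0, 1]; multiply by 100 for a percentage. The rows above appear to mix both scales (for example 0.989 and 98.9), though the page does not say so.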
