
Raising the Bar of AI-generated Image Detection with CLIP

About

The aim of this work is to explore the potential of pre-trained vision-language models (VLMs) for universal detection of AI-generated images. We develop a lightweight detection strategy based on CLIP features and study its performance in a wide variety of challenging scenarios. We find that, contrary to previous beliefs, it is neither necessary nor convenient to use a large domain-specific dataset for training. On the contrary, by using only a handful of example images from a single generative model, a CLIP-based detector exhibits surprising generalization ability and high robustness across different architectures, including recent commercial tools such as Dalle-3, Midjourney v5, and Firefly. We match the state-of-the-art (SoTA) on in-distribution data and significantly improve upon it in terms of generalization to out-of-distribution data (+6% AUC) and robustness to impaired/laundered data (+13%). Our project is available at https://grip-unina.github.io/ClipBased-SyntheticImageDetection/

Davide Cozzolino, Giovanni Poggi, Riccardo Corvi, Matthias Nießner, Luisa Verdoliva • 2023
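The abstract describes a lightweight detector trained on features from a handful of example images and evaluated by AUC. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes the image features have already been extracted, and replaces them with synthetic Gaussian vectors so that it runs without any model download. The logistic-regression classifier, the 8-dimensional feature size, the 10 training examples per class, and all function names (`fit_linear_detector`, `detector_score`, `auc`, `synthetic_features`, `demo`) are illustrative assumptions.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_linear_detector(features, labels, epochs=300, lr=0.5):
    """Fit a linear real-vs-synthetic classifier by batch gradient descent.

    Logistic regression is a stand-in chosen for brevity (an assumption),
    not necessarily the classifier used in the paper.
    """
    dim = len(features[0])
    w, b = [0.0] * dim, 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def detector_score(w, b, x):
    """Higher score means 'more likely synthetic'."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def synthetic_features(rng, n, shift, dim=8):
    """Hypothetical stand-in for precomputed image embeddings."""
    return [[rng.gauss(shift, 1.0) for _ in range(dim)] for _ in range(n)]


def demo(seed=0):
    rng = random.Random(seed)
    # Few-shot training set: 10 "real" (label 0) and 10 "synthetic" (label 1).
    train_x = synthetic_features(rng, 10, 0.0) + synthetic_features(rng, 10, 1.0)
    train_y = [0] * 10 + [1] * 10
    w, b = fit_linear_detector(train_x, train_y)
    # Larger held-out set drawn from the same synthetic distributions.
    test_x = synthetic_features(rng, 200, 0.0) + synthetic_features(rng, 200, 1.0)
    test_y = [0] * 200 + [1] * 200
    scores = [detector_score(w, b, x) for x in test_x]
    return auc(scores, test_y)


if __name__ == "__main__":
    print(f"AUC on synthetic held-out features: {demo():.3f}")
```

In a real setting, the synthetic vectors would be replaced by embeddings computed from actual real and generated images. The numbers this toy example produces say nothing about the results reported in the paper.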

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| AI-generated image detection | GenImage | -- | 106 |
| Video Forgery Detection | DVF (test) | AUC (Video Crafter1) 63.8 | 19 |
| AI-generated image detection | DIRE | Accuracy 86.8 | 15 |
| AI-generated image detection | MNW | Accuracy 60.7 | 15 |
| AI-generated image detection | UDF | Accuracy 89.5 | 15 |
| AI-generated image detection | Average (AVG) | Accuracy 79.8 | 15 |
| AI-generated image detection | GANDF | Accuracy 79.6 | 15 |
| AI-generated image detection | CNNDF | Accuracy 80.3 | 15 |
| AI-generated image detection | MGD | Accuracy 55.7 | 15 |
| Binary Video Detection | DVF cross-domain 42 | Accuracy 67 | 12 |
Showing 10 of 12 rows
