Simplicity Prevails: The Emergence of Generalizable AIGI Detection in Visual Foundation Models
About
While specialized detectors for AI-Generated Images (AIGI) achieve near-perfect accuracy on curated benchmarks, they suffer a dramatic performance collapse in realistic, in-the-wild scenarios. In this work, we demonstrate that simplicity prevails over complex architectural designs. A simple linear classifier trained on the frozen features of modern Vision Foundation Models, including Perception Encoder, MetaCLIP 2, and DINOv3, establishes a new state of the art. Through a comprehensive evaluation spanning traditional benchmarks, unseen generators, and challenging in-the-wild distributions, we show that this baseline not only matches specialized detectors on standard benchmarks but also decisively outperforms them on in-the-wild datasets, improving accuracy by margins of over 30%. We posit that this superior capability is an emergent property driven by the massive scale of pre-training data containing synthetic content. We trace the source of this capability to two distinct manifestations of data exposure: Vision-Language Models internalize an explicit semantic concept of forgery, while Self-Supervised Learning models implicitly acquire discriminative forensic features from the pre-training data. However, we also reveal persistent limitations: these models suffer performance degradation under recapture and transmission, and remain blind to VAE reconstruction and localized editing. We conclude by advocating a paradigm shift in AI forensics, moving from overfitting on static benchmarks to harnessing the evolving world knowledge of foundation models for real-world reliability.
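The core recipe above, a linear classifier on frozen foundation-model features, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the loss or optimizer, so logistic regression trained by full-batch gradient descent is assumed, and `make_synthetic_features`, `train_linear_probe`, and `predict` are hypothetical names. Synthetic Gaussian vectors stand in for embeddings that would, in practice, come from a frozen encoder such as Perception Encoder, MetaCLIP 2, or DINOv3 (feature extraction is not shown).

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_synthetic_features(n_per_class, dim, seed=0):
    """Hypothetical stand-in for frozen encoder embeddings (assumption).

    Real images (label 0) are shifted to -1 and AI-generated images
    (label 1) to +1 along the first dimension; every dimension gets noise.
    """
    rng = random.Random(seed)
    feats, labels = [], []
    for label, shift in ((0, -1.0), (1, 1.0)):
        for _ in range(n_per_class):
            x = [rng.gauss(0.0, 0.5) for _ in range(dim)]
            x[0] += shift
            feats.append(x)
            labels.append(label)
    return feats, labels


def train_linear_probe(features, labels, lr=0.5, epochs=200, l2=1e-4):
    """Fit a logistic-regression head w.x + b on fixed feature vectors.

    Only w and b are learned; the encoder that produced `features`
    is never updated, which is what makes this a linear probe.
    """
    dim = len(features[0])
    n = len(features)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            # Gradient of binary cross-entropy w.r.t. the logit is p - y.
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        # Full-batch gradient step with a small L2 penalty on w.
        for i in range(dim):
            w[i] -= lr * (grad_w[i] / n + l2 * w[i])
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    # 1 = AI-generated, 0 = real; threshold the logit at 0 (p = 0.5).
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


if __name__ == "__main__":
    feats, labels = make_synthetic_features(100, 8)
    w, b = train_linear_probe(feats, labels)
    acc = sum(predict(w, b, x) == y for x, y in zip(feats, labels)) / len(labels)
    print(f"training accuracy: {acc:.3f}")
```

Because the encoder stays frozen, swapping in a different foundation model would only change how `features` are produced; the probe itself, and its small number of trainable parameters, stays the same.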
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| AI-generated image detection | SocialRF (In-the-wild) | Real Accuracy | 93.7 | 18 |
| AIGI Detection | GenImage v1.4 (test) | ADM Score | 0.87 | 18 |
| AI-generated image detection | WildRF (In-the-wild) | Accuracy (Real) | 94.8 | 18 |
| AI-generated image detection | Chameleon (In-the-wild) | Real Accuracy | 97 | 18 |
| AI-generated image detection | CommunityAI (In-the-wild) | Real Accuracy | 96.6 | 18 |
| AI-generated image detection | AIGI-Now | FLUX-dev Pixel Score | 0.979 | 17 |
| Video Forgery Detection | GenVideo | Sora Detection Rate | 0.6871 | 15 |
| Video Forgery Detection | VidProm | Pika Score | 99.85 | 13 |
| AIGI Detection | RRDataset Redigital (Recapture) | Accuracy (Real) | 96.4 | 8 |
| AIGI Detection | RRDataset Original Base | Accuracy (Real) | 95.1 | 8 |
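Several rows above report a "Real Accuracy" / "Accuracy (Real)" figure. Assuming this follows the common AIGI-detection convention of measuring accuracy on the real-image subset only, it is usually read alongside the corresponding accuracy on AI-generated images, since a detector that labels everything as real would score 100% on the real subset. The hypothetical helper below is a minimal sketch of that per-class split, plus the balanced mean of the two.

```python
def per_class_accuracy(preds, labels):
    """Split accuracy by class; label 0 = real, 1 = AI-generated.

    Returns (real_acc, fake_acc, balanced_acc), where balanced_acc is
    the unweighted mean of the two per-class accuracies.
    """
    real_hits = [p == y for p, y in zip(preds, labels) if y == 0]
    fake_hits = [p == y for p, y in zip(preds, labels) if y == 1]
    if not real_hits or not fake_hits:
        raise ValueError("need at least one real and one AI-generated sample")
    real_acc = sum(real_hits) / len(real_hits)
    fake_acc = sum(fake_hits) / len(fake_hits)
    return real_acc, fake_acc, (real_acc + fake_acc) / 2


if __name__ == "__main__":
    labels = [0, 0, 0, 0, 1, 1, 1, 1]
    preds = [0, 0, 0, 0, 1, 0, 0, 1]  # misses half of the AI-generated images
    print(per_class_accuracy(preds, labels))  # (1.0, 0.5, 0.75)
```

In this toy example the real-image accuracy is perfect while half of the AI-generated images slip through, which a real-accuracy figure alone would not reveal.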