Simplicity Prevails: The Emergence of Generalizable AIGI Detection in Visual Foundation Models

About

While specialized detectors for AI-Generated Images (AIGI) achieve near-perfect accuracy on curated benchmarks, they suffer from a dramatic performance collapse in realistic, in-the-wild scenarios. In this work, we demonstrate that simplicity prevails over complex architectural designs. A simple linear classifier trained on the frozen features of modern Vision Foundation Models, including Perception Encoder, MetaCLIP 2, and DINOv3, establishes a new state-of-the-art. Through a comprehensive evaluation spanning traditional benchmarks, unseen generators, and challenging in-the-wild distributions, we show that this baseline not only matches specialized detectors on standard benchmarks but also decisively outperforms them on in-the-wild datasets, improving accuracy by more than 30%. We posit that this superior capability is an emergent property driven by the massive scale of pre-training data containing synthetic content. We trace the source of this capability to two distinct manifestations of data exposure: Vision-Language Models internalize an explicit semantic concept of forgery, while Self-Supervised Learning models implicitly acquire discriminative forensic features from the pre-training data. However, we also reveal persistent limitations: these models degrade under recapture and transmission, and they remain blind to VAE reconstruction and localized editing. We conclude by advocating for a paradigm shift in AI forensics, moving from overfitting on static benchmarks to harnessing the evolving world knowledge of foundation models for real-world reliability.
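The linear-probe recipe described in the abstract (a frozen feature extractor with only a linear classifier trained on top) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `make_frozen_encoder` is a toy stand-in (a fixed random projection followed by `tanh`), not Perception Encoder, MetaCLIP 2, or DINOv3, and `make_toy_images` produces synthetic vectors rather than real or generated images. The sketch only shows the training pattern, not the authors' implementation or results.

```python
import math
import random


def make_frozen_encoder(in_dim, feat_dim, seed=0):
    """Toy stand-in for a frozen backbone: fixed random projection + tanh."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(in_dim)
    weights = [[rng.gauss(0.0, scale) for _ in range(in_dim)]
               for _ in range(feat_dim)]

    def encode(x):
        # The projection weights are never updated, mirroring a frozen model.
        return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]

    return encode


def make_toy_images(n, in_dim, seed):
    """Synthetic 'real' (0) vs 'generated' (1) inputs; not real image data."""
    rng = random.Random(seed)
    xs, ys = [], []
    for i in range(n):
        y = i % 2
        shift = 1.5 if y == 1 else -1.5
        # Only the first four coordinates carry the class signal.
        xs.append([rng.gauss(shift if j < 4 else 0.0, 1.0) for j in range(in_dim)])
        ys.append(y)
    return xs, ys


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_linear_probe(feats, labels, epochs=200, lr=0.3):
    """Full-batch gradient descent on logistic loss; returns (weights, bias)."""
    dim, n = len(feats[0]), len(feats)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for f, y in zip(feats, labels):
            err = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b) - y
            for i, fi in enumerate(f):
                grad_w[i] += err * fi
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, f):
    # Label 1 ("generated") when the linear score is positive.
    return 1 if sum(wi * fi for wi, fi in zip(w, f)) + b > 0 else 0


def run_demo():
    encode = make_frozen_encoder(in_dim=16, feat_dim=32)
    train_x, train_y = make_toy_images(200, 16, seed=1)
    test_x, test_y = make_toy_images(100, 16, seed=2)
    # Features are computed once; only the probe's weights and bias are trained.
    train_f = [encode(x) for x in train_x]
    test_f = [encode(x) for x in test_x]
    w, b = train_linear_probe(train_f, train_y)
    correct = sum(predict(w, b, f) == y for f, y in zip(test_f, test_y))
    return correct / len(test_y)


if __name__ == "__main__":
    print(f"held-out accuracy: {run_demo():.3f}")
```

In a realistic setting the encoder would be a pretrained model run in inference mode and the probe would typically be fit with a standard library, but the separation is the same: features are extracted once with fixed weights, and only the linear layer sees gradient updates.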

Yue Zhou, Xinan He, Kaiqing Lin, Bing Fan, Feng Ding, Bin Li • 2026

Related benchmarks

Task	Dataset	Metric	Result	
AI-generated image detection	AIGI-Now	FLUX-dev Pixel Score	0.979	49
AI-generated image detection	SocialRF (In-the-wild)	Real Accuracy	93.7	18
AIGI Detection	GenImage v1.4 (test)	ADM Score	0.87	18
AI-generated image detection	WildRF (In-the-wild)	Accuracy (Real)	94.8	18
AI-generated image detection	Chameleon In-the-wild	Real Accuracy	97	18
AI-generated image detection	CommunityAI (In-the-wild)	Real Accuracy	96.6	18
Video Forgery Detection	GenVideo	Sora Detection Rate	0.6871	15
Video Forgery Detection	VidProm	Pika Score	99.85	13
AIGI Detection	RRDataset Redigital (Recapture)	Accuracy (Real)	96.4	8
AIGI Detection	RRDataset Original Base	Accuracy (Real)	95.1	8

Showing 10 of 13 rows
