Video as Natural Augmentation: Towards Unified AI-Generated Image and Video Detection

About

AI-generated content (AIGC) is rapidly improving, creating an urgent need for detectors that generalize across data sources, deployment pipelines, and visual modalities. A strongly generalizable detector should remain robust under distributional variations. However, we identify a consistent failure mode: state-of-the-art AI-generated image detectors often collapse when applied to frames extracted from videos. Through systematic analysis, we show that this cross-modal gap arises from two entangled sources: synthesis-agnostic video processing shifts (color conversion, codec compression, resizing, and blur) and model-specific fingerprints introduced by modern video generators. Motivated by these findings, we propose VINA (Video as Natural Augmentation), a unified AIGC detection framework that jointly trains on image and video data. VINA uses video frames as physically grounded natural augmentations and further introduces a cross-modal supervised contrastive objective to align image and video representations under a shared real/fake decision boundary. Extensive experiments on 14 image, video, and in-the-wild benchmarks show that VINA delivers bidirectional gains, improves robustness and transferability, and achieves state-of-the-art performance across nearly all evaluated settings without complex augmentation or dataset-specific tuning.
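To make the idea of a supervised contrastive objective over a shared real/fake label more concrete, the following is a minimal sketch and not the authors' implementation. It uses the generic supervised contrastive loss of Khosla et al. (2020) in pure Python. It assumes that embeddings from images and video frames are mixed in one batch and that positives are defined by the real/fake label alone, regardless of modality. The function names, the temperature value, and the toy data are all hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (left unchanged if its norm is zero)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def supcon_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss over one batch.

    Assumption: `labels` holds only real (0) / fake (1), so an image and a
    video frame with the same label count as a positive pair.
    """
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Temperature-scaled cosine similarity to every other sample.
        sims = {
            a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
            for a in range(n)
            if a != i
        }
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy batch: two "fake" and two "real" embeddings (modality is not encoded).
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_aligned = supcon_loss(toy_embeddings, [1, 1, 0, 0])
loss_misaligned = supcon_loss(toy_embeddings, [1, 0, 1, 0])
```

In this toy batch, the loss is lower when same-label embeddings are close together (`loss_aligned`) than when labels cut across the clusters (`loss_misaligned`), which is the behavior a shared decision boundary would rely on.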

Zhengcen Li, Chenyang Jiang, Liangxu Su, Tong Shao, Shiyang Zhou, Ming Tao, Jingyong Su • 2026

Related benchmarks

Task	Dataset	Result
AI-generated image detection	GenImage	Midjourney Detection Rate 96.11	173
AIGC Detection	Aggregate AIGC Detection Video and Image	Overall Average Score 98.39	14
Image AIGC Detection	Image-based AIGC Detection Benchmarks (ForenSynths, UniFD, DiTFake, ARForensics)	Average Detection Score 99.08	14
Video AIGC Detection	Video-based AIGC Detection Benchmarks (Magic, GenVideo, GenBuster++, DeepTrace Reward)	Average Detection Score 97.7	14
AI-Generated Content Detection	Chameleon in-the-wild (test)	Balanced ACC 91.4	12
AI-Generated Content Detection	AIGIBench in-the-wild (test)	SocRF 90.9	12
AI-Generated Content Detection	RR-Dataset in-the-wild (test)	Balanced Accuracy 82.7	12
AI-Generated Content Detection	WildRF in-the-wild (test)	FB Score 96.9	12
AI-Generated Content Detection	SynthWildx in-the-wild (test)	DALLE3 Detection Score 94.5	12
