Video as Natural Augmentation: Towards Unified AI-Generated Image and Video Detection
About
AI-generated content (AIGC) is improving rapidly, creating an urgent need for detectors that generalize across data sources, deployment pipelines, and visual modalities. A detector that generalizes well should remain robust under such distribution shifts. However, we identify a consistent failure mode: state-of-the-art (SOTA) AI-generated image detectors often collapse when applied to frames extracted from videos. Through systematic analysis, we show that this cross-modal gap arises from two entangled sources: synthesis-agnostic video processing shifts (color conversion, codec compression, resizing, and blur) and model-specific fingerprints introduced by modern video generators. Motivated by these findings, we propose VINA (Video as Natural Augmentation), a unified AIGC detection framework trained jointly on image and video data. VINA treats video frames as physically grounded natural augmentations and introduces a cross-modal supervised contrastive objective that aligns image and video representations under a shared real/fake decision boundary. Extensive experiments on 14 image, video, and in-the-wild benchmarks show that VINA delivers bidirectional gains between the image and video domains, improves robustness and transferability, and achieves state-of-the-art performance in nearly all evaluated settings without complex augmentation or dataset-specific tuning.
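The cross-modal supervised contrastive objective can be illustrated with a minimal sketch, assuming the standard supervised contrastive (SupCon) loss of Khosla et al. (2020) applied to a mixed batch of image and video-frame embeddings whose labels encode only real (0) vs. fake (1). The abstract does not give VINA's exact formulation, so the function names, the temperature of 0.1, the choice of positives, and the toy batch are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def supcon_loss(embeddings: Sequence[Sequence[float]],
                labels: Sequence[int],
                temperature: float = 0.1) -> float:
    """Supervised contrastive loss averaged over anchors in a mixed batch (sketch).

    embeddings: one vector per sample; image and video-frame samples are mixed.
    labels: 0 = real, 1 = fake. Modality is intentionally NOT encoded in the
    label, so a fake image and a fake video frame are positives for each other.
    """
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    if n < 2:
        raise ValueError("need at least two samples in the batch")
    total, anchors = 0.0, 0
    for i in range(n):
        # Temperature-scaled cosine similarity from anchor i to every other sample.
        sims = {a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
                for a in range(n) if a != i}
        # Stable log-sum-exp over all non-anchor samples (the softmax denominator).
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        positives = [p for p in sims if labels[p] == labels[i]]
        if not positives:
            continue  # an anchor with no same-label partner adds no term
        # Mean negative log-probability assigned to the anchor's positives.
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / max(anchors, 1)


# Hypothetical toy batch: [real image, real video frame, fake image, fake video frame]
emb = [[1.0, 0.1], [0.9, 0.2], [-1.0, 0.1], [-0.8, -0.2]]
aligned = supcon_loss(emb, [0, 0, 1, 1])     # labels agree with the geometry
mismatched = supcon_loss(emb, [0, 1, 0, 1])  # labels pair real with fake
print(aligned < mismatched)  # True
```

Because modality is left out of the label, minimizing this loss pulls fake images toward fake video frames (and real toward real), which is one plausible way to obtain a shared real/fake boundary across modalities. A real training loop would compute this on batched tensors; the pure-Python loops here are only for readability.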
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| AI-generated image detection | GenImage | Midjourney Detection Rate | 96.11 | 154 |
| AIGC Detection | Aggregate AIGC Detection (Video and Image) | Overall Average Score | 98.39 | 14 |
| Image AIGC Detection | Image-based AIGC Detection Benchmarks (ForenSynths, UniFD, DiTFake, ARForensics) | Average Detection Score | 99.08 | 14 |
| Video AIGC Detection | Video-based AIGC Detection Benchmarks (Magic, GenVideo, GenBuster++, DeepTrace Reward) | Average Detection Score | 97.7 | 14 |
| AI-Generated Content Detection | Chameleon in-the-wild (test) | Balanced ACC | 91.4 | 12 |
| AI-Generated Content Detection | AIGIBench in-the-wild (test) | SocRF | 90.9 | 12 |
| AI-Generated Content Detection | RR-Dataset in-the-wild (test) | Balanced Accuracy | 82.7 | 12 |
| AI-Generated Content Detection | WildRF in-the-wild (test) | FB Score | 96.9 | 12 |
| AI-Generated Content Detection | SynthWildx in-the-wild (test) | DALLE3 Detection Score | 94.5 | 12 |
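Two of the in-the-wild rows report balanced accuracy. As a hedged reference, the sketch below uses the standard definition (the mean of per-class recall, i.e. the average of true-negative and true-positive rates in the binary case), assuming labels 0 = real and 1 = fake; the individual benchmarks' exact evaluation protocols are not given here and may differ. The helper names and the toy data are hypothetical.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean per-class recall; for real/fake detection this averages TNR and TPR."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        hits = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


def plain_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of all samples classified correctly, ignoring class balance."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical imbalanced test set: 8 real (0), 2 fake (1),
# scored by a degenerate detector that labels everything "real".
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10
print(plain_accuracy(y_true, y_pred))     # 0.8
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

The example shows why balanced accuracy suits in-the-wild sets, where real and fake counts are often unequal: a detector that never flags fakes still scores 0.8 in plain accuracy but only 0.5 (chance level) in balanced accuracy.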