VidGuard-R1: AI-Generated Video Detection and Explanation via Reasoning MLLMs and RL
About
The rapid proliferation of AI-generated video necessitates robust detection tools that offer both high accuracy and human-interpretable explanations. While existing MLLM-based detectors rely on supervised fine-tuning (SFT) or direct preference optimization (DPO), these methods are often bottlenecked by static, pre-labeled datasets that fail to capture the evolving, multi-step physical inconsistencies of modern generative models. To bridge this gap, we introduce VidGuard-R1, the first video authenticity detector to utilize group relative policy optimization (GRPO). Moving beyond passive preference matching, VidGuard-R1 employs a reinforcement learning framework that encourages the model to explore and rank multiple reasoning paths. By introducing specialized reward models for temporal stability and diffusion-aware complexity, we incentivize the model to discover 'physics-grounded' artifacts. Our contributions include: (1) a curated dataset of 140,000 challenging real/fake video pairs; (2) a GRPO-based training paradigm that achieves state-of-the-art zero-shot performance; and (3) a reasoning-first architecture that provides precise, verifiable rationales for its forensic judgments. Project website: https://vidguard-r1.github.io/.
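The abstract does not spell out the GRPO update, so the following is only a minimal sketch of the group-relative advantage step that GRPO is generally built around: several reasoning paths are sampled for the same video, each receives a scalar reward, and each path is scored relative to its own group rather than by a learned value function. The function name `group_relative_advantages`, the `eps` constant, the use of the population standard deviation, and the binary example rewards are all assumptions made for illustration, not details taken from VidGuard-R1.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of sampled responses.

    Hypothetical helper: each reward is centered on the group mean and
    scaled by the group standard deviation (population form, an assumption).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division finite when every response gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Illustrative rewards for four sampled reasoning paths on one video,
# e.g. 1.0 if the real/fake verdict was correct and 0.0 otherwise.
rewards = [1.0, 0.0, 1.0, 0.0]
advantages = group_relative_advantages(rewards)
```

Paths that beat the group average receive a positive advantage and are reinforced, while below-average paths are discouraged. When all paths in a group earn the same reward, every advantage is zero and that group contributes no learning signal.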
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Video Forgery Detection | GenVideo (test) | Recall (Average) | 96 | 31 |
| AI-Generated Video Identification | GenVidBench | MuseV Score | 97.38 | 18 |
| AI-generated Video Detection | Our Dataset CogVideoX | Top-1 Accuracy | 84.32 | 16 |
| AI-generated Video Detection | Our Dataset HunyuanVideo | Top-1 Accuracy | 86.17 | 16 |
| Fake Video Detection | GenVidBench (test) | MuseV | 97.38 | 12 |
| Human Ranking of Explanation Quality | VidGuard human evaluation subset (20 videos) | Average Rank | 1.67 | 3 |
| AI Video Detection | Gen-3 Alpha | Total Count | 56 | 1 |
| AI Video Detection | Pika | Total Count | 110 | 1 |
| AI Video Detection | Pika 2.2 | Total | 110 | 1 |
| AI Video Detection | Luma Ray2 | Total Count | 110 | 1 |