Skyra: AI-Generated Video Detection via Grounded Artifact Reasoning
About
The misuse of AI-driven video generation technologies has raised serious social concerns, highlighting the urgent need for reliable AI-generated video detectors. However, most existing methods are limited to binary classification and lack the necessary explanations for human interpretation. In this paper, we present Skyra, a specialized multimodal large language model (MLLM) that identifies human-perceivable visual artifacts in AI-generated videos and leverages them as grounded evidence for both detection and explanation. To support this objective, we construct ViF-CoT-4K for Supervised Fine-Tuning (SFT), which represents the first large-scale AI-generated video artifact dataset with fine-grained human annotations. We then develop a two-stage training strategy that systematically enhances our model's spatio-temporal artifact perception, explanation capability, and detection accuracy. To comprehensively evaluate Skyra, we introduce ViF-Bench, a benchmark comprising 3K high-quality samples generated by over ten state-of-the-art video generators. Extensive experiments demonstrate that Skyra surpasses existing methods across multiple benchmarks, while our evaluation yields valuable insights for advancing explainable AI-generated video detection.
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Video Forgery Detection | GenVideo (test) | Recall (Average) | 87.66 | 21 |
| Video Forgery Detection | Video Datasets ID (In-Domain) GenBuster++, LOKI | GenBuster++ Score | 52.1 | 16 |
| Video Forgery Detection | MintVid OOD | Fact Score | 51.9 | 16 |
| Video Forgery Detection | OOD (Out-of-Domain) Video | Vidu Q1 | 37.7 | 16 |
| Video Forgery Detection | ID, OOD, and OOD-MintVid Aggregated | Average Score | 52.5 | 16 |
| Video Forgery Detection | GenVideo | Sora Detection Rate | 0.9564 | 15 |
| AI-generated Video Detection | ViF-Bench T2V 1.0 (test) | Accuracy (Acc) | 91.02 | 13 |
| AI-generated Video Detection | ViF-Bench I2V 1.0 (test) | Accuracy | 91.02 | 7 |
| AI-generated Video Detection | GenVideo ModelScope | Accuracy | 79.93 | 6 |
| AI-generated Video Detection | GenVideo Morph Studio | Accuracy | 94.43 | 6 |