ATSS: Detecting AI-Generated Videos via Anomalous Temporal Self-Similarity
About
AI-generated videos (AIGVs) have achieved unprecedented photorealism, posing severe threats to digital forensics. Existing AIGV detectors focus mainly on localized artifacts or short-term temporal inconsistencies, and therefore often fail to capture the underlying generative logic that governs global temporal evolution, which limits detection performance. In this paper, we identify a distinctive fingerprint in AIGVs, termed anomalous temporal self-similarity (ATSS). Unlike real videos, which exhibit stochastic natural dynamics, AIGVs follow deterministic trajectories driven by anchors (e.g., text or image prompts), inducing unnaturally repetitive correlations across the visual and semantic domains. To exploit this insight, we propose the ATSS method, a multimodal detection framework built on a triple-similarity representation and a cross-attentive fusion mechanism. Specifically, ATSS reconstructs semantic trajectories from frame-wise descriptions and constructs visual, textual, and cross-modal similarity matrices, which jointly quantify the inherent temporal anomalies. These matrices are encoded by dedicated Transformer encoders and integrated via a bidirectional cross-attentive fusion module that models intra- and inter-modal dynamics. Extensive experiments on four large-scale benchmarks (GenVideo, EvalCrafter, VideoPhy, and VidProM) demonstrate that ATSS significantly outperforms state-of-the-art methods in terms of AP, AUC, and ACC, and generalizes well across diverse video generation models. Code and models of ATSS will be released at https://github.com/hwang-cs-ime/ATSS.
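The triple-similarity representation described in the abstract can be illustrated with a minimal sketch. It assumes that each frame already has a visual embedding and an embedding of its frame-wise description, and that both live in a shared vector space (for example, from a CLIP-style encoder); the abstract does not specify the encoders or the similarity measure, so cosine similarity and the function names `similarity_matrix` and `triple_similarity` are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def _normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return [0.0 for _ in vec]
    return [x / norm for x in vec]


def similarity_matrix(a: Sequence[Sequence[float]],
                      b: Sequence[Sequence[float]]) -> Matrix:
    """Cosine similarity between every row of `a` and every row of `b`.

    Assumption: cosine similarity is used here for illustration only;
    the source does not state which similarity measure ATSS uses.
    """
    a_n = [_normalize(v) for v in a]
    b_n = [_normalize(v) for v in b]
    return [[sum(x * y for x, y in zip(u, v)) for v in b_n] for u in a_n]


def triple_similarity(visual: Sequence[Sequence[float]],
                      textual: Sequence[Sequence[float]]) -> Dict[str, Matrix]:
    """Build T x T visual, textual, and cross-modal similarity matrices.

    `visual[t]` and `textual[t]` are hypothetical per-frame embeddings of
    frame t and of its description, assumed to share one embedding space.
    """
    if len(visual) != len(textual):
        raise ValueError("expected one description embedding per frame")
    return {
        "visual": similarity_matrix(visual, visual),
        "textual": similarity_matrix(textual, textual),
        "cross_modal": similarity_matrix(visual, textual),
    }
```

Calling `triple_similarity(visual, textual)` on two lists of T per-frame embeddings returns three T x T matrices. Under the abstract's hypothesis, a generated video would show unusually high and regular off-diagonal values in these matrices compared with a real video; in ATSS the matrices are then passed to Transformer encoders and a cross-attentive fusion module, which this sketch does not cover.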
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Synthetic Video Detection | GenVideo (test) | Average Detection Rate | 97.11 | 34 |
| AI-generated Video Detection | EvalCrafter | Floor33 Score | 99.09 | 28 |
| AI-generated Video Detection | VideoPhy 1.0 (test) | CVX Score | 92.22 | 28 |
| Video Detection | GenVideo | ACC | 94.32 | 14 |
| Video Detection | EvalCrafter | ACC | 96.01 | 14 |
| Video Detection | VideoPhy | Accuracy | 89.61 | 14 |
| Video Detection | VidProM | Accuracy | 88.42 | 14 |
| AI-generated Video Detection | VidProM | AUC (MS) | 91.05 | 14 |
| AI-generated Video Detection | GenVideo | MS Score | 95.21 | 14 |
| AI-generated Video Detection | VidProM (test) | MS Performance | 93.49 | 14 |