Delving into Sequential Patches for Deepfake Detection
About
Recent advances in face forgery techniques produce deepfake videos that are nearly untraceable by eye and could be used with malicious intent. As a result, researchers have devoted considerable effort to deepfake detection. Previous studies have identified the importance of local low-level cues and temporal information for generalizing well across deepfake methods; however, they still lack robustness against post-processing. In this work, we propose the Local- & Temporal-aware Transformer-based Deepfake Detection (LTTD) framework, which adopts a local-to-global learning protocol with a particular focus on the valuable temporal information within local sequences. Specifically, we propose a Local Sequence Transformer (LST), which models temporal consistency on sequences of restricted spatial regions, where low-level information is hierarchically enhanced with shallow layers of learned 3D filters. Based on the local temporal embeddings, we then perform the final classification in a global contrastive way. Extensive experiments on popular datasets validate that our approach effectively spots local forgery cues and achieves state-of-the-art performance.
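To make "sequences of restricted spatial regions" concrete, here is a minimal sketch of how a face clip could be cut into local patch sequences, each following one spatial location over time. It is an illustration only, not the authors' implementation: the function name `split_into_local_sequences`, the non-overlapping grid, and the patch size are assumptions, and the learned 3D filters and the transformer itself are not shown.

```python
import numpy as np


def split_into_local_sequences(clip: np.ndarray, patch: int) -> np.ndarray:
    """Split a clip of shape (T, H, W, C) into local patch sequences.

    Returns an array of shape (N, T, patch, patch, C), where
    N = (H // patch) * (W // patch) and each entry tracks one spatial
    region across all T frames. Hypothetical helper, for illustration.
    """
    t, h, w, c = clip.shape
    if h % patch or w % patch:
        raise ValueError("H and W must be divisible by the patch size")
    gh, gw = h // patch, w // patch
    # Split H into (gh, patch) and W into (gw, patch).
    x = clip.reshape(t, gh, patch, gw, patch, c)
    # Move the grid indices to the front: (gh, gw, T, patch, patch, C).
    x = x.transpose(1, 3, 0, 2, 4, 5)
    # Flatten the grid in row-major order: sequence n = row * gw + col.
    return x.reshape(gh * gw, t, patch, patch, c)


# Toy clip: 4 frames of 8x8 RGB, split into a 2x2 grid of 4x4 patches.
clip = np.arange(4 * 8 * 8 * 3, dtype=np.float32).reshape(4, 8, 8, 3)
seqs = split_into_local_sequences(clip, patch=4)
print(seqs.shape)
```

Each of the resulting sequences could then be embedded independently before any global comparison across regions, which is the local-to-global ordering the abstract describes.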
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Deepfake Detection | DFDC | AUC 80.4 | 135 |
| Deepfake Detection | DFDC (test) | AUC 80.4 | 87 |
| Fake Face Detection | Celeb-DF v2 (test) | AUC 89.3 | 50 |
| Deepfake Detection | FF++ | AUC 99.5 | 34 |
| Deepfake Detection | Celeb-DF v2 (test) | Video-level AUC 0.893 | 16 |
| Deepfake Detection | CDF v2 | AUC 0.893 | 16 |
| Face Forgery Detection | FaceForensics++ (FF++) (test) | -- | 11 |
| Deepfake Detection | DeepFo (test) | AUC 98.5 | 10 |
| Deepfake Detection | FaceSh (test) | AUC 99.5 | 9 |
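
The table reports a video-level AUC for one entry. As a rough sketch of what such a metric usually involves, the snippet below averages clip scores per video and then computes AUC as the probability that a fake video scores higher than a real one. The aggregation by mean, the function name `video_level_auc`, and the toy scores are assumptions for illustration; the evaluation protocol behind the numbers above is not specified on this page.

```python
import numpy as np


def video_level_auc(clip_scores, clip_video_ids, video_labels):
    """AUC over videos, with each video scored by the mean of its clip scores.

    clip_scores:    fake-probability per clip
    clip_video_ids: video id per clip (same length as clip_scores)
    video_labels:   dict mapping video id -> 1 (fake) or 0 (real)
    Hypothetical helper; mean aggregation is an assumption.
    """
    ids = sorted(video_labels)
    scores = np.array([
        np.mean([s for s, v in zip(clip_scores, clip_video_ids) if v == vid])
        for vid in ids
    ])
    labels = np.array([video_labels[vid] for vid in ids])
    pos, neg = scores[labels == 1], scores[labels == 0]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("need at least one fake and one real video")
    # Compare every fake/real pair; ties count as half a correct ranking.
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)


# Toy data: four videos with two clips each (made-up scores).
clip_scores = [0.9, 0.7, 0.2, 0.4, 0.6, 0.1, 0.5, 0.5]
clip_video_ids = ["a", "a", "b", "b", "c", "c", "d", "d"]
video_labels = {"a": 1, "b": 0, "c": 1, "d": 0}
auc = video_level_auc(clip_scores, clip_video_ids, video_labels)
print(auc)  # 0.75
```

This function returns a value in [0, 1]; some rows in the table appear to use a percentage scale (for example 80.4) and others a fraction (0.893).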