Locate and Verify: A Two-Stream Network for Improved Deepfake Detection
About
Deepfakes have taken the world by storm, triggering a crisis of trust. Current deepfake detection methods typically generalize poorly: they tend to overfit to image content, such as the background, that occurs frequently in the training dataset but is relatively unimportant for detection. Furthermore, current methods rely heavily on a few dominant forgery regions and may ignore other equally important regions, so forgery cues are not fully uncovered. In this paper, we address these shortcomings in three ways: (1) We propose a two-stream network that enlarges the set of regions from which the model extracts forgery evidence. (2) We devise three functional modules that handle the multi-stream and multi-scale features in a collaborative learning scheme. (3) Because forgery annotations are difficult to obtain, we propose a Semi-supervised Patch Similarity Learning strategy that estimates patch-level annotations of forged locations. Empirically, our method shows significantly improved robustness and generalizability and outperforms previous methods on six benchmarks. It improves the frame-level AUC on the Deepfake Detection Challenge preview dataset from 0.797 to 0.835 and the video-level AUC on the CelebDF_v1 dataset from 0.811 to 0.847. Our implementation is available at https://github.com/sccsok/Locate-and-Verify.
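The abstract reports both frame-level and video-level AUC. The sketch below shows how the two metrics differ. It assumes that a video's score is the mean of its per-frame fake scores and that every frame inherits its video's label. This is a common convention, but the abstract does not state the aggregation rule. The function names `roc_auc` and `video_level_auc` are hypothetical and do not come from the Locate-and-Verify repository.

```python
from collections import defaultdict
from statistics import mean


def roc_auc(labels, scores):
    """ROC AUC as the Mann-Whitney U statistic: the fraction of
    (fake, real) pairs in which the fake sample scores higher.
    Ties count as half a win. O(P*N), fine for illustration."""
    pos = [s for y, s in zip(labels, scores) if y == 1]  # fake samples
    neg = [s for y, s in zip(labels, scores) if y == 0]  # real samples
    if not pos or not neg:
        raise ValueError("AUC needs at least one fake and one real sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def video_level_auc(frame_records):
    """frame_records: iterable of (video_id, label, fake_score), one per frame.

    Assumption (hypothetical aggregation): the video score is the mean of
    its frame scores, and all frames of a video share the video's label.
    """
    scores = defaultdict(list)
    labels = {}
    for vid, label, score in frame_records:
        scores[vid].append(score)
        labels[vid] = label
    vids = sorted(scores)
    return roc_auc([labels[v] for v in vids], [mean(scores[v]) for v in vids])


# Toy data: (video_id, label 1=fake / 0=real, per-frame fake score)
records = [("a", 1, 0.9), ("a", 1, 0.4), ("b", 0, 0.2),
           ("b", 0, 0.6), ("c", 1, 0.7), ("d", 0, 0.1)]
frame_auc = roc_auc([y for _, y, _ in records], [s for _, _, s in records])  # 8/9
video_auc = video_level_auc(records)  # 1.0
```

In this toy example, averaging per video hides the poorly scored fake frame in video "a". The video-level AUC can therefore exceed the frame-level AUC on the same predictions, which is one reason the two numbers are reported separately.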
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Deepfake Detection | CDFv1, CDFv2, DFD, DFDCP, DFDC (test) | Overall Average Score | 89.1 | 74 |
| LipSync Manipulation Detection | AVLips (test) | Accuracy | 75.52 | 7 |
| LipSync Manipulation Detection | FF++ (test) | Accuracy | 91.02 | 7 |
| LipSync Manipulation Detection | DFDC (test) | Accuracy | 77.39 | 7 |
| Sequential Facial Edit Provenance Tracing | SEED L=0 (no edits) | Fixed Accuracy | 99.95 | 7 |
| Sequential Facial Edit Provenance Tracing | SEED L=1 | Fixed Accuracy | 97.32 | 7 |
| Sequential Facial Edit Provenance Tracing | SEED L=2 | Fixed Accuracy | 78.12 | 7 |
| Sequential Facial Edit Provenance Tracing | SEED L=3 | Fixed Accuracy | 53.66 | 7 |
| Sequential Facial Edit Provenance Tracing | SEED L=4 | Fixed Accuracy | 28.45 | 7 |
| Sequential Facial Edit Provenance Tracing | SEED Avg. | Fixed Accuracy | 71.5 | 7 |