Referee: Reference-aware Audiovisual Deepfake Detection
About
Deepfakes generated by advanced generative models have rapidly posed serious threats, yet existing audiovisual deepfake detection approaches struggle to generalize to unseen manipulation methods. To address this, we propose a novel reference-aware audiovisual deepfake detection method, called Referee to capture fine-grained identity discrepancies. Unlike existing methods that overfit to transient spatiotemporal artifacts, Referee employs identity bottleneck and matching modules to model the relational consistency of speaker-specific cues captured by a single one-shot example as a biometric anchor. Extensive experiments on FakeAVCeleb, FaceForensics++, and KoDF demonstrate that Referee achieves state-of-the-art results on cross-dataset and cross-language evaluation protocols, including a 99.4% AUC on KoDF. These results highlight that explicitly correlating reference-based biometric priors is a key frontier for achieving generalized and reliable audiovisual forensics. The code is available at https://github.com/ewha-mmai/referee.
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Deepfake Detection | FF++ (test) | AUC79.78 | 44 | |
| Audiovisual Deepfake Detection | KoDF (test) | AUC99.4 | 13 | |
| Deepfake Detection | FakeAVCeleb Intra-dataset | Accuracy99.2 | 12 |