Referee: Reference-aware Audiovisual Deepfake Detection

About

Deepfakes generated by advanced generative models pose a rapidly growing threat, yet existing audiovisual deepfake detection approaches struggle to generalize to unseen manipulation methods. To address this, we propose Referee, a novel reference-aware audiovisual deepfake detection method that captures fine-grained identity discrepancies. Unlike existing methods that overfit to transient spatiotemporal artifacts, Referee employs identity bottleneck and matching modules to model the relational consistency of speaker-specific cues, using a single one-shot reference example as a biometric anchor. Extensive experiments on FakeAVCeleb, FaceForensics++, and KoDF demonstrate that Referee achieves state-of-the-art results on cross-dataset and cross-language evaluation protocols, including a 99.4% AUC on KoDF. These results highlight that explicitly correlating reference-based biometric priors is a key frontier for achieving generalized and reliable audiovisual forensics. The code is available at https://github.com/ewha-mmai/referee.
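
The idea of checking a target clip against a one-shot reference of the same speaker can be illustrated with a minimal sketch. This is not the Referee architecture: the identity bottleneck and matching modules are learned networks, whereas the code below assumes identity embeddings have already been extracted by some hypothetical encoder and simply compares them with cosine similarity. The function names and the threshold value are illustrative assumptions, not taken from the paper or its repository.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as carrying no identity evidence
    return dot / (norm_a * norm_b)


def identity_mismatch_score(reference_embedding, target_embedding):
    """Map similarity to [0, 1]; higher means the target deviates more
    from the reference identity (hypothetical scoring rule)."""
    sim = cosine_similarity(reference_embedding, target_embedding)
    score = (1.0 - sim) / 2.0
    return max(0.0, min(1.0, score))  # clamp away floating-point overshoot


def is_suspected_fake(reference_embedding, target_embedding, threshold=0.25):
    """Flag the target when its mismatch score exceeds an assumed threshold."""
    return identity_mismatch_score(reference_embedding, target_embedding) > threshold


# Toy embeddings standing in for the output of an identity encoder.
reference = [1.0, 0.0, 0.0]
same_speaker = [0.9, 0.1, 0.0]
other_identity = [0.0, 1.0, 0.0]

flag_same = is_suspected_fake(reference, same_speaker)
flag_other = is_suspected_fake(reference, other_identity)
```

In this toy setup the clip close to the reference is not flagged, while the orthogonal embedding is. A learned matching module would replace the fixed cosine rule and threshold with trained parameters.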

Hyemin Boo, Eunsang Lee, Jiyoung Lee • 2025

Related benchmarks

Task	Dataset	Result
Deepfake Detection	FF++ (test)	AUC 79.78	44
Audiovisual Deepfake Detection	KoDF (test)	AUC 99.4	13
Deepfake Detection	FakeAVCeleb Intra-dataset	Accuracy 99.2	12
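
For readers unfamiliar with the AUC metric reported above, the following is a minimal sketch of how ROC AUC can be computed from per-clip scores, using the pairwise ranking definition (the fraction of fake/real pairs in which the fake clip receives the higher score, with ties counted as half). The labels and scores are made-up toy values, and the function name is an illustrative assumption; this is not the evaluation code used for the results in the table.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison of positive and negative scores.

    labels: 1 for fake (positive), 0 for real (negative).
    scores: higher means more likely fake.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # fake ranked above real
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Toy example: two fake clips and two real clips.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)  # 3 of 4 pairs ranked correctly -> 0.75
```

The table values appear to be on a percentage scale (the abstract gives 99.4% for KoDF), so a value from this function would be multiplied by 100 to compare.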
