Self-Supervised Video Forensics by Audio-Visual Anomaly Detection

About

Manipulated videos often contain subtle inconsistencies between their visual and audio signals. We propose a video forensics method, based on anomaly detection, that can identify these inconsistencies, and that can be trained solely using real, unlabeled data. We train an autoregressive model to generate sequences of audio-visual features, using feature sets that capture the temporal synchronization between video frames and sound. At test time, we then flag videos that the model assigns low probability. Despite being trained entirely on real videos, our model obtains strong performance on the task of detecting manipulated speech videos. Project site: https://cfeng16.github.io/audio-visual-forensics
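The score-and-flag idea in the abstract can be sketched in a few lines. This is only a toy illustration, not the authors' model: it swaps in a first-order Markov chain over discrete tokens for the autoregressive model, and the tokens (hypothetical quantized audio-visual offsets, 0 to 4), the helper names `fit_markov` and `anomaly_score`, and the synthetic data are all assumptions made for the example. The model is fit only on "real" sequences, and a test sequence is scored by its average negative log-likelihood.

```python
import numpy as np

def fit_markov(sequences, num_tokens, alpha=1.0):
    """Estimate first-order transition probabilities with add-alpha smoothing."""
    counts = np.full((num_tokens, num_tokens), alpha, dtype=np.float64)
    for seq in sequences:
        for prev, curr in zip(seq[:-1], seq[1:]):
            counts[prev, curr] += 1.0
    # Normalize each row so it is a distribution over the next token.
    return counts / counts.sum(axis=1, keepdims=True)

def anomaly_score(seq, trans):
    """Average negative log-likelihood per transition (higher = more anomalous)."""
    seq = np.asarray(seq)
    log_probs = np.log(trans[seq[:-1], seq[1:]])
    return float(-log_probs.mean())

# Hypothetical toy data: token 2 stands for "audio and video in sync".
rng = np.random.default_rng(0)
in_sync = [0.02, 0.08, 0.8, 0.08, 0.02]
real_train = [rng.choice(5, size=50, p=in_sync) for _ in range(200)]

# Training uses real (unmanipulated) sequences only.
trans = fit_markov(real_train, num_tokens=5)

real_test = rng.choice(5, size=50, p=in_sync)
fake_test = rng.choice(5, size=50)  # uniform offsets: synchronization is lost

real_score = anomaly_score(real_test, trans)
fake_score = anomaly_score(fake_test, trans)
print(f"real: {real_score:.3f}  fake: {fake_score:.3f}")
```

In practice a threshold on the score, chosen for instance from held-out real videos, would decide which videos to flag; the threshold choice is left out of this sketch.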

Chao Feng, Ziyang Chen, Andrew Owens • 2023

Related benchmarks

Task	Dataset	Result
Deepfake Detection	DFDCP (test)	AUC 45.58	56
Deepfake Detection	FF++ (test)	AUC 67.01	44
Audio-visual video forgery detection	FakeAVCeleb	Accuracy 92.71	41
Deepfake Detection	KoDF (test)	AUC 86.9	31
Deepfake Detection	FaceForensics++ c23 (train)	FF c23 Score 94.1	31
Deepfake Detection	Cross-Domain Evaluation (test)	CDFv1 Score 73.82	31
Video Deepfake Detection	DF-TIMIT (test)	AUC 77.39	27
Face Forgery Detection	FF++ (HQ)	AUC DF 59.2	27
Face Forgery Detection	S2CFP (test)	Score (@ijustine) 9.03	24
Deepfake Detection	IDForge (test)	AUC 55.78	22

Showing 10 of 46 rows
