Beyond Spatial Frequency: Pixel-wise Temporal Frequency-based Deepfake Video Detection

About

We introduce a deepfake video detection approach that exploits pixel-wise temporal inconsistencies, which traditional spatial frequency-based detectors often overlook. Such detectors represent temporal information merely by stacking spatial frequency spectra across frames, so they fail to detect temporal artifacts in the pixel plane. Our approach instead applies a 1D Fourier transform along the time axis for each pixel, extracting features that are highly sensitive to temporal inconsistencies, especially in regions prone to unnatural movements. To precisely locate the regions containing these temporal artifacts, we introduce an attention proposal module trained end-to-end. In addition, our joint transformer module integrates the pixel-wise temporal frequency features with spatio-temporal context features, expanding the range of detectable forgery artifacts. The resulting framework provides robust performance across diverse and challenging detection scenarios.
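The per-pixel temporal transform described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `pixelwise_temporal_spectrum`, the grayscale `(T, H, W)` input layout, the mean subtraction, and the use of the amplitude spectrum are all assumptions made for illustration. The attention proposal module and joint transformer module are not covered.

```python
import numpy as np


def pixelwise_temporal_spectrum(frames: np.ndarray) -> np.ndarray:
    """Amplitude spectrum of each pixel's intensity over time.

    frames: array of shape (T, H, W), a grayscale clip (assumed layout).
    Returns an array of shape (T // 2 + 1, H, W).
    """
    if frames.ndim != 3:
        raise ValueError("expected a (T, H, W) array")
    frames = frames.astype(np.float64)
    # Assumption: subtract each pixel's temporal mean so the DC term
    # does not dominate the spectrum.
    centered = frames - frames.mean(axis=0, keepdims=True)
    # 1D real FFT along the time axis, computed independently per pixel.
    spectrum = np.fft.rfft(centered, axis=0)
    return np.abs(spectrum)


# Synthetic clip: every pixel is static except one that flickers
# at 4 cycles over the clip length.
T, H, W = 32, 4, 4
t = np.arange(T)
clip = np.zeros((T, H, W))
clip[:, 1, 2] = np.sin(2 * np.pi * 4 * t / T)

amp = pixelwise_temporal_spectrum(clip)
peak_bin = int(amp[:, 1, 2].argmax())  # frequency bin with the most energy
```

In this toy clip the flickering pixel concentrates its energy in a single frequency bin while static pixels have an all-zero spectrum, which is the kind of per-pixel signal a spectrum stacked from spatial transforms would not isolate.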

Taehoon Kim, Jongwook Choi, Yonghyun Jeong, Haeun Noh, Jaejun Yoo, Seungryul Baek, Jongwon Choi • 2025

Related benchmarks

Task	Dataset	Result
Deepfake Detection	DFD	AUC 0.92	193
Deepfake Detection	CelebDF v2	AUC 0.914	134
Deepfake Detection	CDF v2	AUC 0.6385	97
Deepfake Detection	FaceForensics++ (test)	AUC 82.97	65
Image Deepfake Detection	DFo	AUC 0.7152	62
Deepfake Detection	WDF	AUC 0.741	54
Deepfake Detection	FaceForensics++ c23 (test)	AUC 98.4	52
Deepfake Detection	DFD	Video AUC 0.973	23
Deepfake Detection	DiF	AUC 0.6644	22
Deepfake Detection	DaG	AUC 71.24	22

Showing 10 of 22 rows
