Fair Deepfake Detectors Can Generalize

About

Deepfake detection models face two critical challenges: generalization to unseen manipulations and demographic fairness across population groups. Existing work often suggests that these two objectives are inherently conflicting, revealing a trade-off between them. In this paper, we uncover and formally define, for the first time, a causal relationship between fairness and generalization. Building on the back-door adjustment, we show that controlling for confounders (data distribution and model capacity) enables fairness interventions to improve generalization. Motivated by this insight, we propose Demographic Attribute-insensitive Intervention Detection (DAID), a plug-and-play framework composed of: i) demographic-aware data rebalancing, which employs inverse-propensity weighting and subgroup-wise feature normalization to neutralize distributional biases; and ii) demographic-agnostic feature aggregation, which uses a novel alignment loss to suppress sensitive-attribute signals. Across three cross-domain benchmarks, DAID consistently outperforms several state-of-the-art detectors in both fairness and generalization, validating both its theoretical foundation and its practical effectiveness.

Harry Cheng, Ming-Hui Liu, Yangyang Guo, Tianyi Wang, Liqiang Nie, Mohan Kankanhalli • 2025
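
The abstract names inverse-propensity weighting and subgroup-wise feature normalization as the two parts of the data rebalancing step. The following is a minimal NumPy sketch of what those two operations could look like. It is not the authors' implementation: it assumes the propensity of a sample is simply the empirical frequency of its demographic subgroup, and the function names `inverse_propensity_weights` and `subgroup_normalize` are hypothetical.

```python
import numpy as np


def inverse_propensity_weights(groups):
    """Per-sample weights proportional to 1 / P(subgroup).

    Assumption: the propensity is the empirical subgroup frequency.
    Weights are rescaled so that their mean is 1.
    """
    groups = np.asarray(groups).ravel()
    _, inverse, counts = np.unique(
        groups, return_inverse=True, return_counts=True
    )
    propensity = counts / groups.size      # empirical P(subgroup)
    weights = 1.0 / propensity[inverse]    # rarer subgroup -> larger weight
    return weights / weights.mean()


def subgroup_normalize(features, groups, eps=1e-6):
    """Standardize features separately within each demographic subgroup."""
    features = np.asarray(features, dtype=float)
    groups = np.asarray(groups).ravel()
    out = np.empty_like(features)
    for g in np.unique(groups):
        mask = groups == g
        mu = features[mask].mean(axis=0)
        sigma = features[mask].std(axis=0)
        out[mask] = (features[mask] - mu) / (sigma + eps)
    return out
```

With these weights every subgroup contributes the same total weight to a weighted loss, regardless of how many samples it has, and after normalization each subgroup's features have approximately zero mean, so a detector cannot read the subgroup off a simple shift in feature statistics.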

Related benchmarks

Task	Dataset	Result
Deepfake Detection	DFDC	AUC 76.8	230
Deepfake Detection	DFD	AUC 0.925	193
Deepfake Detection	CDF v2	AUC 0.909	97
Image Deepfake Detection	DFo	AUC 0.884	62
Deepfake Detection	WDF	AUC 80.1	54
Image Deepfake Detection	FFIW	AUC 0.883	47
Deepfake Detection	FaceDan	AUC 91.7	30
Deepfake Detection	UniFace	AUC 90.2	30
Deepfake Detection	e4s	AUC 0.838	30
Deepfake Detection	BleFace	AUC 84.9	30

Showing 10 of 19 rows
