Post-training for Deepfake Speech Detection

About

We introduce a post-training approach that adapts self-supervised learning (SSL) models for deepfake speech detection by bridging the gap between general pre-training and domain-specific fine-tuning. We present AntiDeepfake models, a series of post-trained models developed using a large-scale multilingual speech dataset containing over 56,000 hours of genuine speech and 18,000 hours of speech with various artifacts in over one hundred languages. Experimental results show that the post-trained models already exhibit strong robustness and generalization to unseen deepfake speech. When they are further fine-tuned on the Deepfake-Eval-2024 dataset, these models consistently surpass existing state-of-the-art detectors that do not leverage post-training. Model checkpoints and source code are available online.

Wanying Ge, Xin Wang, Xuechen Liu, Junichi Yamagishi • 2025

Related benchmarks

Task	Dataset	Result
Audio Deepfake Detection	in the wild	EER 1.23	76
Audio Deepfake Detection	ASVspoof LA 2019 (eval)	EER 0.0011	36
Speech Deepfake Detection	FakeOrReal	EER 173	30
Audio Deepfake Detection	ITW	ACC 98.7	15
Deepfake Detection	CodecFake+ CoSG ExtEval	EER 22.19	11
Deepfake Detection	CodecFake+ CoSG (Eval)	EER 3.95	11
Speech Deepfake Detection	ODSS	EER (%) 1.13	7
Speech Deepfake Detection	EF	EER 20	7
Speech Deepfake Detection	ADD ASVspoof 2022	EER 1.05	7
Speech Deepfake Detection	ADD ASVspoof 2023	EER 4.67	7
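
Most results above are reported as equal error rate (EER): the operating point at which the false acceptance rate (deepfake speech accepted as genuine) equals the false rejection rate (genuine speech rejected). The following is a minimal sketch of how EER can be estimated from detector scores. It is not the authors' evaluation code; the function name `compute_eer` and the convention that a higher score means "more likely genuine" are assumptions made for illustration.

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Estimate the equal error rate from two lists of detector scores.

    Assumption: a higher score means the detector considers the
    utterance more likely to be genuine (bona fide).
    Returns (eer, threshold), with eer as a fraction in [0, 1].
    """
    # Every observed score is a candidate decision threshold.
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best = None
    for t in thresholds:
        # False rejection rate: genuine utterances scored below t.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance rate: deepfake utterances scored at or above t.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores, invented purely for illustration.
genuine = [0.9, 0.8, 0.7, 0.3]
fake = [0.1, 0.2, 0.4, 0.6]
eer, threshold = compute_eer(genuine, fake)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

With a finite score set the two error rates rarely cross exactly, so this sketch averages them at the closest threshold; evaluation toolkits often interpolate instead, which can give slightly different values.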

