CodecFake: Enhancing Anti-Spoofing Models Against Deepfake Audios from Codec-Based Speech Synthesis Systems

About

Current state-of-the-art (SOTA) codec-based audio synthesis systems can mimic anyone's voice with just a 3-second sample from that specific unseen speaker. Unfortunately, malicious attackers may exploit these technologies, causing misuse and security issues. Anti-spoofing models have been developed to detect fake speech. However, whether current SOTA anti-spoofing models can effectively counter deepfake audio from codec-based speech synthesis systems remains an open question. In this paper, we curate an extensive collection of contemporary SOTA codec models and use them to re-synthesize speech. This effort yields CodecFake, the first codec-based deepfake audio dataset. Additionally, we verify that anti-spoofing models trained on commonly used datasets cannot detect synthesized speech from current codec-based speech generation systems. The proposed CodecFake dataset enables these models to counter this challenge effectively.

Haibin Wu, Yuan Tseng, Hung-yi Lee • 2024

Related benchmarks

Task	Dataset	Result
Audio Deepfake Detection	CodecFake	EER 10.13	50
Speech Deepfake Detection	ICF	Accuracy 90.6	23
CodecFake Detection	ECFD - TIS Corpus Elderly E2 (test)	EER 27.45	22
CodecFake Detection	ECFD SeniorTalk (E1) (test)	EER (%) 30.18	2
CodecFake Detection	ECFD - TIS Corpus (E2) - Young (test)	EER 14.07	2
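
Most rows above report equal error rate (EER), the operating point at which the false acceptance rate (spoofed audio accepted as genuine) equals the false rejection rate (genuine audio rejected). The following is a minimal sketch of how EER can be computed from detector scores. It is not the evaluation code behind these results. The function name `equal_error_rate` and the convention that a higher score means "more likely bona fide" are assumptions made for illustration.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Return (eer, threshold) for two lists of detector scores.

    Assumption: a higher score means the detector believes the
    utterance is more likely bona fide (genuine).
    """
    # Every observed score is a candidate decision threshold.
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best = None
    for t in thresholds:
        # False rejection rate: bona fide utterances scored below t.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance rate: spoofed utterances scored at or above t.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores, invented purely for illustration.
bonafide = [0.9, 0.8, 0.7, 0.4]
spoof = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(bonafide, spoof)
print(f"EER = {eer * 100:.2f}% at threshold {threshold}")
```

On a finite score set the two error rates may never be exactly equal, so this sketch reports their average at the threshold where they are closest. Evaluation toolkits often interpolate along the ROC curve instead, which can give slightly different values.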

