SpoofCeleb: Speech Deepfake Detection and SASV In The Wild

About

This paper introduces SpoofCeleb, a dataset designed for Speech Deepfake Detection (SDD) and Spoofing-robust Automatic Speaker Verification (SASV), utilizing source data from real-world conditions and spoofing attacks generated by Text-To-Speech (TTS) systems also trained on the same real-world data. Robust recognition systems require speech data recorded in varied acoustic environments with different levels of noise to be trained. However, current datasets typically include clean, high-quality recordings (bona fide data) due to the requirements for TTS training; studio-quality or well-recorded read speech is typically necessary to train TTS models. Current SDD datasets also have limited usefulness for training SASV models due to insufficient speaker diversity. SpoofCeleb leverages a fully automated pipeline we developed that processes the VoxCeleb1 dataset, transforming it into a suitable form for TTS training. We subsequently train 23 contemporary TTS systems. SpoofCeleb comprises over 2.5 million utterances from 1,251 unique speakers, collected under natural, real-world conditions. The dataset includes carefully partitioned training, validation, and evaluation sets with well-controlled experimental protocols. We present the baseline results for both SDD and SASV tasks. All data, protocols, and baselines are publicly available at https://jungjee.github.io/spoofceleb.

Jee-weon Jung, Yihan Wu, Xin Wang, Ji-Hoon Kim, Soumi Maiti, Yuta Matsunaga, Hye-jin Shim, Jinchuan Tian, Nicholas Evans, Joon Son Chung, Wangyou Zhang, Seyun Um, Shinnosuke Takamichi, Shinji Watanabe• 2024

Related benchmarks

Task	Dataset	Result
Spoofing-aware speaker verification	SpoofCeleb (eval set)	a-DCF0.2902	17
Speech Deepfake Detection	RTCFake Offline	EER29.56	12
Speech Deepfake Detection	RTCFake (All)	EER34.06	12
Speech Deepfake Detection	RTCFake Online	P0140.05	12
Spoofing-aware speaker verification	SpoofCeleb (dev)	Min a-DCF0.3101	7

Showing 5 of 5 rows

Other info

Follow for update

@wizwand_team Discord