An Efficient Temporary Deepfake Location Approach Based Embeddings for Partially Spoofed Audio Detection

About

Partially spoofed audio detection is a challenging task because it requires accurately locating authentic and spoofed audio at the frame level. To address this issue, we propose a fine-grained partially spoofed audio detection method, namely Temporal Deepfake Location (TDL), which can effectively capture both feature and location information. Specifically, our approach involves two novel parts: an embedding similarity module and a temporal convolution operation. To better discriminate between real and fake features, the embedding similarity module is designed to generate an embedding space that separates real frames from fake frames. To focus on position information, the temporal convolution operation calculates frame-specific similarities among neighboring frames and dynamically selects informative neighbors for convolution. Extensive experiments show that our method outperforms baseline models on the ASVspoof2019 Partial Spoof dataset and demonstrates superior performance even in the cross-dataset scenario.
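The neighbor-selection idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the `radius` and `top_k` parameters, and the use of plain averaging in place of learned convolution weights are all assumptions made for illustration. It computes the cosine similarity between each frame embedding and its temporal neighbors, then aggregates only the most similar ones.

```python
import numpy as np


def cosine_to_neighbors(emb, radius):
    """Cosine similarity between each frame and its neighbors within +/- radius.

    emb: (T, D) array of frame embeddings (hypothetical input).
    Returns a (T, 2 * radius + 1) array; neighbors that fall outside the
    sequence are marked with -inf.
    """
    T = emb.shape[0]
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    unit = emb / np.maximum(norms, 1e-12)
    sims = np.full((T, 2 * radius + 1), -np.inf)
    for k, offset in enumerate(range(-radius, radius + 1)):
        lo, hi = max(0, -offset), min(T, T - offset)
        if lo >= hi:
            continue  # offset larger than the sequence; nothing to compare
        sims[lo:hi, k] = np.sum(unit[lo:hi] * unit[lo + offset:hi + offset], axis=1)
    return sims


def aggregate_top_k(emb, radius=2, top_k=3):
    """Average each frame with its top_k most similar in-range neighbors.

    Assumption: uniform averaging stands in for the learned convolution
    kernel described in the abstract.
    """
    sims = cosine_to_neighbors(emb, radius)
    offsets = np.arange(-radius, radius + 1)
    out = np.zeros_like(emb, dtype=float)
    for t in range(emb.shape[0]):
        order = np.argsort(sims[t])[::-1][:top_k]
        chosen = [k for k in order if np.isfinite(sims[t, k])]
        out[t] = emb[t + offsets[chosen]].mean(axis=0)
    return out


# Toy sequence: three "real-like" frames followed by three "fake-like" frames.
frames = np.array([[1.0, 0.0], [1.0, 0.1], [0.9, 0.0],
                   [0.0, 1.0], [0.1, 1.0], [0.0, 0.9]])
smoothed = aggregate_top_k(frames, radius=1, top_k=2)
```

In this toy case, frames on either side of the real/fake boundary have near-zero similarity to each other, so each is aggregated only with neighbors on its own side and the boundary stays sharp, which is the behavior the abstract attributes to similarity-guided neighbor selection.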

Yuankun Xie, Haonan Cheng, Yutian Wang, Long Ye • 2023

Related benchmarks

Task	Dataset	Result
Audio Spoof Detection	PartialSpoof (PS) (test)	EER 7.04	22
Spoofed Speech Detection	PartialSpoof (Overall)	EER 7.04	16
Content Localization	HumanEdit	Accuracy 89.34	5
Content Localization	Pool HumanEdit and AiEdit average	Accuracy 90.18	5
Content Localization	AiEdit	Accuracy 91.02	5
Speech Editing Detection	HumanEdit	Accuracy 96.25	5
Speech Editing Detection	AiEdit	Accuracy 34.95	5
Speech Editing Detection	Pool HumanEdit and AiEdit average	Accuracy 65.6	5
