Forensic Similarity for Speech Deepfakes

About

In this paper, we introduce the concept of forensic similarity in the speech deepfake detection domain, which aims to determine whether two audio segments share the same underlying forensic traces. Our approach is inspired by prior work in the image domain. To transfer this idea to the audio domain, we propose a two-stage deep learning framework consisting of a Siamese-based feature extractor and a core decision module, referred to as the similarity network. The system aims to assess whether two speech samples originate from the same source by comparing their forensic characteristics. In practice, the model maps pairs of audio segments to a similarity score indicating whether they contain identical or different forensic traces. We evaluate the proposed method on the emerging task of source verification, demonstrating its ability to determine whether two speech samples were generated by the same model. In addition, we explore its applicability to audio splicing detection as a complementary use case. Experimental results show that the proposed approach generalizes well to previously unseen forensic traces, highlighting its robustness, flexibility, and practical relevance for digital audio forensics.
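The two-stage data flow described above (a shared feature extractor applied to both segments, followed by a similarity network that outputs a score) can be sketched roughly as follows. This is only an illustration under stated assumptions: the layer sizes, the random untrained weights, and the names `extract_features` and `similarity_score` are hypothetical and are not taken from the paper, whose actual architecture is not specified here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
SEGMENT_LEN = 4000   # samples per audio segment
FEAT_DIM = 64        # dimension of the forensic feature vector
HIDDEN_DIM = 32      # hidden width of the similarity network

# Random, untrained weights standing in for learned parameters.
W_feat = rng.normal(scale=SEGMENT_LEN ** -0.5, size=(SEGMENT_LEN, FEAT_DIM))
W_hid = rng.normal(scale=(2 * FEAT_DIM) ** -0.5, size=(2 * FEAT_DIM, HIDDEN_DIM))
w_out = rng.normal(scale=HIDDEN_DIM ** -0.5, size=HIDDEN_DIM)


def extract_features(segment):
    """Stage 1: the same (Siamese) extractor is applied to every segment."""
    return np.tanh(segment @ W_feat)


def similarity_score(seg_a, seg_b):
    """Stage 2: map a pair of segments to a score in [0, 1]."""
    f_a = extract_features(seg_a)   # both branches share W_feat
    f_b = extract_features(seg_b)
    pair = np.concatenate([f_a, f_b])
    hidden = np.maximum(0.0, pair @ W_hid)   # ReLU layer
    logit = hidden @ w_out
    return float(1.0 / (1.0 + np.exp(-logit)))   # sigmoid


# Usage with two random stand-in "audio segments".
seg_a = rng.normal(size=SEGMENT_LEN)
seg_b = rng.normal(size=SEGMENT_LEN)
score = similarity_score(seg_a, seg_b)
```

With trained weights, a score near 1 would be read as "same forensic traces" and a score near 0 as "different". Note that concatenating the two feature vectors makes this sketch order-dependent; a real system might symmetrize the inputs.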

Viola Negroni, Davide Salvi, Daniele Ugo Leonzio, Paolo Bestagini, Stefano Tubaro • 2025

Related benchmarks

Task	Dataset	Result
Source verification	MLAAD open-set in-domain (test)	EER 10.5	4
Source verification	TIMIT-TTS out-of-domain (test)	EER 31.1	4
Source verification	ASVspoof out-of-domain 2019 (test)	EER 25.6	4
Source verification	Average aggregated (test)	EER 22.4	4
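The results above are reported as equal error rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. As a minimal sketch of how such a figure can be computed from pairwise similarity scores, assuming higher scores mean "same source" and using a simple threshold sweep (the function name `equal_error_rate` is hypothetical, and this is not necessarily the evaluation code used for these numbers):

```python
import numpy as np


def equal_error_rate(scores, labels):
    """Approximate the EER by sweeping thresholds over the observed scores.

    scores: similarity scores, higher means "same source".
    labels: 1/True for same-source pairs, 0/False for different-source pairs.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)

    # Candidate thresholds: every observed score, plus one above the maximum
    # so that the "reject everything" operating point is also considered.
    thresholds = np.unique(scores)
    thresholds = np.append(thresholds, thresholds[-1] + 1.0)

    # A pair is accepted as "same source" when score >= threshold.
    far = np.array([(scores[~labels] >= t).mean() for t in thresholds])
    frr = np.array([(scores[labels] < t).mean() for t in thresholds])

    # Take the threshold where the two error rates are closest.
    idx = np.argmin(np.abs(far - frr))
    return float((far[idx] + frr[idx]) / 2)


# Usage with made-up toy scores (not data from the paper).
toy_scores = [0.9, 0.4, 0.6, 0.1]
toy_labels = [1, 1, 0, 0]
toy_eer = equal_error_rate(toy_scores, toy_labels)
```

With a finite set of scores the two error curves rarely cross exactly, so the function returns the average of the two rates at the closest threshold; interpolating between thresholds is a common alternative.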

