Detection of AI-Synthesized Speech Using Cepstral & Bispectral Statistics

About

Digital technology has turned once-unimaginable applications into reality. Having a handful of tools for easy editing and manipulation is exciting, but it also raises alarming concerns, since the same tools can be used to produce speech clones, duplicates, or deepfakes. Validating the authenticity of speech is one of the primary problems of digital audio forensics. We propose an approach that distinguishes human speech from AI-synthesized speech using bispectral and cepstral analysis. Higher-order statistics show less correlation in human speech than in synthesized speech. In addition, cepstral analysis reveals a durable power component in human speech that is missing from synthesized speech. We integrate both analyses and propose a machine learning model to detect AI-synthesized speech.
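To make the two analyses named in the abstract concrete, the following is a minimal sketch of how cepstral and bispectral statistics could be extracted from a waveform. It is not the authors' implementation: the function names (`real_cepstrum`, `bicoherence`, `features`), the segment length `nfft=64`, the Hann window, and the choice of the first four moments as features are all assumptions made for illustration. The abstract does not specify any of these details.

```python
import numpy as np

def real_cepstrum(x):
    """Real cepstrum: inverse FFT of the log magnitude spectrum."""
    log_mag = np.log(np.abs(np.fft.fft(x)) + 1e-12)  # epsilon avoids log(0)
    return np.fft.ifft(log_mag).real

def bicoherence(x, nfft=64):
    """Segment-averaged bicoherence (normalized bispectrum magnitude).

    Assumed estimator: non-overlapping Hann-windowed segments of length nfft.
    Returns an (nfft//2, nfft//2) array with values in [0, 1].
    """
    nseg = len(x) // nfft
    segs = x[: nseg * nfft].reshape(nseg, nfft)
    segs = segs - segs.mean(axis=1, keepdims=True)
    X = np.fft.fft(segs * np.hanning(nfft), axis=1)
    k = np.arange(nfft // 2)
    k1, k2 = np.meshgrid(k, k, indexing="ij")
    X1X2 = X[:, k1] * X[:, k2]          # X(f1) * X(f2) per segment
    X12 = X[:, k1 + k2]                 # X(f1 + f2) per segment
    num = np.abs((X1X2 * np.conj(X12)).mean(axis=0))
    den = np.sqrt((np.abs(X1X2) ** 2).mean(axis=0)
                  * (np.abs(X12) ** 2).mean(axis=0)) + 1e-12
    return num / den

def moments(v):
    """Mean, variance, skewness, and kurtosis of a 1-D array."""
    mu, sd = v.mean(), v.std() + 1e-12
    z = (v - mu) / sd
    return [mu, v.var(), (z ** 3).mean(), (z ** 4).mean()]

def features(x, nfft=64):
    """Hypothetical 8-dimensional feature vector for a downstream classifier."""
    bic = bicoherence(x, nfft).ravel()
    cep = real_cepstrum(x)[1:nfft]      # low-quefrency coefficients, skipping c[0]
    return np.array(moments(bic) + moments(cep))

# Example on synthetic noise standing in for a speech waveform.
rng = np.random.default_rng(0)
signal = rng.standard_normal(4096)
vec = features(signal)
```

A vector like `vec` could then be passed to any standard classifier trained on labeled human and synthesized recordings. The classifier itself is left out here because the abstract does not say which model is used.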

Arun Kumar Singh (1), Priyanka Singh (2) ((1) Indian Institute of Technology Jammu, (2) Dhirubhai Ambani Institute of Information and Communication Technology) • 2020

Related benchmarks

Task                               Dataset          Result          Rank
Machine-generated music detection  FakeMusicCaps    --              13
Audio Classification               M6 subset f      Accuracy 67.1   9
Audio Classification               M6 subset o      Accuracy 78     9