
Speaker anonymization using neural audio codec language models

About

The vast majority of approaches to speaker anonymization involve the extraction of fundamental frequency estimates, linguistic features and a speaker embedding, which is perturbed to obfuscate the speaker identity before an anonymized speech waveform is resynthesized using a vocoder. Recent work has shown that transformations of x-vector speaker embeddings are difficult to control consistently: other sources of speaker information contained within fundamental frequency and linguistic features are re-entangled upon vocoding, meaning that anonymized speech signals still contain speaker information. We propose an approach based upon neural audio codecs (NACs), which are known to generate high-quality synthetic speech when combined with language models. NACs use quantized codes, which are known to effectively bottleneck speaker-related information. We demonstrate the potential of speaker anonymization systems based on NAC language modeling by applying the evaluation framework of the Voice Privacy Challenge 2022.
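To make the idea of a quantized bottleneck concrete, here is a minimal sketch of residual vector quantization (RVQ), the scheme used by neural audio codecs such as EnCodec and SoundStream. The abstract does not say which codec or quantizer the authors use, so treat this as an illustration under that assumption: `rvq_encode`, `rvq_decode` and the two-stage codebooks are hypothetical toy values, whereas a real codec learns its codebooks and applies them to encoder latents rather than hand-written vectors.

```python
from typing import List, Sequence

Vector = List[float]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rvq_encode(frame: Sequence[float], codebooks: Sequence[Sequence[Vector]]) -> List[int]:
    """Encode one latent frame as one integer index per codebook stage (hypothetical helper)."""
    residual = list(frame)
    codes: List[int] = []
    for codebook in codebooks:
        # Pick the codeword nearest to what is still unexplained.
        idx = min(range(len(codebook)), key=lambda i: _sq_dist(residual, codebook[i]))
        codes.append(idx)
        # Subtract it so the next stage only models the remaining error.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def rvq_decode(codes: Sequence[int], codebooks: Sequence[Sequence[Vector]]) -> Vector:
    """Reconstruct a frame as the sum of the selected codewords (hypothetical helper)."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Hypothetical toy codebooks: 2 stages, 3 codewords each, 2-D latents.
CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]],    # coarse stage
    [[0.0, 0.0], [0.25, 0.0], [0.0, -0.25]],  # fine stage refines the residual
]

codes = rvq_encode([1.3, 1.0], CODEBOOKS)
print(codes)                         # [1, 1]
print(rvq_decode(codes, CODEBOOKS))  # [1.25, 1.0]
```

Each frame is reduced to a short list of integers, and the decoder can only emit sums of codewords, so detail that the codebooks cannot represent is discarded. A language model over such codes predicts these integer indices rather than continuous features.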

Michele Panariello, Francesco Nespoli, Massimiliano Todisco, Nicholas Evans (2023)

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Automatic Speech Recognition | LibriSpeech Clean other (test) | WER (%) | 41 | 34 |
| Speech Emotion Recognition | IEMOCAP | Weighted Accuracy, WA (%) | 65.57 | 6 |
| Voice Anonymization | LibriSpeech clean (test) | EER (%) | 41.88 | 4 |
| Voice Anonymization | LibriSpeech other (test) | EER (%) | 37.88 | 4 |
| Voice Anonymization | LibriTTS clean (test) | EER (%) | 43.06 | 4 |
| Voice Anonymization | LibriTTS other (test) | EER (%) | 43.18 | 4 |
| Voice Anonymization | IEMOCAP | EER (%) | 53 | 4 |
| Voice Anonymization | NVIDIA RTX 3090 GPU | RTF | 1.62 | 3 |
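The voice anonymization rows report equal error rate (EER): the operating point at which an automatic speaker verification (ASV) attacker's false acceptance rate equals its false rejection rate, so values near 50% mean the attacker does no better than chance. The sketch below estimates EER by sweeping a decision threshold over the observed scores and keeping the point where the two error rates are closest. It is a simplified approximation, not the official Voice Privacy Challenge scoring script; `compute_eer` is a hypothetical helper and the example scores are invented for illustration, not taken from the paper.

```python
from typing import Sequence, Tuple


def compute_eer(target_scores: Sequence[float],
                nontarget_scores: Sequence[float]) -> Tuple[float, float]:
    """Approximate EER by a threshold sweep (hypothetical helper, no interpolation).

    target_scores: ASV scores for same-speaker trials.
    nontarget_scores: ASV scores for different-speaker trials.
    Returns (eer, threshold), with eer as a fraction in [0, 1].
    """
    best_gap, best_eer, best_thr = float("inf"), 1.0, float("nan")
    for thr in sorted(set(target_scores) | set(nontarget_scores)):
        # False rejection: same-speaker trials scored below the threshold.
        frr = sum(s < thr for s in target_scores) / len(target_scores)
        # False acceptance: different-speaker trials scored at or above it.
        far = sum(s >= thr for s in nontarget_scores) / len(nontarget_scores)
        # Keep the threshold where the two error rates are closest.
        if abs(frr - far) < best_gap:
            best_gap, best_eer, best_thr = abs(frr - far), (frr + far) / 2, thr
    return best_eer, best_thr


# Invented scores for illustration only.
eer, thr = compute_eer([0.9, 0.8, 0.7, 0.4], [0.6, 0.5, 0.3, 0.2])
print(f"EER = {100 * eer:.2f}% at threshold {thr}")  # EER = 25.00% at threshold 0.6
```

Averaging the two rates at the closest threshold keeps the sketch short; more careful implementations interpolate along the ROC curve, which can give slightly different values on small trial lists.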
