AnCoGen: Analysis, Control and Generation of Speech with a Masked Autoencoder

About

This article introduces AnCoGen, a novel method that leverages a masked autoencoder to unify the analysis, control, and generation of speech signals within a single model. AnCoGen can analyze speech by estimating key attributes, such as speaker identity, pitch, content, loudness, signal-to-noise ratio, and clarity index. In addition, it can generate speech from these attributes and allow precise control of the synthesized speech by modifying them. Extensive experiments demonstrated the effectiveness of AnCoGen across speech analysis-resynthesis, pitch estimation, pitch modification, and speech enhancement.
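As a rough illustration of how one masked model could cover analysis, control, and generation, the sketch below joins a speech token sequence and an attribute token sequence and hides whichever side is to be predicted. The abstract does not describe the masking scheme, so everything here is an assumption made for illustration: the `MASK` value, the `build_input` helper, the integer tokens, and the idea that analysis masks the attributes while generation masks the speech.

```python
import numpy as np

MASK = -1  # hypothetical placeholder id for a masked token (assumption)


def build_input(speech_tokens, attribute_tokens, mode):
    """Concatenate two token sequences, masking the side to be predicted.

    mode="analysis":   attributes are hidden, speech stays visible.
    mode="generation": speech is hidden, attributes stay visible.
    This is an illustrative assumption, not the published AnCoGen pipeline.
    """
    speech = np.array(speech_tokens, dtype=int)
    attrs = np.array(attribute_tokens, dtype=int)
    if mode == "analysis":
        attrs[:] = MASK
    elif mode == "generation":
        speech[:] = MASK
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return np.concatenate([speech, attrs])


# Analysis: the model would see speech and be asked to fill in attributes.
analysis_input = build_input([1, 2, 3], [7, 8], mode="analysis")

# Control: edit an attribute (here the first one), then generate speech from it.
edited_attributes = [9, 8]
generation_input = build_input([1, 2, 3], edited_attributes, mode="generation")
```

Under this assumed scheme, control amounts to editing the attribute tokens before running the generation mode, which matches the abstract's description of modifying attributes to steer the synthesized speech.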

Samir Sadok, Simon Leglaive, Laurent Girin, Gaël Richard, Xavier Alameda-Pineda • 2025

Related benchmarks

Task                Dataset                        Result      Rank
Speech Enhancement  Libri1Mix                      OVRL 3      15
Speech Denoising    LibriMix (test)                N-MOS 4.24  5
Speech Synthesis    LibriSpeech 360 Clean (test)   STOI 0.77   3
Speech Synthesis    EmoV-DB (test)                 STOI 0.7    3