
Meta-StyleSpeech: Multi-Speaker Adaptive Text-to-Speech Generation

About

With rapid progress in neural text-to-speech (TTS) models, personalized speech generation is now in high demand for many applications. To be practical, a TTS model should generate high-quality speech from only a few short audio samples of the given speaker. However, existing methods either require fine-tuning the model or achieve low adaptation quality without fine-tuning. In this work, we propose StyleSpeech, a new TTS model that not only synthesizes high-quality speech but also adapts effectively to new speakers. Specifically, we propose Style-Adaptive Layer Normalization (SALN), which aligns the gain and bias of the text input according to the style extracted from a reference speech audio. With SALN, our model effectively synthesizes speech in the style of the target speaker even from a single speech audio sample. Furthermore, to enhance StyleSpeech's adaptation to speech from new speakers, we extend it to Meta-StyleSpeech by introducing two discriminators trained with style prototypes and by performing episodic training. The experimental results show that our models generate high-quality speech that accurately follows the speaker's voice from a single short-duration (1-3 sec) speech audio sample, significantly outperforming baselines.
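
The SALN idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the common formulation in which a hidden feature vector is first normalized to zero mean and unit variance, and a style vector is then mapped through two fully connected layers to a per-channel gain and bias. The helper names (`layer_normalize`, `affine`, `saln`, `random_layer`), the dimensions, and the random weights are all hypothetical and chosen only for illustration.

```python
import math
import random


def layer_normalize(h, eps=1e-5):
    """Normalize one feature vector to zero mean and (approximately) unit variance."""
    mean = sum(h) / len(h)
    var = sum((x - mean) ** 2 for x in h) / len(h)
    return [(x - mean) / math.sqrt(var + eps) for x in h]


def affine(weights, bias, w):
    """Fully connected layer: weights @ w + bias."""
    return [sum(w_ij * w_j for w_ij, w_j in zip(row, w)) + b
            for row, b in zip(weights, bias)]


def saln(h, style, gain_layer, bias_layer):
    """Scale and shift normalized features with a style-dependent gain and bias."""
    y = layer_normalize(h)
    gain = affine(*gain_layer, style)   # g(w): one gain per hidden channel
    bias = affine(*bias_layer, style)   # b(w): one bias per hidden channel
    return [g * y_i + b for g, y_i, b in zip(gain, y, bias)]


# Hypothetical toy setup: random (untrained) projection layers.
rng = random.Random(0)
hidden_dim, style_dim = 4, 3


def random_layer(out_dim, in_dim):
    weights = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [rng.uniform(-1, 1) for _ in range(out_dim)]
    return weights, bias


gain_layer = random_layer(hidden_dim, style_dim)
bias_layer = random_layer(hidden_dim, style_dim)

h = [0.5, -1.2, 3.0, 0.1]     # hidden features for one text position
style_a = [1.0, 0.0, 0.0]     # stand-ins for style vectors of two speakers
style_b = [0.0, 1.0, 0.0]

out_a = saln(h, style_a, gain_layer, bias_layer)
out_b = saln(h, style_b, gain_layer, bias_layer)
print(out_a)
print(out_b)
```

The same text-side features `h` produce different outputs for `style_a` and `style_b`, which is the mechanism that lets one model follow different speakers. In a real model the style vector would come from an encoder applied to the reference audio, and the two projection layers would be learned.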

Dongchan Min, Dong Bok Lee, Eunho Yang, Sung Ju Hwang • 2021

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Multi-speaker Dubbing | V2C-Animation Dub 1.0 (test) | Speaker Similarity (SPK-SIM) | 54.99 | 12 |
| Multi-speaker Dubbing | GRID Dub 1.0 (test) | SPK-SIM (%) | 91.06 | 12 |
| Movie Dubbing | V2C-Animation Dub denoise 2.0 | Speaker Similarity | 75.66 | 12 |
| Video-to-Speech Synthesis | GRID (test) | Sim-O | 0.74 | 11 |
| Video-to-Speech Synthesis | V2C-Animation | Sim-O | 14 | 11 |
| Text-to-Speech | ESD (test) | MOS | 4.02 | 11 |
| Movie Dubbing | GRID Dubbing Setting 1.0 | LSE-C | 5.9 | 10 |
| Video-to-Speech Synthesis | V2C Dub 3.0 | MOS-S | 3.31 | 10 |
| Movie Dubbing | GRID Dubbing Setting 2.0 | LSE-C | 4.79 | 10 |
| Text-to-Speech | VCTK (test) | MOS | 3.9 | 8 |
Showing 10 of 14 rows
