
Content-Dependent Fine-Grained Speaker Embedding for Zero-Shot Speaker Adaptation in Text-to-Speech Synthesis

About

Zero-shot speaker adaptation aims to clone an unseen speaker's voice without any adaptation time or additional parameters. Previous work usually uses a speaker encoder to extract a single fixed, global speaker embedding from reference speech, and several attempts have tried variable-length speaker embeddings. However, these approaches neglect to transfer the personal pronunciation characteristics tied to phoneme content, which leads to poor speaker similarity in detailed speaking style and pronunciation habits. To improve the ability of the speaker encoder to model personal pronunciation characteristics, we propose a content-dependent fine-grained speaker embedding for zero-shot speaker adaptation. Local content embeddings and the corresponding local speaker embeddings are both extracted from the reference speech. Instead of modeling temporal relations, a reference attention module is introduced to model the content relevance between the reference speech and the input text, and to generate a fine-grained speaker embedding for each phoneme encoder output. Experimental results show that the proposed method improves the speaker similarity of synthesized speech, especially for unseen speakers.
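The reference attention step described above can be illustrated with a minimal sketch. It assumes scaled dot-product scoring, with phoneme encoder outputs as queries, local content embeddings as keys, and local speaker embeddings as values. The abstract does not state the scoring function, and any learned projections are omitted, so the function name `reference_attention` and all shapes here are hypothetical rather than the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reference_attention(phoneme_outputs, content_embeddings, speaker_embeddings):
    """Return one speaker embedding per phoneme encoder output.

    phoneme_outputs:    list of query vectors, one per input-text phoneme
    content_embeddings: list of key vectors from the reference speech
    speaker_embeddings: list of value vectors from the reference speech,
                        aligned one-to-one with content_embeddings
    Assumption: scaled dot-product scoring (not confirmed by the abstract).
    """
    if len(content_embeddings) != len(speaker_embeddings):
        raise ValueError("content and speaker embeddings must be aligned")

    scale = math.sqrt(len(content_embeddings[0]))
    speaker_dim = len(speaker_embeddings[0])
    fine_grained = []
    for query in phoneme_outputs:
        # Content relevance between this phoneme and each reference position.
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in content_embeddings
        ]
        weights = softmax(scores)
        # Weighted sum of the local speaker embeddings.
        mixed = [
            sum(w * value[i] for w, value in zip(weights, speaker_embeddings))
            for i in range(speaker_dim)
        ]
        fine_grained.append(mixed)
    return fine_grained


# Toy data: two text phonemes, three reference positions.
phonemes = [[4.0, 0.0], [0.0, 4.0]]
content = [[4.0, 0.0], [0.0, 4.0], [0.0, 4.0]]
speaker = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
result = reference_attention(phonemes, content, speaker)
```

In this toy setup the output has one vector per text phoneme regardless of the reference length, and each phoneme draws mostly on the reference positions whose content matches it, which is the content-based (rather than temporal) alignment the abstract describes.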

Yixuan Zhou, Changhe Song, Xiang Li, Luwen Zhang, Zhiyong Wu, Yanyao Bian, Dan Su, Helen Meng • 2022

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Multi-speaker Dubbing | GRID Dub 1.0 (test) | SPK-SIM (%): 86.54 | 12 |
| Movie Dubbing | V2C-Animation Dub denoise 2.0 | Speaker Similarity: 47.79 | 12 |
| Multi-speaker Dubbing | V2C-Animation Dub 1.0 (test) | Speaker Similarity (SPK-SIM): 48.98 | 12 |
| Video-to-Speech Synthesis | GRID (test) | Sim-O: 0.7 | 11 |
| Video-to-Speech Synthesis | V2C-Animation | Sim-O: 13 | 11 |
| Video-to-Speech Synthesis | V2C Dub 3.0 | MOS-S: 3.62 | 10 |
| Movie Dubbing | GRID Dubbing Setting 1.0 | LSE-C: 5.03 | 10 |
| Movie Dubbing | GRID Dubbing Setting 2.0 | LSE-C: 4.48 | 10 |
