MetricGAN+: An Improved Version of MetricGAN for Speech Enhancement

About

The discrepancy between the cost function used to train a speech enhancement model and human auditory perception usually leaves the quality of the enhanced speech unsatisfactory. Objective evaluation metrics that account for human perception can therefore serve as a bridge to reduce this gap. Our previously proposed MetricGAN was designed to optimize objective metrics by connecting the metric with a discriminator. Because only the scores of the target evaluation functions are needed during training, the metrics can even be non-differentiable. In this study, we propose MetricGAN+, which introduces three training techniques that incorporate domain knowledge of speech processing. With these techniques, experimental results on the VoiceBank-DEMAND dataset show that MetricGAN+ increases the PESQ score by 0.3 compared to the previous MetricGAN and achieves state-of-the-art results (PESQ score = 3.15).
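The idea of connecting a metric with a discriminator can be illustrated with a minimal sketch. The abstract does not spell out the loss functions, so the squared-error form below is an assumption, and `toy_metric`, `discriminator_loss`, and `generator_loss` are hypothetical names introduced only for illustration. The discriminator is trained to reproduce the metric score, and the generator is then trained to push the discriminator's predicted score toward the maximum, so the metric itself never needs to be differentiated.

```python
from typing import Callable, Sequence

Signal = Sequence[float]
Scorer = Callable[[Signal, Signal], float]


def toy_metric(enhanced: Signal, clean: Signal) -> float:
    """Hypothetical stand-in for a black-box metric (e.g. a normalized PESQ).

    Returns a score in (0, 1], where 1.0 means the signals are identical.
    """
    mse = sum((e - c) ** 2 for e, c in zip(enhanced, clean)) / len(clean)
    return 1.0 / (1.0 + mse)


def discriminator_loss(d: Scorer, metric: Scorer,
                       enhanced: Signal, clean: Signal) -> float:
    """Assumed squared-error objective for the surrogate (discriminator).

    The discriminator should output 1 for the (clean, clean) pair and the
    true metric score for the (enhanced, clean) pair. Only the score value
    of `metric` is used, so it may be non-differentiable.
    """
    clean_term = (d(clean, clean) - 1.0) ** 2
    enhanced_term = (d(enhanced, clean) - metric(enhanced, clean)) ** 2
    return clean_term + enhanced_term


def generator_loss(d: Scorer, enhanced: Signal, clean: Signal,
                   target: float = 1.0) -> float:
    """Assumed generator objective: push the predicted score toward `target`."""
    return (d(enhanced, clean) - target) ** 2


clean = [0.0, 0.5, -0.5, 0.25]
noisy = [0.1, 0.4, -0.7, 0.30]

# A discriminator that matches the metric exactly has zero surrogate loss.
perfect_d_loss = discriminator_loss(toy_metric, toy_metric, noisy, clean)

# The generator loss shrinks as the enhanced signal approaches the clean one.
loss_noisy = generator_loss(toy_metric, noisy, clean)
loss_clean = generator_loss(toy_metric, clean, clean)
```

In a real system both the generator and the discriminator would be neural networks trained by gradient descent; the sketch only shows which quantities each loss compares.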

Szu-Wei Fu, Cheng Yu, Tsun-An Hsieh, Peter Plantinga, Mirco Ravanelli, Xugang Lu, Yu Tsao • 2021

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Speech Enhancement | VoiceBank + DEMAND (VB-DMD) (test) | PESQ 3.13 | 105 |
| Speech Enhancement | VoiceBank-DEMAND | PESQ 3.15 | 17 |
| Speech Enhancement | VCTK+DEMAND (test) | WB-PESQ 3.15 | 13 |
| Speech Enhancement | DNS Challenge Real-world recordings 2020 | SIG 2.88 | 11 |
| Speech Enhancement | DNS Challenge 2020 (test) | DNSMOS Score 3.26 | 9 |
| Speech Enhancement | WSJ0-CHiME3 matched condition (test) | POLQA 3.52 | 8 |
| Speech Enhancement | WSJ0 mismatched condition CHiME3 (test) | POLQA 2.47 | 7 |