MetricGAN+: An Improved Version of MetricGAN for Speech Enhancement

About

The discrepancy between the cost function used to train a speech enhancement model and human auditory perception usually makes the quality of the enhanced speech unsatisfactory. Objective evaluation metrics that take human perception into account can therefore serve as a bridge to narrow this gap. Our previously proposed MetricGAN was designed to optimize objective metrics by connecting the metric with a discriminator. Because only the scores of the target evaluation functions are needed during training, the metrics can even be non-differentiable. In this study, we propose MetricGAN+, which incorporates three training techniques based on domain knowledge of speech processing. With these techniques, experimental results on the VoiceBank-DEMAND dataset show that MetricGAN+ improves the PESQ score by 0.3 over the previous MetricGAN and achieves state-of-the-art results (PESQ = 3.15).
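The key point of the abstract, that the discriminator only needs the metric's scores, can be made concrete with the least-squares objectives of the original MetricGAN as we understand them: the discriminator D is regressed onto a normalized metric score Q' (with clean/clean pairs targeting 1), and the generator G is trained so that D predicts a target score s = 1 for its output. The sketch below assumes that formulation. The metric `toy_pesq_like`, its tolerance `tol`, and the [-0.5, 4.5] normalization range are hypothetical stand-ins, not PESQ itself or the authors' code, and plain floats replace network outputs so the loss arithmetic can be checked by hand.

```python
from typing import Sequence


def toy_pesq_like(enhanced: Sequence[float], clean: Sequence[float], tol: float = 0.15) -> float:
    """Hypothetical black-box metric on a PESQ-like raw scale [-0.5, 4.5].

    It counts samples within `tol` of the clean reference, so it is a step
    function of its input and offers no useful gradient.
    """
    close = sum(1 for e, c in zip(enhanced, clean) if abs(e - c) <= tol)
    return -0.5 + 5.0 * close / len(clean)


def normalized_metric(raw: float, lo: float = -0.5, hi: float = 4.5) -> float:
    """Map a raw score onto [0, 1] to obtain Q'; the range [lo, hi] is an assumption."""
    return (raw - lo) / (hi - lo)


def discriminator_loss(d_clean: Sequence[float], d_enhanced: Sequence[float],
                       q_enhanced: Sequence[float]) -> float:
    """Batch mean of (D(y,y) - 1)^2 + (D(G(x),y) - Q'(G(x),y))^2."""
    terms = [(dc - 1.0) ** 2 + (de - q) ** 2
             for dc, de, q in zip(d_clean, d_enhanced, q_enhanced)]
    return sum(terms) / len(terms)


def generator_loss(d_enhanced: Sequence[float], target: float = 1.0) -> float:
    """Batch mean of (D(G(x),y) - s)^2 with target score s."""
    return sum((de - target) ** 2 for de in d_enhanced) / len(d_enhanced)


if __name__ == "__main__":
    clean = [0.0, 0.5, 1.0, 0.5]
    enhanced = [0.0, 0.4, 0.7, 0.5]
    # The metric is only evaluated to produce a plain float target.
    q = normalized_metric(toy_pesq_like(enhanced, clean))
    # Stand-in discriminator outputs for one clean/clean and one enhanced/clean pair.
    print(q, discriminator_loss([0.9], [0.6], [q]), generator_loss([0.6]))
```

Because `toy_pesq_like` is only called to produce the regression target `q`, nothing requires it to be differentiable; in a real training loop, gradients would reach G only through D's output.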

Szu-Wei Fu, Cheng Yu, Tsun-An Hsieh, Peter Plantinga, Mirco Ravanelli, Xugang Lu, Yu Tsao • 2021

Related benchmarks

Task	Dataset	Metric	Result	#
Speech Enhancement	VoiceBank + DEMAND (VB-DMD) (test)	PESQ	3.13	114
Speech Enhancement	VoiceBank-DEMAND	PESQ	3.15	55
Speech Enhancement	VB-DMD	DNSMOS	3.22	15
Speech Enhancement	VCTK+DEMAND (test)	WB-PESQ	3.15	13
Speech Enhancement	DNS Challenge Real-world recordings 2020	SIG	2.88	11
Speech Enhancement	DNS Challenge 2020 (blind test)	WV-MOS	1.23	11
Speech Enhancement	DNS Challenge 2020 (test)	DNSMOS Score	3.26	9
Speech Enhancement	WSJ0-CHiME3 matched condition (test)	POLQA	3.52	8
Speech Enhancement	WSJ0 mismatched condition CHiME3 (test)	POLQA	2.47	7
Speech Enhancement	VoiceBank-DEMAND Noise-only (test)	PESQ	3.13	6

