MetricGAN+: An Improved Version of MetricGAN for Speech Enhancement
About
The discrepancy between the cost function used to train a speech enhancement model and human auditory perception often leaves the quality of the enhanced speech unsatisfactory. Objective evaluation metrics that take human perception into account can therefore serve as a bridge to reduce this gap. Our previously proposed MetricGAN was designed to optimize objective metrics by connecting the metric with a discriminator. Because only the scores of the target evaluation functions are needed during training, the metrics can even be non-differentiable. In this study, we propose MetricGAN+, which introduces three training techniques that incorporate domain knowledge of speech processing. With these techniques, experimental results on the VoiceBank-DEMAND dataset show that MetricGAN+ increases the PESQ score by 0.3 compared to the previous MetricGAN and achieves state-of-the-art results (PESQ score = 3.15).
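The idea of connecting a metric with a discriminator can be illustrated with a minimal sketch of the two objectives as formulated in the original MetricGAN: the discriminator is regressed onto the metric score, and the generator is trained to push the discriminator's prediction toward a desired score. The sketch is a simplification under stated assumptions, not the authors' implementation. `toy_metric`, `discriminator_loss`, and `generator_loss` are hypothetical names; `toy_metric` is a stand-in for a normalized black-box metric and is not PESQ. The three MetricGAN+ training techniques are not shown.

```python
from typing import Callable, Sequence

Signal = Sequence[float]
Scorer = Callable[[Signal, Signal], float]


def toy_metric(enhanced: Signal, clean: Signal) -> float:
    """Hypothetical stand-in for a normalized black-box metric in [0, 1].

    This is NOT PESQ. The rounding makes it piecewise constant, so, like
    PESQ, it offers no useful gradient with respect to the enhanced signal.
    """
    err = sum((e - c) ** 2 for e, c in zip(enhanced, clean)) / len(clean)
    return round(1.0 / (1.0 + err), 2)


def discriminator_loss(d: Scorer, metric: Scorer,
                       enhanced: Signal, clean: Signal) -> float:
    """Regress the discriminator onto the metric score.

    Only the scalar metric scores are used as targets, which is why the
    metric itself never needs to be differentiated.
    Assumption: the metric is normalized so that clean vs. clean scores 1.0.
    """
    enhanced_term = (d(enhanced, clean) - metric(enhanced, clean)) ** 2
    clean_term = (d(clean, clean) - 1.0) ** 2
    return enhanced_term + clean_term


def generator_loss(d: Scorer, enhanced: Signal, clean: Signal,
                   desired_score: float = 1.0) -> float:
    """Push the discriminator's predicted score toward the desired score.

    In a real system, gradients flow through the (differentiable)
    discriminator into the generator that produced `enhanced`.
    """
    return (d(enhanced, clean) - desired_score) ** 2


if __name__ == "__main__":
    clean = [0.0, 0.5, -0.5, 0.25]
    enhanced = [0.1, 0.4, -0.4, 0.2]

    # A discriminator that predicts the metric exactly has zero loss.
    print(discriminator_loss(toy_metric, toy_metric, enhanced, clean))

    # The generator is still penalized until the predicted score reaches 1.0.
    print(generator_loss(toy_metric, enhanced, clean))
```

In practice the discriminator and generator are neural networks trained alternately, and the discriminator here is replaced by the metric itself only to keep the example self-contained.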
Related benchmarks
| Task | Dataset | Metric | Score | Rank |
|---|---|---|---|---|
| Speech Enhancement | VoiceBank + DEMAND (VB-DMD) (test) | PESQ | 3.13 | 105 |
| Speech Enhancement | VoiceBank-DEMAND | PESQ | 3.15 | 17 |
| Speech Enhancement | VCTK+DEMAND (test) | WB-PESQ | 3.15 | 13 |
| Speech Enhancement | DNS Challenge Real-world recordings 2020 | SIG | 2.88 | 11 |
| Speech Enhancement | DNS Challenge 2020 (test) | DNSMOS Score | 3.26 | 9 |
| Speech Enhancement | WSJ0-CHiME3 matched condition (test) | POLQA | 3.52 | 8 |
| Speech Enhancement | WSJ0 mismatched condition CHiME3 (test) | POLQA | 2.47 | 7 |