MP-SENet: A Speech Enhancement Model with Parallel Denoising of Magnitude and Phase Spectra
About
This paper proposes MP-SENet, a novel Speech Enhancement Network that directly denoises Magnitude and Phase spectra in parallel. MP-SENet adopts a codec architecture in which the encoder and decoder are bridged by convolution-augmented transformers. The encoder encodes time-frequency representations from the input noisy magnitude and phase spectra. The decoder consists of a magnitude mask decoder and a phase decoder operating in parallel, which directly recover the clean magnitude spectra and the clean wrapped phase spectra by incorporating a learnable sigmoid activation and a parallel phase estimation architecture, respectively. The model is trained jointly with multi-level losses defined on magnitude spectra, phase spectra, short-time complex spectra, and time-domain waveforms. Experimental results show that MP-SENet achieves a PESQ of 3.50 on the public VoiceBank+DEMAND dataset and outperforms existing advanced speech enhancement methods.
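The two decoder outputs described above can be illustrated with a small NumPy sketch. It is not the authors' implementation: we assume the learnable sigmoid has the form beta / (1 + exp(-alpha * x)) with a per-frequency slope `alpha` (trainable in a real network, fixed here) and an upper bound `beta = 2.0`, so the mask can attenuate or amplify the noisy magnitude. We also assume the parallel phase estimation takes two parallel outputs (a pseudo-real and a pseudo-imaginary part) and combines them with a two-argument arctangent, which always yields a wrapped phase in [-pi, pi]. All function names are hypothetical.

```python
import numpy as np


def learnable_sigmoid(x, alpha, beta=2.0):
    """Bounded sigmoid beta / (1 + exp(-alpha * x)); output lies in (0, beta)."""
    return beta / (1.0 + np.exp(-alpha * x))


def apply_magnitude_mask(noisy_mag, mask_logits, alpha, beta=2.0):
    """Hypothetical mask decoder head: noisy_mag and mask_logits have shape (F, T),
    alpha has shape (F,) and is broadcast across time frames."""
    mask = learnable_sigmoid(mask_logits, alpha[:, None], beta)
    return noisy_mag * mask


def parallel_phase_estimation(pseudo_real, pseudo_imag):
    """Hypothetical phase decoder head: combine two parallel outputs into a
    wrapped phase with arctan2, so no explicit wrapping step is needed."""
    return np.arctan2(pseudo_imag, pseudo_real)


def to_complex(mag, phase):
    """Recombine magnitude and phase into a short-time complex spectrum."""
    return mag * np.exp(1j * phase)


# Toy example with random stand-ins for network outputs (F frequency bins, T frames).
rng = np.random.default_rng(0)
F, T = 201, 50
noisy_mag = np.abs(rng.standard_normal((F, T)))
mask_logits = rng.standard_normal((F, T))
alpha = np.ones(F)  # per-frequency slopes; would be learned in practice

enhanced_mag = apply_magnitude_mask(noisy_mag, mask_logits, alpha)
phase = parallel_phase_estimation(rng.standard_normal((F, T)), rng.standard_normal((F, T)))
spec = to_complex(enhanced_mag, phase)
```

Using arctan2 on two unconstrained outputs is what lets the phase branch produce a valid wrapped phase directly, instead of regressing a bounded angle with a squashing activation.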
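A multi-level objective that combines losses on magnitude spectra, phase spectra, short-time complex spectra, and time-domain waveforms can be sketched as a weighted sum. This is a simplified illustration under stated assumptions, not the paper's loss: the equal default weights are hypothetical, and the phase term uses an assumed anti-wrapping distance |x - 2*pi*round(x / (2*pi))| so that a phase error of exactly 2*pi counts as zero. The paper's actual phase losses and weights may differ.

```python
import numpy as np


def anti_wrapping(x):
    """Distance from x to the nearest multiple of 2*pi (assumed phase-error metric)."""
    return np.abs(x - 2 * np.pi * np.round(x / (2 * np.pi)))


def multi_level_loss(est_mag, est_pha, est_wav, ref_mag, ref_pha, ref_wav,
                     weights=(1.0, 1.0, 1.0, 1.0)):
    """Hypothetical weighted sum of magnitude, phase, complex and waveform losses."""
    w_mag, w_pha, w_com, w_time = weights
    # Magnitude: mean squared error between magnitude spectra.
    loss_mag = np.mean((est_mag - ref_mag) ** 2)
    # Phase: anti-wrapping absolute error, insensitive to 2*pi offsets.
    loss_pha = np.mean(anti_wrapping(est_pha - ref_pha))
    # Complex: squared error between recombined short-time complex spectra.
    est_com = est_mag * np.exp(1j * est_pha)
    ref_com = ref_mag * np.exp(1j * ref_pha)
    loss_com = np.mean(np.abs(est_com - ref_com) ** 2)
    # Waveform: mean absolute error in the time domain.
    loss_time = np.mean(np.abs(est_wav - ref_wav))
    return w_mag * loss_mag + w_pha * loss_pha + w_com * loss_com + w_time * loss_time


# Toy check: identical estimates give zero loss, and a 2*pi phase offset is not penalized.
rng = np.random.default_rng(0)
mag = np.abs(rng.standard_normal((201, 50)))
pha = rng.uniform(-np.pi, np.pi, (201, 50))
wav = rng.standard_normal(16000)
loss_same = multi_level_loss(mag, pha, wav, mag, pha, wav)
loss_shifted = multi_level_loss(mag, pha + 2 * np.pi, wav, mag, pha, wav)
```

Because the phase term is computed on wrapped differences, it does not penalize predictions that differ from the reference only by whole turns, which a plain squared phase error would.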
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Speech Enhancement | VoiceBank + DEMAND (VB-DMD) (test) | PESQ | 3.5 | 105 |
| Speech Enhancement | WSJ0 UNI | PESQ | 2.71 | 15 |
| Speech Enhancement | VCTK+DEMAND (test) | WB-PESQ | 3.5 | 13 |
| Speech Denoising | VBDMD (test) | PESQ | 3.5 | 12 |
| Speech Super-resolution | VBDMD-SR (test) | PESQ | 3.79 | 10 |
| Speech Denoising | VoiceBank+DEMAND (test) | PESQ | 3.496 | 7 |
| Speech Enhancement | GRID and DEMAND Station noise (test) | SDR | -14.54 | 6 |
| Speech Enhancement | GRID and DEMAND Kitchen noise (test) | SDR | -14.63 | 6 |
| Speech Enhancement | GRID and DEMAND Metro noise (test) | SDR | -13.91 | 6 |
| Speech Enhancement | GRID and DEMAND Cafeteria noise (test) | SDR | -14.6 | 6 |