MP-SENet: A Speech Enhancement Model with Parallel Denoising of Magnitude and Phase Spectra

About

This paper proposes MP-SENet, a novel Speech Enhancement Network which directly denoises Magnitude and Phase spectra in parallel. MP-SENet adopts a codec architecture in which the encoder and decoder are bridged by convolution-augmented transformers. The encoder encodes time-frequency representations from the input noisy magnitude and phase spectra. The decoder is composed of a magnitude mask decoder and a phase decoder that run in parallel and directly recover clean magnitude spectra and clean wrapped phase spectra, using a learnable sigmoid activation and a parallel phase estimation architecture, respectively. Multi-level losses defined on magnitude spectra, phase spectra, short-time complex spectra, and time-domain waveforms are used jointly to train the MP-SENet model. Experimental results show that MP-SENet achieves a PESQ of 3.50 on the public VoiceBank+DEMAND dataset and outperforms existing advanced speech enhancement methods.
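The decoder step described above can be illustrated for a single time-frequency bin. This is only a minimal sketch under stated assumptions, not the authors' implementation: the sigmoid form beta / (1 + exp(-alpha * x)), the values of alpha and beta, and the names learnable_sigmoid and enhance_bin are hypothetical, and the two phase inputs stand in for whatever the phase decoder's parallel branches would output.

```python
import cmath
import math


def learnable_sigmoid(x, alpha=1.0, beta=2.0):
    """Bounded activation beta / (1 + exp(-alpha * x)).

    Assumption: alpha would be a trained parameter and beta a fixed upper
    bound on the mask. The abstract does not give the exact form.
    """
    return beta / (1.0 + math.exp(-alpha * x))


def enhance_bin(noisy_bin, mask_logit, pseudo_real, pseudo_imag,
                alpha=1.0, beta=2.0):
    """Combine a magnitude mask and a phase estimate for one complex bin."""
    # Magnitude branch: scale the noisy magnitude by a bounded mask.
    magnitude = abs(noisy_bin) * learnable_sigmoid(mask_logit, alpha, beta)
    # Phase branch: two parallel outputs mapped to a wrapped phase
    # in [-pi, pi] with the two-argument arctangent.
    phase = math.atan2(pseudo_imag, pseudo_real)
    # Rebuild the complex short-time spectrum value from magnitude and phase.
    return cmath.rect(magnitude, phase)


# Sanity check: a mask of 1 (logit 0 with alpha=1, beta=2) together with the
# noisy bin's own real and imaginary parts leaves the bin unchanged.
noisy = [complex(0.3, -0.4), complex(-1.0, 2.0)]
enhanced = [enhance_bin(z, 0.0, z.real, z.imag) for z in noisy]
```

In this sketch the mask can both attenuate and amplify a bin (it ranges over the open interval from 0 to beta), and the phase is always a valid wrapped angle regardless of the scale of the two branch outputs.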

Ye-Xin Lu, Yang Ai, Zhen-Hua Ling • 2023

Related benchmarks

Task                     Dataset                                 Result        Rank
Speech Enhancement       VoiceBank + DEMAND (VB-DMD) (test)      PESQ 3.5      105
Speech Enhancement       WSJ0 UNI                                PESQ 2.71     15
Speech Enhancement       VCTK+DEMAND (test)                      WB-PESQ 3.5   13
Speech Denoising         VBDMD (test)                            PESQ 3.5      12
Speech Super-resolution  VBDMD-SR (test)                         PESQ 3.79     10
Speech Denoising         VoiceBank+DEMAND (test)                 PESQ 3.496    7
Speech Enhancement       GRID and DEMAND Station noise (test)    SDR -14.54    6
Speech Enhancement       GRID and DEMAND Kitchen noise (test)    SDR -14.63    6
Speech Enhancement       GRID and DEMAND Metro noise (test)      SDR -13.91    6
Speech Enhancement       GRID and DEMAND Cafeteria noise (test)  SDR -14.6     6
