MP-SENet: A Speech Enhancement Model with Parallel Denoising of Magnitude and Phase Spectra

About

This paper proposes MP-SENet, a novel Speech Enhancement Network which directly denoises Magnitude and Phase spectra in parallel. The proposed MP-SENet adopts a codec architecture in which the encoder and decoder are bridged by convolution-augmented transformers. The encoder encodes time-frequency representations from the input noisy magnitude and phase spectra. The decoder consists of a magnitude mask decoder and a phase decoder running in parallel, which directly recover clean magnitude spectra and clean wrapped phase spectra by incorporating a learnable sigmoid activation and a parallel phase estimation architecture, respectively. Multi-level losses defined on magnitude spectra, phase spectra, short-time complex spectra, and time-domain waveforms are used jointly to train the MP-SENet model. Experimental results show that the proposed MP-SENet achieves a PESQ of 3.50 on the public VoiceBank+DEMAND dataset and outperforms existing advanced speech enhancement methods.
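The parallel decoding step described in the abstract can be illustrated for a single time-frequency bin. This is only a minimal sketch and not the authors' implementation: the function names, the scaled-sigmoid form `beta / (1 + exp(-alpha * x))`, the default values of `alpha` and `beta`, and the use of two phase outputs merged by `atan2` are assumptions made for illustration.

```python
import cmath
import math


def learnable_sigmoid(x, alpha=1.0, beta=2.0):
    """Scaled sigmoid: beta / (1 + exp(-alpha * x)).

    Assumption: alpha stands in for a trainable slope and beta for the
    upper bound of the mask. Both defaults are illustrative only.
    """
    return beta / (1.0 + math.exp(-alpha * x))


def decode_bin(noisy_mag, mask_logit, pseudo_real, pseudo_imag):
    """Combine the two parallel decoder branches for one time-frequency bin.

    All four arguments are hypothetical network outputs or inputs.
    """
    # Magnitude branch: a bounded mask rescales the noisy magnitude.
    clean_mag = noisy_mag * learnable_sigmoid(mask_logit)
    # Phase branch: two parallel outputs are merged with atan2, so the
    # estimated phase is wrapped into [-pi, pi] by construction.
    clean_phase = math.atan2(pseudo_imag, pseudo_real)
    # Recombine magnitude and phase into one complex spectral value.
    return cmath.rect(clean_mag, clean_phase)


if __name__ == "__main__":
    # Mask logit 0 gives a mask of beta / 2 = 1, so the magnitude is kept.
    print(decode_bin(2.0, 0.0, 0.0, 1.0))
```

In this sketch the magnitude is never predicted directly; it is the noisy magnitude scaled by a mask bounded between 0 and `beta`, while the phase comes out already wrapped because it is derived from `atan2` rather than regressed as an unbounded angle.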

Ye-Xin Lu, Yang Ai, Zhen-Hua Ling • 2023

Related benchmarks

Task	Dataset	Result
Speech Enhancement	VoiceBank-DEMAND (test)	PESQ 3.5	201
Speech Enhancement	VoiceBank + DEMAND (VB-DMD) (test)	PESQ 3.5	114
Speech Enhancement	VoiceBank-DEMAND	PESQ 3.5	55
Speech Enhancement	WSJ0 UNI	PESQ 2.71	15
General Speech Restoration	URGENT 2025 (test)	UTMOS 1.71	14
Speech Enhancement	VCTK+DEMAND (test)	WB-PESQ 3.5	13
Speech Denoising	VBDMD (test)	PESQ 3.5	12
Speech Enhancement	DNS non-blind 2020 (test)	SI-SNR 21.03	12
Speech Denoising	VCTK+DEMAND	PESQ 3.61	11
Speech Super-resolution	VBDMD-SR (test)	PESQ 3.79	10

Showing 10 of 30 rows
