
Explicit Estimation of Magnitude and Phase Spectra in Parallel for High-Quality Speech Enhancement

About

Phase information has a significant impact on speech perceptual quality and intelligibility. However, existing speech enhancement methods encounter limitations in explicit phase estimation due to the non-structural nature and wrapping characteristics of the phase, leading to a bottleneck in enhanced speech quality. To overcome this issue, we propose MP-SENet, a novel Speech Enhancement Network that explicitly enhances Magnitude and Phase spectra in parallel. MP-SENet comprises a Transformer-embedded encoder-decoder architecture. The encoder encodes the input distorted magnitude and phase spectra into time-frequency representations, which are then fed into time-frequency Transformers that alternately capture time and frequency dependencies. The decoder comprises a magnitude mask decoder and a phase decoder, which directly enhance the magnitude and wrapped phase spectra by incorporating a magnitude masking architecture and a phase parallel estimation architecture, respectively. Multi-level loss functions explicitly defined on the magnitude spectra, wrapped phase spectra, and short-time complex spectra are adopted to jointly train the MP-SENet model. A metric discriminator is further employed to compensate for the incomplete correlation between these losses and human auditory perception. Experimental results demonstrate that MP-SENet achieves state-of-the-art performance across multiple speech enhancement tasks, including speech denoising, dereverberation, and bandwidth extension. Compared to existing phase-aware speech enhancement methods, it further mitigates the compensation effect between the magnitude and phase through explicit phase estimation, improving the perceptual quality of the enhanced speech.

Ye-Xin Lu, Yang Ai, Zhen-Hua Ling • 2023
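
To make the abstract's terminology concrete, here is a minimal NumPy sketch of what "wrapped phase", "magnitude masking", and "parallel" phase estimation could look like. It is not the authors' implementation. All function names are hypothetical, and two choices are assumptions the abstract does not spell out: obtaining the wrapped phase with atan2 over two parallel real-valued outputs, and measuring phase error with an anti-wrapping distance that treats phases differing by a multiple of 2π as identical.

```python
import numpy as np


def magnitude_and_wrapped_phase(spec):
    """Split a complex spectrogram into magnitude and wrapped phase in (-pi, pi]."""
    return np.abs(spec), np.angle(spec)


def apply_magnitude_mask(noisy_mag, mask):
    """Magnitude masking: scale the noisy magnitude by a non-negative mask."""
    return noisy_mag * mask


def phase_from_parallel_outputs(pseudo_real, pseudo_imag):
    """Assumed 'parallel estimation': two real-valued outputs are combined
    with atan2, so the result is always a valid wrapped phase."""
    return np.arctan2(pseudo_imag, pseudo_real)


def anti_wrap(phase_diff):
    """Assumed anti-wrapping distance: absolute phase difference modulo 2*pi,
    folded into [0, pi]."""
    two_pi = 2.0 * np.pi
    return np.abs(phase_diff - two_pi * np.round(phase_diff / two_pi))


def recombine(mag, phase):
    """Rebuild the complex spectrogram from magnitude and phase."""
    return mag * np.exp(1j * phase)


# Toy data standing in for a (frequency x time) complex spectrogram.
rng = np.random.default_rng(0)
spec = rng.standard_normal((4, 5)) + 1j * rng.standard_normal((4, 5))

mag, phase = magnitude_and_wrapped_phase(spec)
enhanced_mag = apply_magnitude_mask(mag, np.full_like(mag, 0.5))
estimated_phase = phase_from_parallel_outputs(np.cos(phase), np.sin(phase))
rebuilt = recombine(mag, estimated_phase)

# Phases near +pi and -pi are numerically far apart but perceptually close.
naive_error = abs(3.1 - (-3.1))
wrapped_error = anti_wrap(3.1 - (-3.1))
```

In the toy example, the naive difference between phases 3.1 and -3.1 is 6.2 rad, while the anti-wrapping distance is 2π - 6.2, which illustrates why a plain L1 or L2 loss on wrapped phase values can be misleading.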

Related benchmarks

Task | Dataset | Result | Rank
Speech Enhancement | DNS Challenge Without Reverb (test) | NB-PESQ 3.92 | 14
Phase Retrieval | VoiceBank Corpus (test) | PESQ 4.6 | 8
Speech Denoising | VoiceBank+DEMAND (test) | PESQ 3.604 | 7
Speech Dereverberation | REVERB Challenge SimData (test) | CD 1.97 | 7
Speech Dereverberation | REVERB Challenge Evaluation Set (RealData) | SRMR 6.67 | 7
Speech Bandwidth Extension | VCTK 8 kHz -> 16 kHz (test) | WB-PESQ 4.28 | 6
Speech Bandwidth Extension | VCTK 4 kHz -> 16 kHz (test) | WB-PESQ 3.78 | 6
Speech Denoising | DNS Non-Reverberant 2020 (test) | PESQ 2.79 | 5
Composite Denoising, Dereverberation, and Bandwidth Extension | WSJ0+WHAMR! (test) | WB-PESQ 2.103 | 5
Speech Bandwidth Extension | WSJ0+WHAMR! (test) | WB-PESQ 3.322 | 5
Showing 10 of 13 rows
