High Fidelity Speech Enhancement with Band-split RNN

About

Despite the rapid progress in speech enhancement (SE) research, enhancing the quality of desired speech in environments with strong noise and interfering speakers remains challenging. In this paper, we extend the application of the recently proposed band-split RNN (BSRNN) model to full-band SE and personalized SE (PSE) tasks. To mitigate the effects of unstable high-frequency components in full-band speech, we apply bi-directional band-level modeling to the low-frequency subbands and uni-directional band-level modeling to the high-frequency subbands. For the PSE task, we incorporate a speaker enrollment module into BSRNN to utilize target speaker information. Moreover, we use a MetricGAN discriminator (MGD) and a multi-resolution spectrogram discriminator (MRSD) to improve perceptual quality metrics. Experimental results show that our system outperforms various top-ranking SE systems, achieves state-of-the-art (SOTA) results on the DNS-2020 test set, and ranks among the top 3 in the DNS-2023 challenge.

Jianwei Yu, Yi Luo, Hangting Chen, Rongzhi Gu, Chao Weng • 2022
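
The band-split idea in the abstract can be illustrated with a minimal sketch: the frequency bins of a spectrogram are partitioned into contiguous subbands, and each subband is tagged with the kind of band-level modeling it would receive (bi-directional for low frequencies, uni-directional for high frequencies). This is only an illustration under stated assumptions, not the authors' implementation. The names `Band` and `split_bands`, the uniform band width, and the cutoff bin are all hypothetical; the abstract does not give the actual band layout or cutoff.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Band:
    start_bin: int       # first frequency bin of the band (inclusive)
    end_bin: int         # last frequency bin of the band (exclusive)
    bidirectional: bool  # True: bi-directional band-level modeling (assumed for low bands)


def split_bands(n_bins: int, band_width: int, cutoff_bin: int) -> List[Band]:
    """Partition n_bins frequency bins into contiguous bands.

    Hypothetical rule: a band lying entirely below cutoff_bin is tagged
    bi-directional; every other band is tagged uni-directional.
    Uniform band_width is an assumption made for simplicity.
    """
    if n_bins <= 0 or band_width <= 0:
        raise ValueError("n_bins and band_width must be positive")
    bands = []
    for start in range(0, n_bins, band_width):
        end = min(start + band_width, n_bins)  # last band may be narrower
        bands.append(Band(start, end, bidirectional=end <= cutoff_bin))
    return bands


# Toy configuration: 16 bins, bands of 4 bins, cutoff at bin 8 (all illustrative values).
bands = split_bands(n_bins=16, band_width=4, cutoff_bin=8)
for band in bands:
    mode = "bi-directional" if band.bidirectional else "uni-directional"
    print(f"bins [{band.start_bin}, {band.end_bin}): {mode}")
```

In a full model, each tagged band would be passed to a recurrent layer of the matching directionality; the sketch stops at the partitioning step because the abstract gives no further detail.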

Related benchmarks

Task	Dataset	Metric	Value
Speech Enhancement	VoiceBank-DEMAND (test)	PESQ	3.1	201
Speech Enhancement	DNS no-reverb 2020 (test)	--	--	30
Automatic Speech Recognition	ATC Corpus	CER (DS2)	5.66	27
Speech Enhancement	ATC Corpus	CSIG	4.55	19
Speech Enhancement	ATC Corpus (selected samples)	MOS SIG	3.85	18
Speech Enhancement	DNS with reverb 2020 (test)	PESQ-WB	3.72	16
Automatic Speech Recognition	Artificial Dataset Additive Noise	CER	6.73	14
Automatic Speech Recognition	Artificial Dataset Simulative Echo	CER	8.55	14
Personalized Speech Enhancement	DNS Without Interference 2022 (test)	SIG Score	4.3	8
Personalized Speech Enhancement	DNS blind (With Interference) 2022 (test)	SIG Score	3.9	8
