Towards High-Quality and Efficient Speech Bandwidth Extension with Parallel Amplitude and Phase Prediction

About

Speech bandwidth extension (BWE) refers to widening the frequency bandwidth of speech signals, which makes the speech sound brighter and fuller. This paper proposes a generative adversarial network (GAN) based BWE model with parallel prediction of amplitude and phase spectra, named AP-BWE, which achieves both high-quality and efficient wideband speech waveform generation. The AP-BWE generator is built entirely from convolutional neural networks (CNNs). It features a dual-stream architecture with mutual interaction, in which the amplitude stream and the phase stream exchange information and extend the high-frequency components of the input narrowband amplitude and phase spectra, respectively. To improve the naturalness of the extended speech signals, we employ a multi-period discriminator at the waveform level and design a pair of multi-resolution amplitude and phase discriminators at the spectral level. Experimental results demonstrate that AP-BWE achieves state-of-the-art speech quality for BWE tasks targeting sampling rates of both 16 kHz and 48 kHz. In terms of generation efficiency, owing to the all-convolutional architecture and all-frame-level operations, AP-BWE can generate 48 kHz waveform samples 292.3 times faster than real-time on a single RTX 4090 GPU and 18.1 times faster than real-time on a single CPU. Notably, to our knowledge, AP-BWE is the first to achieve direct extension of the high-frequency phase spectrum, which is beneficial for improving the effectiveness of existing BWE methods.
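To make the amplitude/phase split concrete, the sketch below decomposes the spectrum of one toy frame into an amplitude spectrum and a phase spectrum, recombines them, and inverts the result back to a waveform. It is only an illustration of the representation the abstract describes, not the authors' implementation: the naive DFT, the toy frame, and all function names here are assumptions chosen for readability, and no neural network is involved.

```python
import cmath
import math


def dft(frame):
    """Naive O(N^2) discrete Fourier transform of a real-valued frame."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse DFT; keeps the real part of each reconstructed sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def split_amplitude_phase(spectrum):
    """Split complex bins into amplitude |X| and phase arg(X) in [-pi, pi]."""
    amplitude = [abs(x) for x in spectrum]
    phase = [cmath.phase(x) for x in spectrum]
    return amplitude, phase


def combine_amplitude_phase(amplitude, phase):
    """Rebuild complex bins as A * exp(j * P)."""
    return [cmath.rect(a, p) for a, p in zip(amplitude, phase)]


# Toy frame (hypothetical data): a sum of two sinusoids.
n = 16
frame = [
    math.sin(2 * math.pi * 2 * t / n) + 0.5 * math.cos(2 * math.pi * 5 * t / n)
    for t in range(n)
]

spectrum = dft(frame)
amplitude, phase = split_amplitude_phase(spectrum)

# A BWE model in this style would predict the missing high-frequency parts of
# both spectra; here they are simply recombined unchanged.
reconstructed = idft(combine_amplitude_phase(amplitude, phase))
max_error = max(abs(a - b) for a, b in zip(frame, reconstructed))
```

Because amplitude and phase together determine each complex bin, recombining them recovers the frame up to floating-point error, which is why a model that outputs both spectra can produce a waveform directly through an inverse transform.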

Ye-Xin Lu, Yang Ai, Hui-Peng Du, Zhen-Hua Ling • 2024

Related benchmarks

Task	Dataset	Result
Audio Super-Resolution	VCTK In-domain	LSD 0.63	34
Audio Bandwidth Extension	Diverse (ITU-T P.501, ETSI TS 103 281 Annex E, NTT) (evaluation set)	2f-Model Score 35.21	28
Speech Bandwidth Extension	VCTK English	NISQA-MOS 4.49	15
Speech Bandwidth Extension	VCTK noisy (test)	NISQA-MOS 3.71	12
Speech Bandwidth Extension	MLS French	NISQA-MOS 3.42	10
Bandwidth extension	VCTK 8 kHz to 44.1 kHz (test)	VISQOL 3.23	10
Bandwidth extension	TIMIT 8 kHz to 16 kHz (test)	VISQOL 2.42	10
Speech Bandwidth Extension	MLS French noisy (test)	NISQA-MOS 2.19	8
Speech Super-resolution	VCTK Clean	LSD (8->16kHz) 0.9	7
Bandwidth extension	VCTK 4 kHz -> 48 kHz v0.92 (test)	LSD 0.9553	6

Showing 10 of 20 rows
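Several rows above report LSD (log-spectral distance), where lower is better. The sketch below shows one commonly used formulation: per frame, the root-mean-square difference between the log power spectra of the reference and the estimate, averaged over frames. The exact variant behind each benchmark row is not stated here, so the formula, the `eps` floor, and the function name `log_spectral_distance` should be read as assumptions rather than as the benchmarks' definition.

```python
import math


def log_spectral_distance(ref_amp, est_amp, eps=1e-10):
    """Average per-frame RMS difference of log10 power spectra.

    ref_amp, est_amp: lists of frames, each frame a list of per-bin
    amplitudes. eps is an assumed floor that keeps log10 finite.
    """
    if not ref_amp or len(ref_amp) != len(est_amp):
        raise ValueError("inputs must be non-empty with the same frame count")
    total = 0.0
    for ref_frame, est_frame in zip(ref_amp, est_amp):
        if not ref_frame or len(ref_frame) != len(est_frame):
            raise ValueError("frames must be non-empty with the same bin count")
        squared_diffs = [
            (math.log10(r * r + eps) - math.log10(e * e + eps)) ** 2
            for r, e in zip(ref_frame, est_frame)
        ]
        total += math.sqrt(sum(squared_diffs) / len(squared_diffs))
    return total / len(ref_amp)


# Hypothetical toy spectra: two frames with two frequency bins each.
reference = [[1.0, 1.0], [1.0, 1.0]]
estimate = [[1.0, 1.0], [10.0, 10.0]]

lsd_identical = log_spectral_distance(reference, reference)
lsd_mismatch = log_spectral_distance(reference, estimate)
```

In the toy data, the first frame matches exactly and the second is 10 times larger in amplitude (a log10 power gap of 2), so the two frames contribute 0 and 2 and the average is close to 1.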
