Real Time Speech Enhancement in the Waveform Domain

About

We present a causal speech enhancement model that works on the raw waveform and runs in real time on a laptop CPU. The proposed model is based on an encoder-decoder architecture with skip connections. It is optimized in both the time and frequency domains, using multiple loss functions. Empirical evidence shows that it can remove various kinds of background noise, including stationary and non-stationary noise, as well as room reverb. Additionally, we suggest a set of data augmentation techniques, applied directly on the raw waveform, that further improve model performance and generalization. We evaluate on several standard benchmarks, using both objective metrics and human judgements. The proposed model matches the state-of-the-art performance of both causal and non-causal methods while working directly on the raw waveform.

Alexandre Defossez, Gabriel Synnaeve, Yossi Adi • 2020
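The abstract says the model is optimized in both the time and frequency domains with multiple loss functions, but it does not list them here. As a rough illustration only, the NumPy sketch below assumes an L1 waveform loss plus a multi-resolution STFT loss (spectral convergence plus log-magnitude distance), summed with equal weights. The function names, the three (FFT size, hop, window length) resolutions, and the equal weighting are hypothetical choices for this sketch, not details confirmed by the paper.

```python
import numpy as np

# Hypothetical (n_fft, hop, win_length) triples; not taken from the paper.
RESOLUTIONS = ((512, 50, 240), (1024, 120, 600), (2048, 240, 1200))


def stft_magnitude(x, n_fft, hop, win_length):
    """Magnitude STFT of a 1-D signal with a Hann window and no edge padding."""
    window = np.hanning(win_length)
    # Center the shorter analysis window inside an n_fft-long frame.
    pad = n_fft - win_length
    window = np.pad(window, (pad // 2, pad - pad // 2))
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=-1))


def stft_loss(clean, estimate, n_fft, hop, win_length, eps=1e-7):
    """Single-resolution loss: spectral convergence + log-magnitude L1."""
    ref = stft_magnitude(clean, n_fft, hop, win_length)
    est = stft_magnitude(estimate, n_fft, hop, win_length)
    # Spectral convergence: relative Frobenius-norm error of the magnitudes.
    sc = np.linalg.norm(ref - est) / (np.linalg.norm(ref) + eps)
    # Log-magnitude distance: mean absolute error between log magnitudes.
    mag = np.mean(np.abs(np.log(ref + eps) - np.log(est + eps)))
    return sc + mag


def combined_loss(clean, estimate, resolutions=RESOLUTIONS):
    """Assumed objective: L1 on the waveform plus averaged multi-resolution STFT loss."""
    time_term = np.mean(np.abs(clean - estimate))
    freq_term = np.mean([stft_loss(clean, estimate, *r) for r in resolutions])
    return time_term + freq_term


rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)  # stand-in for 1 s of clean speech at 16 kHz
noise = rng.standard_normal(16000)
print(combined_loss(clean, clean))  # identical signals give zero loss
print(combined_loss(clean, clean + 0.1 * noise))
```

Using several STFT resolutions is one way to penalize errors at both fine and coarse time-frequency scales, while the waveform term keeps the estimate aligned sample by sample; the actual losses and weights used by the authors may differ.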

Related benchmarks

Task	Dataset	Result
Automatic Speech Recognition	LibriSpeech clean (test)	WER 6.22	1410
Speech Enhancement	VoiceBank-DEMAND (test)	PESQ 3.07	201
Speech Enhancement	VoiceBank + DEMAND (VB-DMD) (test)	PESQ 2.65	114
Speech Enhancement	VoiceBank-DEMAND	PESQ 3.07	55
Speech Enhancement	DNS Challenge Real Recordings (test)	SIG Score 3.227	41
Speech Enhancement	DNS no-reverb 2020 (test)	Signal Score (SIG) 3.12	30
Automatic Speech Recognition	ATC Corpus	CER (DS2) 4.28	27
Speech Enhancement	DNS Challenge Without Reverb (test)	SIG Score 3.534	26
Speech Enhancement	DNS Challenge With Reverb (test)	SIG 2.876	24
Speech Enhancement	Multilingual low-SNR (evaluation set)	PESQ 2.57	23

Showing 10 of 47 rows
