Speech Separation Using an Asynchronous Fully Recurrent Convolutional Neural Network

About

Recent advances in the design of neural network architectures, in particular those specialized in modeling sequences, have provided significant improvements in speech separation performance. In this work, we propose to use a bio-inspired architecture called Fully Recurrent Convolutional Neural Network (FRCNN) to solve the separation task. This model contains bottom-up, top-down and lateral connections to fuse information processed at various time-scales, represented by stages. In contrast to the traditional approach, which updates stages in parallel, we propose to first update the stages one by one in the bottom-up direction, then fuse information from adjacent stages simultaneously, and finally fuse information from all stages into the bottom stage together. Experiments showed that this asynchronous updating scheme achieved significantly better results with far fewer parameters than the traditional synchronous updating scheme. In addition, the proposed model achieved a good balance between speech separation accuracy and computational efficiency compared to other state-of-the-art models on three benchmark datasets.
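The order of the three asynchronous phases can be illustrated with a toy, single-channel sketch. Everything below is an assumption made for illustration, not the authors' FRCNN implementation: the hypothetical `block` (an element-wise tanh standing in for a learned convolutional block), pair-averaging `downsample`, nearest-neighbour `upsample`, and element-wise summation as the fusion operator. It only shows which stage outputs each phase reads and writes.

```python
import math
from typing import List

Signal = List[float]  # toy: one feature channel over time; a real model uses multi-channel tensors


def block(x: Signal) -> Signal:
    """Hypothetical stand-in for a learned conv block: an element-wise tanh."""
    return [math.tanh(v) for v in x]


def downsample(x: Signal) -> Signal:
    """Halve the time resolution by averaging neighbouring pairs (toy stand-in for a strided conv)."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]


def upsample(x: Signal, length: int) -> Signal:
    """Nearest-neighbour upsampling back to `length` samples."""
    factor = length // len(x)
    return [x[i // factor] for i in range(length)]


def add(*signals: Signal) -> Signal:
    """Element-wise sum, used here as the toy fusion operator."""
    return [sum(values) for values in zip(*signals)]


def async_update(x: Signal, num_stages: int) -> Signal:
    if len(x) % (2 ** (num_stages - 1)) != 0:
        raise ValueError("input length must be divisible by 2 ** (num_stages - 1)")

    # Phase 1: bottom-up, one stage at a time. Stage i is computed from the
    # already-updated stage i-1, so information flows up in a single pass.
    stages = [block(x)]
    for _ in range(1, num_stages):
        stages.append(block(downsample(stages[-1])))

    # Phase 2: all stages fuse lateral, bottom-up and top-down inputs simultaneously.
    # Every term reads the phase-1 outputs, so the loop order does not matter.
    fused = []
    for i, s in enumerate(stages):
        terms = [s]  # lateral connection
        if i > 0:
            terms.append(downsample(stages[i - 1]))  # bottom-up, from the finer neighbour
        if i + 1 < num_stages:
            terms.append(upsample(stages[i + 1], len(s)))  # top-down, from the coarser neighbour
        fused.append(block(add(*terms)))

    # Phase 3: fuse every stage into the bottom (finest time-scale) stage together.
    return block(add(*(upsample(s, len(x)) for s in fused)))
```

For example, `async_update([0.5, -0.2, 0.1, 0.9, -0.4, 0.3, 0.0, 0.7], num_stages=3)` returns eight values, one per input time step. In the synchronous baseline described above, every stage would instead be updated in parallel at each step from its neighbours' previous states; this sketch does not model that scheme.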

Xiaolin Hu (1), Kai Li (1), Weiyi Zhang (1), Yi Luo (2), Jean-Marie Lemercier (3), Timo Gerkmann (3) ((1) Department of Computer Science and Technology, Tsinghua University, Beijing, China, (2) Department of Electrical Engineering, Columbia University, NY, USA, (3) Department of Informatics, University of Hamburg, Hamburg, Germany) • 2021

Related benchmarks

Task	Dataset	Metric	Result
Speech Separation	WSJ0-2Mix (test)	SDRi (dB)	18.6	160
Speech Separation	Libri2Mix (test)	SI-SNRi (dB)	16.7	68
Speech Separation	WHAM! (test)	SI-SNRi (dB)	14.5	58
Speech Separation	WSJ0-2Mix anechoic clean mixture (test)	SI-SNRi (dB)	18.3	23
Raman Unmixing	RRUFF-2Mix	SI-SNR (dB)	15.9	16
Raman Unmixing	UNIPR 2Mix	SI-SNR (dB)	12.06	16
Speech Separation	VoxCeleb2-2Mix (test)	SDRi (dB)	8.2	12
Audio-Visual Speech Separation	LRS2-2Mix	SDRi (dB)	10.1	12
Speech Separation	LRS3-2Mix (test)	SDRi (dB)	12.8	11
Speech Separation	Libri2Mix min 16 kHz	SDR (dB)	16.7	10
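Most rows report an improvement metric (SI-SNRi or SDRi, in dB) measured relative to the unprocessed mixture. As a reference for reading those numbers, here is a minimal pure-Python sketch of the widely used scale-invariant SNR and its improvement. The zero-mean normalization follows the common definition; the `eps` guard and the function names are choices of this sketch, and SDR/SDRi (usually computed with BSS Eval-style tools) is not reproduced. It is not the evaluation code behind the table.

```python
import math
from typing import List, Sequence


def _zero_mean(x: Sequence[float]) -> List[float]:
    mean = sum(x) / len(x)
    return [v - mean for v in x]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(p * q for p, q in zip(a, b))


def si_snr(estimate: Sequence[float], target: Sequence[float], eps: float = 1e-8) -> float:
    """Scale-invariant SNR in dB between an estimated and a reference source."""
    s_hat = _zero_mean(estimate)
    s = _zero_mean(target)
    # Project the estimate onto the target: the part explained by the reference.
    scale = _dot(s_hat, s) / (_dot(s, s) + eps)
    s_target = [scale * v for v in s]
    # Everything left over counts as distortion or interference.
    e_noise = [a - b for a, b in zip(s_hat, s_target)]
    return 10 * math.log10((_dot(s_target, s_target) + eps) / (_dot(e_noise, e_noise) + eps))


def si_snri(estimate: Sequence[float], mixture: Sequence[float], target: Sequence[float]) -> float:
    """Improvement over using the unprocessed mixture as the estimate."""
    return si_snr(estimate, target) - si_snr(mixture, target)
```

With target `[1, -1, 1, -1]` and an orthogonal interferer `[1, 1, -1, -1]`, the mixture scores about 0 dB, while an estimate that keeps 10% of the interferer scores about 20 dB, so `si_snri` is about 20 dB. Rescaling the estimate leaves `si_snr` essentially unchanged, which is the point of the scale-invariant variant.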

