An efficient encoder-decoder architecture with top-down attention for speech separation
About
Deep neural networks have shown excellent promise in speech separation tasks. However, obtaining good results while keeping model complexity low remains challenging in real-world applications. In this paper, we propose TDANet, a bio-inspired efficient encoder-decoder architecture that mimics the brain's top-down attention, reducing model complexity without sacrificing performance. Top-down attention in TDANet is extracted by a global attention (GA) module and cascaded local attention (LA) layers. The GA module takes multi-scale acoustic features as input to extract a global attention signal, which then modulates features at different scales through direct top-down connections. The LA layers use features of adjacent layers as input to extract local attention signals, which modulate the lateral input in a top-down manner. On three benchmark datasets, TDANet consistently achieved separation performance competitive with previous state-of-the-art (SOTA) methods at higher efficiency. Specifically, TDANet's multiply-accumulate operations (MACs) are only 5% of those of Sepformer, one of the previous SOTA models, and its CPU inference time is only 10% of Sepformer's. In addition, a large version of TDANet obtained SOTA results on all three datasets, with MACs still only 10% of Sepformer's and CPU inference time only 24% of Sepformer's.
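The top-down modulation described above can be sketched in a few lines. The following is a minimal NumPy illustration (not the authors' implementation): multi-scale features are pooled to the coarsest resolution and fused into a global signal, which is then upsampled and used as a sigmoid gate on each scale. Function names, shapes, and the averaging/gating choices here are assumptions for illustration only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def global_attention_modulate(features):
    """Hypothetical GA-style top-down pass (illustration, not the paper's code).

    features: list of arrays, one per scale, each shaped (channels, time_i),
    with time length halved at each deeper scale.
    """
    C = features[0].shape[0]
    target_len = features[-1].shape[1]
    # Fuse multi-scale features: average-pool every scale to the coarsest length.
    pooled = []
    for f in features:
        factor = f.shape[1] // target_len
        pooled.append(
            f[:, : factor * target_len].reshape(C, target_len, factor).mean(axis=2)
        )
    global_signal = np.mean(pooled, axis=0)  # (C, target_len)
    # Top-down modulation: gate each scale with the upsampled global signal.
    out = []
    for f in features:
        factor = f.shape[1] // target_len
        up = np.repeat(global_signal, factor, axis=1)[:, : f.shape[1]]
        out.append(f * sigmoid(up))
    return out

# Three scales, time dimension halved at each depth.
feats = [np.random.randn(8, 32), np.random.randn(8, 16), np.random.randn(8, 8)]
modulated = global_attention_modulate(feats)
print([m.shape for m in feats] == [m.shape for m in modulated])  # True
```

Because the gate lies in (0, 1), modulation rescales each scale's features without changing their shapes, matching the paper's description of top-down connections that modulate rather than replace lateral input.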
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Speech Separation | WSJ0-2Mix (test) | SDRi (dB) | 18.7 | 141 |
| Speech Separation | WSJ0-2Mix | SI-SNRi (dB) | 10.8 | 65 |
| Speech Separation | WHAM! (test) | SI-SNRi (dB) | 15.2 | 58 |
| Speech Separation | Libri2Mix (test) | SI-SNRi (dB) | 17.4 | 45 |
| Speech Separation | LRS2-2Mix (test) | GPU RTF (s) (Forward) | 0.0118 | 10 |
| Audio Source Separation | LRS2-2Mix | SI-SNRi (dB) | 9.5 | 3 |