
Exploring Self-Attention Mechanisms for Speech Separation

About

Transformers have enabled impressive improvements in deep learning. They often outperform recurrent and convolutional models while taking advantage of parallel processing. Recently, we proposed the SepFormer, which obtains state-of-the-art performance in speech separation on the WSJ0-2/3 Mix datasets. This paper studies Transformers for speech separation in depth. In particular, we extend our previous findings on the SepFormer by providing results on more challenging noisy and noisy-reverberant datasets, such as LibriMix, WHAM!, and WHAMR!. Moreover, we extend our model to perform speech enhancement and provide experimental evidence on denoising and dereverberation tasks. Finally, we investigate, for the first time in speech separation, the use of efficient self-attention mechanisms such as Linformers, Longformers, and Reformers. We found that they reduce memory requirements significantly. For example, we show that the Reformer-based attention outperforms the popular Conv-TasNet model on the WSJ0-2Mix dataset while being faster at inference and comparable in terms of memory consumption.

Cem Subakan, Mirco Ravanelli, Samuele Cornell, Francois Grondin, Mirko Bronzi • 2022

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Speech Separation | WSJ0-2Mix | SI-SNRi (dB) 21.6 | 65 |
| Speech Separation | WHAM! (test) | SI-SNRi (dB) 16.4 | 58 |
| Speech Separation | WHAMR! (test) | ΔSI-SNR 14 | 57 |
| Speech Separation | WSJ0-3mix (test) | SI-SNRi 19.5 | 29 |
| Speech Separation | WSJ0-2Mix anechoic clean mixture (test) | SI-SNRi 22.3 | 23 |
| Source Separation | WSJ0-2Mix (test) | SI-SNRi 22.3 | 17 |
| Speech Enhancement | VoiceBank-DEMAND | PESQ 3.03 | 17 |
| Speech Separation | WHAMR! 1CH | SI-SNRi (dB) 14 | 11 |
| Source Separation | Libri2Mix Clean (test) | SI-SNRi 20.6 | 7 |
| Source Separation | Libri2Mix Noisy (test) | SI-SNRi 15.9 | 7 |
Showing 10 of 15 rows
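
Most results above are reported as SI-SNRi, the improvement in scale-invariant signal-to-noise ratio of the separated signal over the unprocessed mixture. This page does not define the metric, so the following is only a minimal sketch based on the commonly used definition (zero-mean signals, projection of the estimate onto the reference). The function names `si_snr` and `si_snr_improvement` and the `eps` stabilizer are illustrative assumptions, not code from the paper.

```python
import math


def _zero_mean(x):
    """Return a copy of x with its mean removed."""
    mean = sum(x) / len(x)
    return [v - mean for v in x]


def _dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(p * q for p, q in zip(a, b))


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB of `estimate` with respect to `target`.

    Assumption: both inputs are equal-length sequences of float samples.
    `eps` is a small constant added for numerical stability.
    """
    if len(estimate) != len(target):
        raise ValueError("estimate and target must have the same length")
    est = _zero_mean(estimate)
    ref = _zero_mean(target)
    # Project the estimate onto the reference to get the "target" component.
    scale = _dot(est, ref) / (_dot(ref, ref) + eps)
    s_target = [scale * r for r in ref]
    # Everything not explained by the reference counts as noise.
    e_noise = [e - s for e, s in zip(est, s_target)]
    ratio = (_dot(s_target, s_target) + eps) / (_dot(e_noise, e_noise) + eps)
    return 10.0 * math.log10(ratio)


def si_snr_improvement(estimate, target, mixture):
    """SI-SNRi: SI-SNR of the estimate minus SI-SNR of the raw mixture."""
    return si_snr(estimate, target) - si_snr(mixture, target)


if __name__ == "__main__":
    # Toy signals: the interferer is orthogonal to the target.
    target = [1.0, -1.0, 1.0, -1.0]
    interferer = [1.0, 1.0, -1.0, -1.0]
    mixture = [t + i for t, i in zip(target, interferer)]
    # A hypothetical separator that attenuates the interferer by a factor of 10.
    estimate = [t + 0.1 * i for t, i in zip(target, interferer)]
    print(round(si_snr_improvement(estimate, target, mixture), 2))
```

In this toy case the interferer amplitude drops by a factor of 10, so the improvement comes out at roughly 20 dB. Because the estimate is projected onto the reference, rescaling the estimate by a constant leaves the score essentially unchanged, which is the reason the metric is called scale-invariant.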
