Emotion Recognition from Speech Using Wav2vec 2.0 Embeddings

About

Emotion recognition datasets are relatively small, which makes it challenging to apply more sophisticated deep learning approaches. In this work, we propose a transfer learning method for speech emotion recognition in which features extracted from pre-trained wav2vec 2.0 models are modeled using simple neural networks. We propose to combine the outputs of several layers of the pre-trained model using trainable weights that are learned jointly with the downstream model. Further, we compare performance using two different wav2vec 2.0 models, with and without fine-tuning for speech recognition. We evaluate our proposed approaches on two standard emotion databases, IEMOCAP and RAVDESS, showing superior performance compared to results in the literature.
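The layer combination described above can be illustrated with a minimal sketch. It shows only the forward pass: one scalar weight per layer, normalized and used to average the per-layer features frame by frame. The softmax normalization, the function names `softmax` and `combine_layers`, and the nested-list data layout are assumptions made for illustration and are not taken from the abstract. In practice the weights would be trainable parameters updated together with the downstream network, which this sketch does not do.

```python
import math


def softmax(weights):
    """Normalize raw layer weights so they are positive and sum to 1.

    Assumption: softmax is used here for illustration; the abstract only
    says the weights are trainable.
    """
    largest = max(weights)  # subtract the max for numerical stability
    exps = [math.exp(w - largest) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]


def combine_layers(layer_outputs, layer_weights):
    """Weighted average of per-layer features.

    layer_outputs: list of L layers, each a list of T frames,
                   each frame a list of D floats (hypothetical layout).
    layer_weights: list of L raw scalar weights, one per layer.
    Returns a list of T frames, each a list of D floats.
    """
    alphas = softmax(layer_weights)
    num_frames = len(layer_outputs[0])
    dim = len(layer_outputs[0][0])
    combined = [[0.0] * dim for _ in range(num_frames)]
    for alpha, layer in zip(alphas, layer_outputs):
        for t in range(num_frames):
            for d in range(dim):
                combined[t][d] += alpha * layer[t][d]
    return combined


if __name__ == "__main__":
    # Two layers, one frame, two feature dimensions (toy values).
    layers = [[[1.0, 2.0]], [[3.0, 4.0]]]
    # Equal raw weights reduce to a plain mean over layers.
    print(combine_layers(layers, [0.0, 0.0]))
```

With equal raw weights the result is the plain mean of the layers; as one weight grows relative to the others, the combined features approach that layer's output.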

Leonardo Pepino, Pablo Riera, Luciana Ferrer • 2021

Related benchmarks

Task	Dataset	Result	Rank
Speech Emotion Recognition	IEMOCAP Speaker-independent 5-fold cross-validation	WA 67.2	19
