ParaCLAP -- Towards a general language-audio model for computational paralinguistic tasks

About

Contrastive language-audio pretraining (CLAP) has recently emerged as a method for making audio analysis more generalisable. Specifically, CLAP-style models are able to 'answer' a diverse set of language queries, extending the capabilities of audio models beyond a closed set of labels. However, CLAP relies on a large set of (audio, query) pairs for pretraining. While such sets are available for general audio tasks, like captioning or sound event detection, there are no datasets with matched audio and text queries for computational paralinguistic (CP) tasks. As a result, the community relies on generic CLAP models trained for general audio with limited success. In the present study, we explore training considerations for ParaCLAP, a CLAP-style model suited to CP, including a novel process for creating audio-language queries. We demonstrate its effectiveness on a set of computational paralinguistic tasks, where it is shown to surpass the performance of open-source state-of-the-art models.
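To make the idea of 'answering' language queries concrete, here is a minimal sketch of CLAP-style zero-shot classification: an audio embedding is compared with the embeddings of several text queries, and the closest query wins. The embedding values, the query wording, and the function names are all hypothetical stand-ins; a real model would produce the vectors with trained audio and text encoders, and nothing here is taken from the ParaCLAP implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(audio_embedding, query_embeddings):
    """Pick the text query whose embedding is most similar to the audio embedding."""
    scores = {
        query: cosine_similarity(audio_embedding, embedding)
        for query, embedding in query_embeddings.items()
    }
    best_query = max(scores, key=scores.get)
    return best_query, scores


# Hand-written toy vectors standing in for text-encoder outputs (assumption).
query_embeddings = {
    "speaker is angry": [0.9, 0.1, 0.0],
    "speaker is sad": [0.1, 0.9, 0.1],
    "speaker is happy": [0.0, 0.2, 0.9],
}

# Hand-written toy vector standing in for an audio-encoder output (assumption).
audio_embedding = [0.8, 0.3, 0.1]

label, scores = zero_shot_classify(audio_embedding, query_embeddings)
print(label)
for query, score in scores.items():
    print(f"{query}: {score:.3f}")
```

Because the label set is just a dictionary of text queries, it can be swapped for a different task (age, gender, language) without retraining, which is the flexibility the abstract attributes to CLAP-style models.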

Xin Jing, Andreas Triantafyllopoulos, Björn Schuller • 2024

Related benchmarks

Task | Dataset | Result | Rank
Emotion Recognition | IEMOCAP | -- | 71
Speech Emotion Recognition | RAVDESS | Weighted Accuracy: 28.1 | 19
Language Classification | Combined speech dataset (Baker, LJSpeech, ESD, CREMA-D, Genshin Impact) 1.0 (subject-independent) | Balanced Acc: 0.2 | 19
Age Classification | Combined speech dataset (Baker, LJSpeech, ESD, CREMA-D, Genshin Impact) 1.0 (subject-independent) | Acc (B): 0.108 | 19
Emotion Classification | Combined speech dataset (Baker, LJSpeech, ESD, CREMA-D, Genshin Impact) 1.0 (subject-independent) | Accuracy (B): 0.092 | 19
Gender Classification | Combined speech dataset (Baker, LJSpeech, ESD, CREMA-D, Genshin Impact) 1.0 (subject-independent) | Balanced Acc: 0.097 | 19
Emotion Recognition | CREMA-D | WA (Weighted Average): 29.8 | 12
Affective Computing | FAU Aibo 2cl/de (test) | UAR: 53.7 | 8
Gender Classification | RAVDESS | Weighted Accuracy: 99.2 | 5
Age Classification | CREMA-D | WA: 31.2 | 5
Showing 10 of 21 rows
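
The results above use several metric names (Weighted Accuracy, Balanced Acc, UAR) that this page does not define. The sketch below assumes the convention common in speech emotion recognition, where weighted accuracy is plain overall accuracy and UAR (unweighted average recall, often called balanced accuracy) is the mean of per-class recalls. The labels and predictions are made-up toy data, and the function names are illustrative only.

```python
from collections import defaultdict


def overall_accuracy(y_true, y_pred):
    """Fraction of samples predicted correctly (dominated by frequent classes)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls, so every class counts equally."""
    total = defaultdict(int)
    hits = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            hits[t] += 1
    recalls = [hits[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Toy imbalanced data: a classifier that always predicts the majority class.
y_true = ["neutral"] * 8 + ["angry"] * 2
y_pred = ["neutral"] * 10

print(overall_accuracy(y_true, y_pred))           # 0.8
print(unweighted_average_recall(y_true, y_pred))  # 0.5
```

On imbalanced data the two numbers can differ sharply, which is one reason to check which metric a given row reports before comparing results across datasets.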
