Audio Barlow Twins: Self-Supervised Audio Representation Learning
About
The Barlow Twins self-supervised learning objective requires neither negative samples nor asymmetric learning updates, and it achieves results on a par with the current state of the art in computer vision. We therefore present Audio Barlow Twins, a novel self-supervised audio representation learning approach that adapts Barlow Twins to the audio domain. We pre-train on the large-scale audio dataset AudioSet and evaluate the quality of the learnt representations on 18 tasks from the HEAR 2021 Challenge. Our results outperform, or are otherwise on a par with, the current state of the art among instance discrimination self-supervised approaches to audio representation learning. Code is available at https://github.com/jonahanton/SSL_audio.
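To make the "no negatives, no asymmetric updates" point concrete, here is a minimal sketch of the generic Barlow Twins loss as formulated in the original image-domain work. It is not the Audio Barlow Twins implementation. Two views of the same batch are embedded, and each embedding dimension is standardized over the batch. The loss then pushes the cross-correlation matrix C between the views toward the identity: diagonal entries go to 1 (invariance across views) and off-diagonal entries go to 0 (redundancy reduction). The function names and the weight `lambd = 5e-3` are illustrative assumptions, chosen to be of the order used in the original Barlow Twins work. The audio encoder, augmentations and projector are omitted.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = batch items, columns = embedding dimensions


def standardize_columns(z: Matrix, eps: float = 1e-5) -> Matrix:
    """Hypothetical helper: zero-mean, unit-variance per dimension over the batch."""
    n, d = len(z), len(z[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [z[b][j] for b in range(n)]
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / n  # biased (batch) variance
        std = math.sqrt(var + eps)
        for b in range(n):
            out[b][j] = (z[b][j] - mean) / std
    return out


def cross_correlation(za: Matrix, zb: Matrix) -> Matrix:
    """C[i][j] = (1/N) * sum_b a[b][i] * b[b][j] on standardized embeddings."""
    n, d = len(za), len(za[0])
    a, b = standardize_columns(za), standardize_columns(zb)
    return [[sum(a[k][i] * b[k][j] for k in range(n)) / n for j in range(d)]
            for i in range(d)]


def barlow_twins_loss(za: Matrix, zb: Matrix, lambd: float = 5e-3) -> float:
    """Invariance term on the diagonal plus lambd-weighted redundancy term off it."""
    c = cross_correlation(za, zb)
    d = len(c)
    on_diag = sum((1.0 - c[i][i]) ** 2 for i in range(d))
    off_diag = sum(c[i][j] ** 2 for i in range(d) for j in range(d) if i != j)
    return on_diag + lambd * off_diag


if __name__ == "__main__":
    # Two decorrelated dimensions, identical views: loss is close to 0.
    za = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
    print(barlow_twins_loss(za, za))
    # Views that disagree (negated): each diagonal entry is about -1, so loss is close to 8.
    print(barlow_twins_loss(za, [[-x for x in row] for row in za]))
```

Only the two views of the same batch enter the loss, and both branches are treated symmetrically, so no negative pairs or stop-gradient tricks are needed. The pure-Python loops keep the sketch dependency-free. A real training loop would compute the same quantities with a tensor library that supports automatic differentiation.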
Jonah Anton, Harry Coppock, Pancham Shukla, Bjorn W. Schuller (2022)
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Sound Event Detection | DCASE HEAR challenge | Onset FMS | 76.1 | 20 |
| Audio Scene Classification | HEAR Music 2021 | Beijing | 0.966 | 5 |
| Music Transcription | MAESTRO | Onset FMS | 0.048 | 5 |
| Scene-based Audio Classification | HEAR Environmental Sound tasks | ESC-50 Accuracy | 78.6 | 5 |
| Scene-based Audio Classification | HEAR Speech tasks | CREMA-D Score | 0.594 | 5 |