Masked Spectrogram Prediction For Self-Supervised Audio Pre-Training

About

Transformer-based models attain excellent results and generalize well when trained on sufficient amounts of data. However, because of the limited data available in the audio domain, most transformer-based models for audio tasks are fine-tuned from models pre-trained in other domains (e.g., images), despite the notable gap between those domains and audio. Other methods apply self-supervised learning directly in the audio domain, but they do not yet perform well on downstream tasks. In this paper, we present a novel self-supervised learning method for transformer-based audio models, called masked spectrogram prediction (MaskSpec), which learns powerful audio representations from unlabeled audio data (AudioSet in this paper). Our method masks random patches of the input spectrogram and reconstructs the masked regions with an encoder-decoder architecture. Experimental results on multiple downstream datasets show that, without using extra model weights or supervision, MaskSpec achieves a significant performance gain over supervised methods and outperforms previous pre-trained models. In particular, our best model reaches 0.471 mAP on AudioSet, 0.854 mAP on OpenMIC2018, 0.982 accuracy on ESC-50, 0.976 accuracy on SCV2, and 0.823 accuracy on DCASE2019 Task1A.
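The masking-and-reconstruction idea described above can be illustrated with a minimal, stdlib-only sketch. It splits a (time × frequency) spectrogram into non-overlapping patches, hides a random subset of them, and scores a reconstruction with mean squared error computed only on the masked patches. The patch size, the 75% masking ratio, the MSE-on-masked-patches loss, and the function names `patchify`, `random_mask`, and `masked_mse` are all illustrative assumptions, not details taken from the MaskSpec code; the transformer encoder-decoder itself is omitted.

```python
import random

def patchify(spec, patch_t, patch_f):
    """Split a (time x freq) spectrogram, given as a list of lists, into
    non-overlapping patches. Returns flattened patches in row-major order.
    Assumes both dimensions are divisible by the patch size."""
    n_t, n_f = len(spec), len(spec[0])
    if n_t % patch_t or n_f % patch_f:
        raise ValueError("spectrogram dims must be divisible by patch size")
    patches = []
    for t0 in range(0, n_t, patch_t):
        for f0 in range(0, n_f, patch_f):
            patches.append([spec[t][f]
                            for t in range(t0, t0 + patch_t)
                            for f in range(f0, f0 + patch_f)])
    return patches

def random_mask(num_patches, mask_ratio, rng):
    """Randomly choose which patch indices are hidden from the encoder.
    Returns (visible indices, masked indices), both sorted."""
    num_masked = int(round(num_patches * mask_ratio))
    masked = set(rng.sample(range(num_patches), num_masked))
    visible = [i for i in range(num_patches) if i not in masked]
    return visible, sorted(masked)

def masked_mse(pred_patches, target_patches, masked_idx):
    """Mean squared error averaged over the elements of masked patches only;
    visible patches do not contribute to the loss (assumed, MAE-style)."""
    total, count = 0.0, 0
    for i in masked_idx:
        for p, t in zip(pred_patches[i], target_patches[i]):
            total += (p - t) ** 2
            count += 1
    return total / count if count else 0.0

# Toy usage: an 8x8 "spectrogram" cut into four 4x4 patches, 75% masked.
rng = random.Random(0)
spec = [[float(t * 8 + f) for f in range(8)] for t in range(8)]
patches = patchify(spec, 4, 4)
visible, masked = random_mask(len(patches), 0.75, rng)
zero_pred = [[0.0] * len(p) for p in patches]  # stand-in for a decoder output
loss = masked_mse(zero_pred, patches, masked)
```

In a real model, only the visible patches would be fed to the encoder and the decoder would predict the contents of the masked ones; restricting the loss to masked patches keeps the task from being solved by simply copying the input.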

Dading Chong, Helin Wang, Peilin Zhou, Qingcheng Zeng · 2022

Related benchmarks

Task	Dataset	Metric	Result	#
Audio Classification	ESC-50	Accuracy	90.7	461
Audio Classification	AudioSet 20K	mAP	34.7	151
Audio Classification	ESC-50 (test)	Accuracy	90.7	111
Audio Classification	AudioSet 2M	mAP	47.1	102
Audio Classification	ESC50	Top-1 Acc	89.6	73
Audio Classification	SPC V2	Accuracy	97.7	65
Keyword Spotting	Speech Commands V2	Accuracy	97.7	61
Audio Classification	Speech Commands V2 (test)	Accuracy	97.7	59
Classification	AudioSet (test)	mAP	47.1	57
Audio Classification	US8K (test)	R@1 Accuracy	0.896	56

Showing 10 of 25 rows
