End-to-End Audio Strikes Back: Boosting Augmentations Towards An Efficient Audio Classification Network

About

While efficient architectures and a plethora of augmentations for end-to-end image classification tasks have been suggested and heavily investigated, state-of-the-art techniques for audio classification still rely on numerous representations of the audio signal together with large architectures, fine-tuned from large datasets. By utilizing the inherent lightweight nature of audio and novel audio augmentations, we present an efficient end-to-end network with strong generalization ability. Experiments on a variety of sound classification sets demonstrate the effectiveness and robustness of our approach, which achieves state-of-the-art results in various settings. Public code is available at: https://github.com/Alibaba-MIIL/AudioClassfication

Avi Gazneli, Gadi Zimerman, Tal Ridnik, Gilad Sharir, Asaf Noy • 2022

Related benchmarks

Task	Dataset	Result
Audio Classification	ESC-50	Accuracy 96.3	461
Audio Classification	ESC-50 (test)	Accuracy 96.3	111
Audio Classification	AudioSet 2M	mAP 42.6	102
Keyword Spotting	Speech Commands V2	Accuracy 98.15	61
Audio Recognition	Speech Commands V2	Accuracy 98.15	43
Sound classification	AudioSet (evaluation)	mAP 42.6	39
Audio Classification	UrbanSound8K (official 10 fold split)	Accuracy (%) 90	23
Audio Classification	Speech Commands 35 classes V2 (evaluation)	Accuracy 98.15	9

