EfficientLEAF: A Faster LEarnable Audio Frontend of Questionable Use

About

In audio classification, differentiable auditory filterbanks with few parameters cover the middle ground between hard-coded spectrograms and raw audio. LEAF (arXiv:2101.08596), a Gabor-based filterbank combined with Per-Channel Energy Normalization (PCEN), has shown promising results, but is computationally expensive. With inhomogeneous convolution kernel sizes and strides, and by replacing PCEN with better parallelizable operations, we can reach similar results more efficiently. In experiments on six audio classification tasks, our frontend matches the accuracy of LEAF at 3% of the cost, but both fail to consistently outperform a fixed mel filterbank. The quest for learnable audio frontends is not solved.
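For context on why PCEN is costly to parallelize, the sketch below applies the standard PCEN formula to a single channel in plain Python. The function name `pcen` and the default parameter values are illustrative assumptions, not taken from the LEAF or EfficientLEAF code. The point is the smoother: each frame depends on the previous frame's smoothed value, so the time axis has to be processed sequentially.

```python
def pcen(energy, s=0.04, alpha=0.96, delta=2.0, r=0.5, eps=1e-12):
    """Apply PCEN to one channel's non-negative energy sequence.

    Parameter defaults are illustrative assumptions, not values from the paper.
    """
    out = []
    smoothed = None
    for e in energy:
        # First-order IIR smoother: M[t] = (1 - s) * M[t-1] + s * E[t].
        # The dependence on M[t-1] forces a sequential pass over time.
        smoothed = e if smoothed is None else (1 - s) * smoothed + s * e
        # Adaptive gain control followed by root compression.
        out.append((e / (eps + smoothed) ** alpha + delta) ** r - delta ** r)
    return out
```

Calling `pcen` once per filterbank channel gives the per-channel normalization; in a learnable frontend, `s`, `alpha`, `delta`, and `r` would typically be trainable per channel.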

Jan Schlüter, Gerald Gutenbrunner • 2022

Related benchmarks

Task	Dataset	Result
Musical Instrument Classification	NSynth	Accuracy 71.7	123
Audio Classification	CREMA-D	Accuracy 60.2	26
Audio Classification	NSynth Pitch	Accuracy 92.7	8
Audio Classification	SpeechCommands v1 v2 (test)	Accuracy 95.3	5
Audio Classification	BirdCLEF 2021	Accuracy 42.9	5
Audio Classification	VoxForge	Accuracy 91.4	5

