AudioProtoPNet: An interpretable deep learning model for bird sound classification

About

Deep learning models have significantly advanced acoustic bird monitoring by being able to recognize numerous bird species based on their vocalizations. However, traditional deep learning models are black boxes that provide no insight into their underlying computations, limiting their usefulness to ornithologists and machine learning engineers. Explainable models could facilitate debugging, knowledge discovery, trust, and interdisciplinary collaboration. This study introduces AudioProtoPNet, an adaptation of the Prototypical Part Network (ProtoPNet) for multi-label bird sound classification. It is an inherently interpretable model that uses a ConvNeXt backbone to extract embeddings, with the classification layer replaced by a prototype learning classifier trained on these embeddings. The classifier learns prototypical patterns of each bird species' vocalizations from spectrograms of training instances. During inference, audio recordings are classified by comparing them to the learned prototypes in the embedding space, providing explanations for the model's decisions and insights into the most informative embeddings of each bird species. The model was trained on the BirdSet training dataset, which consists of 9,734 bird species and over 6,800 hours of recordings. Its performance was evaluated on the seven test datasets of BirdSet, covering different geographical regions. AudioProtoPNet outperformed the state-of-the-art model Perch, achieving an average AUROC of 0.90 and a cmAP of 0.42, with relative improvements of 7.1% and 16.7% over Perch, respectively. These results demonstrate that even for the challenging task of multi-label bird sound classification, it is possible to develop powerful yet inherently interpretable deep learning models that provide valuable insights for ornithologists and machine learning engineers.
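The inference step described above (comparing a recording's embeddings to learned prototypes and turning the similarities into per-species scores) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names `prototype_scores` and `classify`, the array shapes, the use of cosine similarity, and the final linear layer with a sigmoid are all assumptions made for illustration; the abstract does not specify these details.

```python
import numpy as np


def prototype_scores(embeddings, prototypes):
    """Compare every embedding patch with every prototype.

    embeddings: (H, W, D) feature map from a backbone (shape is an assumption).
    prototypes: (P, D) prototype vectors.
    Returns the best cosine similarity per prototype, shape (P,), and the
    flat index (into H*W) of the patch that produced it, shape (P,).
    """
    patches = embeddings.reshape(-1, embeddings.shape[-1])  # (H*W, D)
    # Normalise rows so the dot product below is a cosine similarity.
    patches = patches / (np.linalg.norm(patches, axis=1, keepdims=True) + 1e-12)
    protos = prototypes / (np.linalg.norm(prototypes, axis=1, keepdims=True) + 1e-12)
    sim = protos @ patches.T  # (P, H*W)
    # Max over patches: "where in the spectrogram does this prototype match best?"
    return sim.max(axis=1), sim.argmax(axis=1)


def classify(embeddings, prototypes, class_weights, bias=0.0):
    """Multi-label scores from prototype similarities (hypothetical head).

    class_weights: (C, P) weights linking prototypes to classes.
    Returns per-class probabilities, shape (C,), and the best-matching
    patch index per prototype, which can serve as the explanation.
    """
    scores, best_patch = prototype_scores(embeddings, prototypes)
    logits = class_weights @ scores + bias
    probs = 1.0 / (1.0 + np.exp(-logits))  # independent sigmoid per class
    return probs, best_patch
```

In a sketch like this, the returned patch indices are what makes the prediction inspectable: each one points to the region of the spectrogram embedding that most resembles a given prototype.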

René Heinrich, Lukas Rauch, Bernhard Sick, Christoph Scholz • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Audio Deepfake Detection | WaveFake MelGAN (test) | EER 0.00e+0 | 63 |
| Multi-label bioacoustic classification | BirdSet POW | cmAP 52 | 57 |
| Multi-label bioacoustic classification | BirdSet PER | cmAP 30 | 57 |
| Multi-label bioacoustic classification | BirdSet HSN | cmAP 55 | 57 |
| Audio Deepfake Detection | WaveFake Average (test) | aEER 0.6 | 21 |
| Audio Deepfake Detection | WaveFake MelGAN (L) (test) | EER 0.00e+0 | 21 |
| Audio Deepfake Detection | WaveFake HiFi-GAN (test) | EER 0.00e+0 | 21 |
| Audio Deepfake Detection | WaveFake PWG (test) | EER 0.00e+0 | 21 |
| Audio Deepfake Detection | WaveFake WaveGlow (test) | EER 0.00e+0 | 21 |
| Multi-label bioacoustic classification | BirdSet UHH | cmAP 32 | 3 |
Showing 10 of 14 rows
