APEX: Audio Prototype EXplanations for Classification Tasks

About

Explainable AI (XAI) has achieved remarkable success in image classification, yet the audio domain lacks equally mature solutions. Current methods apply vision-based attribution techniques to spectrograms, overlooking fundamental differences between visual and acoustic signals. Prototype-based reasoning is promising, but acoustic similarity is multidimensional. We introduce APEX (Audio Prototype EXplanations), a post-hoc framework for interpreting pre-trained audio classifiers. Crucially, APEX requires no fine-tuning of the original backbone and leaves the classifier's outputs strictly unchanged. APEX disentangles explanations into four perspectives: Square-based prototypes that localize transient events, Time-based prototypes that capture temporal patterns, Frequency-based prototypes that highlight spectral bands, and Time-Frequency-based prototypes that integrate both. The result is intuitive, example-based explanations that respect acoustic properties and offer greater semantic clarity than standard gradient-based methods.
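The abstract names four prototype perspectives but does not spell out their geometry. The sketch below is a minimal illustration under stated assumptions. Each perspective is treated as a rectangular window over a (frequency × time) array, which could be a spectrogram or a backbone feature map. A prototype, taken as a region of a reference (training) clip, is scored by its maximum cosine similarity over all positions in the clip being explained. Under these assumptions, Square-based uses a small k×k patch, Time-based spans every frequency bin over a short window of frames, Frequency-based spans every frame over a narrow band of bins, and Time-Frequency-based uses a band × window rectangle. The window sizes, the cosine similarity, and the names `rect_regions`, `best_match`, and `prototype_shapes` are hypothetical. This is not APEX's published implementation.

```python
import numpy as np


def rect_regions(spec, h, w):
    """Yield ((f, t), region) for every h x w window of a (freq, time) array."""
    n_freq, n_time = spec.shape
    for f in range(n_freq - h + 1):
        for t in range(n_time - w + 1):
            yield (f, t), spec[f:f + h, t:t + w]


def best_match(spec, prototype, eps=1e-8):
    """Max-pool the cosine similarity of a prototype over all positions in spec."""
    h, w = prototype.shape
    p = prototype.ravel()
    best_loc, best_sim = None, -np.inf
    for loc, region in rect_regions(spec, h, w):
        r = region.ravel()
        sim = float(r @ p / (np.linalg.norm(r) * np.linalg.norm(p) + eps))
        if sim > best_sim:
            best_loc, best_sim = loc, sim
    return best_loc, best_sim


def prototype_shapes(n_freq, n_time, k=8, window=10, band=6):
    """Hypothetical (height, width) window for each of the four perspectives."""
    return {
        "square": (k, k),                  # local patch: transient events
        "time": (n_freq, window),          # all bins, short span of frames
        "frequency": (band, n_time),       # narrow band, all frames
        "time_frequency": (band, window),  # band x window rectangle
    }


rng = np.random.default_rng(0)
n_freq, n_time = 64, 100
reference = rng.random((n_freq, n_time))    # stand-in for a training clip
query = 0.1 * rng.random((n_freq, n_time))  # stand-in for the clip to explain
query[20:28, 40:48] = reference[:8, :8]     # plant a transient "event"

for name, (h, w) in prototype_shapes(n_freq, n_time).items():
    proto = reference[:h, :w]               # prototype = region of the reference clip
    loc, sim = best_match(query, proto)
    print(f"{name:>15}: best match at {loc}, cosine {sim:.3f}")
```

With all four perspectives expressed as window shapes over a single search routine, their scores stay directly comparable. In this toy setup, the Square-based prototype recovers the planted patch at position (20, 40).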

Piotr Kawa, Kornel Howil, Piotr Borycki, Miłosz Adamczyk, Przemysław Spurek, Piotr Syga • 2026

Related benchmarks

Task	Dataset	Result
Audio Deepfake Detection	WaveFake MelGAN (test)	EER 0.00	63
Multi-label bioacoustic classification	BirdSet POW	cmAP 43	57
Multi-label bioacoustic classification	BirdSet PER	cmAP 23	57
Multi-label bioacoustic classification	BirdSet HSN	cmAP 52	57
Audio Deepfake Detection	WaveFake MelGAN (L) (test)	EER 0.00	21
Audio Deepfake Detection	WaveFake HiFi-GAN (test)	EER 0.00	21
Audio Deepfake Detection	WaveFake PWG (test)	EER 0.00	21
Audio Deepfake Detection	WaveFake WaveGlow (test)	EER 0.00	21
Audio Deepfake Detection	WaveFake Average (test)	aEER 1.8	21
Multi-label bioacoustic classification	BirdSet NES	cmAP 38	3
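The deepfake-detection rows report the equal error rate (EER). This is the error rate at the decision threshold where the false-acceptance and false-rejection rates coincide, so an EER of 0.00 means some threshold separates the two classes perfectly. The following is a minimal NumPy sketch that assumes higher scores indicate the positive class and approximates the crossing by the threshold with the smallest |FPR - FNR|. The function name `equal_error_rate` is hypothetical, and this is not the evaluation code behind the reported numbers.

```python
import numpy as np


def equal_error_rate(scores, labels):
    """EER for binary labels (1 = positive class); higher score = more positive."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos, neg = scores[labels == 1], scores[labels == 0]
    thresholds = np.unique(scores)
    # Predict "positive" when score >= threshold.
    fpr = np.array([(neg >= t).mean() for t in thresholds])  # false acceptances
    fnr = np.array([(pos < t).mean() for t in thresholds])   # false rejections
    i = np.argmin(np.abs(fpr - fnr))  # threshold closest to the crossing
    return float((fpr[i] + fnr[i]) / 2)


print(equal_error_rate([0.9, 0.8, 0.1, 0.2], [1, 1, 0, 0]))  # separable scores
```

Evaluation toolkits often interpolate between the two thresholds that bracket the crossing. Picking the nearest observed threshold keeps the sketch short, at the cost of a small bias when there are few distinct scores.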

