Structural and Statistical Audio Texture Knowledge Distillation for Acoustic Classification
About
While knowledge distillation has shown success in various audio tasks, its application to environmental sound classification often overlooks essential low-level audio texture features needed to capture local patterns in complex acoustic environments. To address this gap, the Structural and Statistical Audio Texture Knowledge Distillation (SSATKD) framework is proposed, which combines high-level contextual information with low-level structural and statistical audio textures extracted from intermediate layers. To evaluate its generalizability across diverse acoustic domains, SSATKD is tested on four datasets within the environmental sound classification domain, including two passive sonar datasets (DeepShip and Vessel Type Underwater Acoustic Data (VTUAD)) and two general environmental sound datasets (Environmental Sound Classification 50 (ESC-50) and Tampere University of Technology (TUT) Acoustic Scenes). Two teacher adaptation strategies are explored: classifier-head-only adaptation and full fine-tuning. The framework is further evaluated using various convolutional and transformer-based teacher models. Experimental results demonstrate consistent accuracy improvements across all datasets and settings, confirming the effectiveness and robustness of SSATKD in real-world sound classification tasks.
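The idea of pairing a high-level distillation signal with a low-level feature term can be illustrated with a minimal sketch. This is not the SSATKD loss itself, which the abstract does not specify. It assumes a generic temperature-softened soft-label term on the logits, plus a hypothetical stand-in "texture" term that matches per-channel mean and standard deviation of intermediate feature maps. The function names, the `(batch, channels, freq, time)` layout, and the weights `alpha` and `beta` are all illustrative assumptions.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def soft_label_kd_loss(student_logits, teacher_logits, temperature=2.0):
    """Generic soft-label distillation: KL(teacher || student) on softened outputs."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    eps = 1e-12  # guards against log(0)
    kl = np.sum(p_t * (np.log(p_t + eps) - np.log(p_s + eps)), axis=-1)
    # The T^2 factor is the usual scaling for softened targets.
    return float(temperature ** 2 * kl.mean())


def feature_stat_loss(student_feat, teacher_feat):
    """Hypothetical stand-in for a statistical texture term.

    Matches per-channel mean and standard deviation of feature maps with
    assumed shape (batch, channels, freq, time).
    """
    s_mean, t_mean = student_feat.mean(axis=(2, 3)), teacher_feat.mean(axis=(2, 3))
    s_std, t_std = student_feat.std(axis=(2, 3)), teacher_feat.std(axis=(2, 3))
    return float(np.mean((s_mean - t_mean) ** 2) + np.mean((s_std - t_std) ** 2))


def combined_loss(student_logits, teacher_logits, student_feat, teacher_feat,
                  alpha=0.5, beta=0.5):
    """Weighted sum of the two terms; alpha and beta are illustrative weights."""
    return (alpha * soft_label_kd_loss(student_logits, teacher_logits)
            + beta * feature_stat_loss(student_feat, teacher_feat))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher_logits = rng.normal(size=(4, 50))       # e.g. 50 classes, as in ESC-50
    student_logits = rng.normal(size=(4, 50))
    teacher_feat = rng.normal(size=(4, 8, 16, 16))  # assumed feature-map shape
    student_feat = rng.normal(size=(4, 8, 16, 16))
    print(combined_loss(student_logits, teacher_logits, student_feat, teacher_feat))
```

In practice the student and teacher feature maps usually differ in channel count, so a projection layer would be needed before comparing them; the sketch sidesteps this by assuming matching shapes.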
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Audio Classification | ESC-50 | Accuracy: 81.17 | 374 |
| Acoustic Scene Classification | TUT Acoustic Scenes | Accuracy: 63.54 | 35 |
| Underwater Acoustic Target Recognition | DeepShip | OA: 67.48 | 16 |
| Underwater Acoustic Classification | VTUAD | Classification Accuracy: 86.87 | 13 |
| Classification | DeepShip | -- | 7 |
| Classification | VTUAD | -- | 7 |
| Acoustic Classification | DeepShip, VTUAD, ESC-50, TUT Acoustic Scenes (average) | Average Gain: 6.98 | 6 |
| Acoustic Classification | DeepShip | Accuracy: 65.49 | 6 |
| Acoustic Classification | TUT | Classification Accuracy: 62.18 | 6 |