
Joint Feature and Output Distillation for Low-complexity Acoustic Scene Classification

About

This report presents a dual-level knowledge distillation framework with multi-teacher guidance for low-complexity acoustic scene classification (ASC) in DCASE2025 Task 1. We propose a distillation strategy that jointly transfers soft logits and intermediate feature representations. Specifically, we pre-train PaSST and CP-ResNet models as teachers. The teachers' logits are averaged to generate soft targets, while one CP-ResNet is selected for feature-level distillation. This enables the compact student model (CP-Mobile) to capture both the semantic distribution and structural information from teacher guidance. Experiments on the development set of the TAU Urban Acoustic Scenes 2022 Mobile dataset show that our submitted systems achieve up to 59.30% accuracy.
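The joint objective described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the temperature `T`, the epsilon constant, and the use of MSE for the feature term are assumptions; the source only states that teacher logits are averaged into soft targets and that one CP-ResNet provides feature-level guidance.

```python
import numpy as np

def softmax(x, T=1.0):
    # Temperature-scaled softmax, numerically stabilized
    z = x / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits_list, T=2.0):
    # Average logits across teachers to form the soft targets
    avg_teacher = np.mean(teacher_logits_list, axis=0)
    p_t = softmax(avg_teacher, T)
    p_s = softmax(student_logits, T)
    # KL(teacher || student), scaled by T^2 as in standard logit distillation
    kl = np.sum(p_t * (np.log(p_t + 1e-9) - np.log(p_s + 1e-9)), axis=-1)
    return (T ** 2) * kl.mean()

def feature_loss(student_feat, teacher_feat):
    # Feature-level distillation term; MSE is one common choice
    # (the student features are assumed already projected to the
    # teacher's feature dimension)
    return np.mean((student_feat - teacher_feat) ** 2)
```

In training, these two terms would be combined with the ordinary cross-entropy loss on ground-truth labels, with weighting coefficients chosen on the development set.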

Haowen Li, Ziyi Yang, Mou Wang, Ee-Leng Tan, Junwei Yeow, Santi Peksi, Woon-Seng Gan • 2025

Related benchmarks

Task | Dataset | Result | Rank
Acoustic Scene Classification | TAU Urban Acoustic Scenes Mobile 2022 (dev) | Accuracy: 59.3 | 5
