Enhancing Automatic Chord Recognition via Pseudo-Labeling and Knowledge Distillation

About

Automatic Chord Recognition (ACR) is constrained by the scarcity of aligned chord annotations, which are costly to acquire. At the same time, open-weight pre-trained models are more accessible than their proprietary training data. In this work, we present a two-stage training pipeline that leverages pre-trained models together with unlabeled audio. In the first stage, we use the pre-trained BTC model as a teacher to generate pseudo-labels for over 1,000 hours of diverse unlabeled audio and train a student model solely on these pseudo-labels. In the second stage, the student is continually trained on ground-truth labels as they become available. To prevent catastrophic forgetting of the representations learned in the first stage, we apply selective knowledge distillation (KD) from the teacher as a regularizer. We evaluate two student architectures, BTC and 2E1D. In Stage 1, using only pseudo-labels, the BTC student reaches about 99% of the teacher's performance and the 2E1D student about 97%, across seven standard mir_eval metrics. After continual training with labeled data in Stage 2, the BTC student consistently surpasses both the traditional supervised learning baseline and the original pre-trained teacher across all metrics. The 2E1D student also outperforms the supervised baseline and approaches teacher-level performance. Both students show substantial gains on rare chord qualities.
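The two stages described above can be illustrated with a minimal, framework-free sketch. It is not the authors' implementation. The function names (`pseudo_labels`, `stage2_loss`), the `kd_weight` and `temperature` parameters, and in particular the selection rule are assumptions made for illustration: the abstract does not say how the KD term is made "selective", so this sketch applies it only on frames where the teacher's top prediction agrees with the ground-truth label.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over one frame of chord-class logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=values.__getitem__)


def pseudo_labels(teacher_logits):
    """Stage 1: turn per-frame teacher logits into hard pseudo-labels."""
    return [argmax(frame) for frame in teacher_logits]


def cross_entropy(student_logits, label):
    """Cross-entropy of one frame against a hard label."""
    return -math.log(softmax(student_logits)[label])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def stage2_loss(student_logits, teacher_logits, labels,
                kd_weight=0.5, temperature=2.0):
    """Stage 2: ground-truth cross-entropy plus a KD regularizer.

    ASSUMPTION: "selective" is modeled here as applying the KD term only
    on frames where the teacher's top prediction matches the ground truth.
    The paper's actual selection criterion may differ.
    """
    total = 0.0
    for s, t, y in zip(student_logits, teacher_logits, labels):
        loss = cross_entropy(s, y)
        if argmax(t) == y:
            soft_teacher = softmax(t, temperature)
            soft_student = softmax(s, temperature)
            # The temperature**2 factor is the usual KD gradient rescaling.
            loss += kd_weight * temperature ** 2 * kl_divergence(
                soft_teacher, soft_student)
        total += loss
    return total / len(labels)


# Toy data: two frames, three chord classes (hypothetical values).
teacher = [[0.1, 2.0, -1.0], [3.0, 0.0, 0.0]]
student = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
stage1_targets = pseudo_labels(teacher)
stage2_value = stage2_loss(student, teacher, labels=[1, 1])
```

In Stage 1 the student would be trained with plain cross-entropy against `stage1_targets`. In Stage 2, under the assumed rule, the second toy frame contributes only cross-entropy, because the teacher's top class there disagrees with the ground-truth label.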

Nghia Phan, Rong Jin, Gang Liu, Xiao Dong • 2026

Related benchmarks

Task	Dataset	Result	Rank
Automatic Chord Estimation	ACR (test)	Root Accuracy: 83.03	7
