Triage knowledge distillation for speaker verification
About
Deploying speaker verification on resource-constrained devices remains challenging due to the computational cost of high-capacity models; knowledge distillation (KD) offers a remedy. Classical KD entangles target confidence with non-target structure in a single Kullback-Leibler term, limiting the transfer of relational information. Decoupled KD separates these signals into target and non-target terms, yet it treats non-targets uniformly and remains vulnerable to the long tail of low-probability classes in large-class settings. We introduce Triage KD (TRKD), a distillation scheme that operationalizes an assess-prioritize-focus strategy. TRKD uses a cumulative-probability cutoff $\tau$ to assess per-example difficulty and partition the teacher posterior into three groups: the target class, a high-probability non-target confusion-set, and a background-set. To prioritize informative signals, TRKD distills the confusion-set conditional distribution and discards the background. Concurrently, it transfers a three-mass distribution (target/confusion/background) that captures sample difficulty and inter-class confusion. Finally, TRKD focuses learning via a curriculum on $\tau$: training begins with a larger $\tau$ to convey broad non-target context, then $\tau$ is progressively decreased to shrink the confusion-set, concentrating supervision on the most confusable classes. In extensive experiments on VoxCeleb1 with both homogeneous and heterogeneous teacher-student pairs, TRKD consistently outperforms recent KD variants and attains the lowest EER across all protocols.
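The three pieces of the objective can be sketched compactly. The PyTorch snippet below is a minimal, hedged reconstruction from the abstract alone: the nucleus-style rule for building the confusion-set, the loss weights `alpha` and `beta`, the temperature `T`, and the linear `tau_schedule` are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F


def trkd_loss(student_logits, teacher_logits, target, tau, T=4.0,
              alpha=1.0, beta=1.0):
    """TRKD distillation loss for one batch (illustrative sketch).

    student_logits, teacher_logits: (B, C) raw logits.
    target: (B,) ground-truth speaker indices.
    tau: cumulative-probability cutoff defining the confusion-set.
    """
    p_t = F.softmax(teacher_logits / T, dim=1)  # teacher posterior
    p_s = F.softmax(student_logits / T, dim=1)  # student posterior
    _, C = p_t.shape
    eps = 1e-8
    tgt_mask = F.one_hot(target, C).bool()

    # Assess: rank non-target classes by teacher probability and keep the
    # smallest prefix whose cumulative mass reaches tau (assumed rule).
    nt_probs = p_t.masked_fill(tgt_mask, 0.0)
    sorted_p, order = nt_probs.sort(dim=1, descending=True)
    exclusive_csum = sorted_p.cumsum(dim=1) - sorted_p
    keep_sorted = exclusive_csum < tau
    conf_mask = torch.zeros_like(p_t).scatter(1, order, keep_sorted.float()).bool()
    conf_mask &= ~tgt_mask  # the confusion-set never contains the target

    # Prioritize: distill the confusion-set conditional distribution;
    # the background-set is discarded.
    q_t = (p_t * conf_mask) / (p_t * conf_mask).sum(1, keepdim=True).clamp_min(eps)
    q_s = (p_s * conf_mask) / (p_s * conf_mask).sum(1, keepdim=True).clamp_min(eps)
    kd_conf = F.kl_div((q_s + eps).log(), q_t, reduction="batchmean")

    # Transfer the three-mass (target / confusion / background) vector that
    # encodes sample difficulty and inter-class confusion.
    def three_mass(p):
        m_tgt = (p * tgt_mask).sum(1)
        m_conf = (p * conf_mask).sum(1)
        m_bg = (1.0 - m_tgt - m_conf).clamp_min(eps)
        return torch.stack([m_tgt, m_conf, m_bg], dim=1)

    kd_mass = F.kl_div((three_mass(p_s) + eps).log(), three_mass(p_t),
                       reduction="batchmean")

    # Temperature-squared scaling, as is conventional in KD.
    return (alpha * kd_conf + beta * kd_mass) * T * T


def tau_schedule(step, total_steps, tau_start=0.9, tau_end=0.3):
    """Focus: curriculum on tau. Start broad, then shrink the confusion-set
    toward the most confusable classes (linear decay, assumed)."""
    frac = min(step / max(total_steps, 1), 1.0)
    return tau_start + (tau_end - tau_start) * frac
```

In a full training loop, these terms would be added to the usual cross-entropy loss on the speaker labels, with `tau_schedule(step, total_steps)` supplying the current cutoff at each step.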
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Speaker Verification | VoxCeleb1 (Vox1-O) | EER 0.627 | 33 |
| Speaker Verification | VoxCeleb1 (Vox1-H) | EER 1.644 | 20 |
| Speaker Verification | VoxCeleb-E | -- | 15 |