M-Vec: Matryoshka Speaker Embeddings with Flexible Dimensions

About

Fixed-dimensional speaker embeddings have become the dominant approach in speaker modeling, typically spanning hundreds to thousands of dimensions. The dimensionality is a hyperparameter that is rarely chosen in a principled way, and the individual dimensions are not ordered by importance. In large-scale speaker representation databases, reducing the dimensionality of embeddings can significantly lower storage and computational costs. However, directly training low-dimensional representations often yields suboptimal performance. In this paper, we introduce the Matryoshka speaker embedding, a method that allows dynamic extraction of sub-dimensions from the embedding while maintaining performance. Our approach is validated on the VoxCeleb dataset, demonstrating that it can achieve extremely low-dimensional embeddings, such as 8 dimensions, while preserving high speaker verification performance.
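
To make the idea of extracting sub-dimensions concrete, here is a minimal sketch of how a truncated embedding might be used for verification scoring. The abstract does not say how sub-dimensions are selected; this sketch assumes they are the leading prefix of the full vector, as is usual in Matryoshka-style representation learning. The function names `truncate_embedding` and `cosine_score` and the toy vectors are hypothetical, not taken from the paper.

```python
import math


def truncate_embedding(embedding, dim):
    """Keep the first `dim` values and L2-normalize the result.

    Assumption: the useful sub-embedding is the leading prefix of the vector.
    """
    if not 1 <= dim <= len(embedding):
        raise ValueError("dim must be between 1 and len(embedding)")
    prefix = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in prefix))
    if norm == 0.0:
        raise ValueError("cannot normalize an all-zero prefix")
    return [x / norm for x in prefix]


def cosine_score(emb_a, emb_b, dim):
    """Cosine similarity between two embeddings, using only `dim` dimensions."""
    a = truncate_embedding(emb_a, dim)
    b = truncate_embedding(emb_b, dim)
    return sum(x * y for x, y in zip(a, b))


# Toy 8-dimensional vectors standing in for real speaker embeddings.
enroll = [0.9, 0.1, -0.3, 0.2, 0.05, -0.1, 0.4, 0.0]
test = [0.8, 0.2, -0.2, 0.1, -0.05, 0.0, 0.3, 0.1]

for dim in (2, 4, 8):
    print(dim, round(cosine_score(enroll, test, dim), 4))
```

In a large database, the same stored vectors could then be scored at a small `dim` for a fast first pass and at the full size for a final decision. Whether this trade-off holds in practice depends on the trained model.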

Shuai Wang, Pengcheng Zhu, Haizhou Li • 2024

Related benchmarks

Task                  Dataset              Result        Rank
Speaker Verification  VoxCeleb1 (Vox1-O)   --            33
Speaker Verification  VOiCES (s-avg)       EER 10.61     30
Speaker Verification  VOiCES 5s-1s         EER 16.62     30
Speaker Verification  VOiCES f-f           EER 0.0502    30
Speaker Verification  VoxCeleb Extended 1  EER (f-f) 1   15
Speaker Verification  VoxCeleb Hard 1      EER (f-f) 1.87  15
