Generalized End-to-End Loss for Speaker Verification

About

In this paper, we propose a new loss function called generalized end-to-end (GE2E) loss, which makes the training of speaker verification models more efficient than our previous tuple-based end-to-end (TE2E) loss function. Unlike TE2E, the GE2E loss function updates the network in a way that emphasizes examples that are difficult to verify at each step of the training process. Additionally, the GE2E loss does not require an initial stage of example selection. With these properties, our model with the new loss function decreases speaker verification EER by more than 10% while also reducing training time by 60%. We also introduce the MultiReader technique, which allows us to do domain adaptation: training a more accurate model that supports multiple keywords (i.e., "OK Google" and "Hey Google") as well as multiple dialects.
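
The abstract does not spell out the loss itself. Below is a minimal NumPy sketch of the softmax variant of GE2E, assuming the formulation from the full paper: L2-normalized embeddings e_ji for N speakers with M utterances each, a similarity matrix S_ji,k = w·cos(e_ji, c_k) + b against every speaker centroid c_k (with an exclusive centroid that leaves out e_ji when k is its own speaker), and a per-utterance loss of -S_ji,j + log Σ_k exp(S_ji,k). The function name `ge2e_softmax_loss` and the defaults w=10, b=-5 are illustrative assumptions; in training, w and b are learned parameters and gradients would come from an autodiff framework rather than NumPy.

```python
import numpy as np


def _l2_normalize(x: np.ndarray) -> np.ndarray:
    # Scale vectors to unit length along the last axis so dot products are cosines.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def ge2e_softmax_loss(emb: np.ndarray, w: float = 10.0, b: float = -5.0) -> float:
    """GE2E softmax loss for a batch of shape (N speakers, M utterances, D dims).

    Hypothetical helper for illustration; w and b are learnable in real training
    (w constrained to be positive) but are plain numbers here.
    """
    n_spk, n_utt, _ = emb.shape
    if n_utt < 2:
        raise ValueError("need at least 2 utterances per speaker for exclusive centroids")
    if w <= 0:
        raise ValueError("w must be positive")

    emb = _l2_normalize(emb)                            # e_ji, unit length
    centroids = _l2_normalize(emb.mean(axis=1))         # c_k, shape (N, D)
    # Exclusive centroid of speaker j that leaves out utterance i itself.
    excl = _l2_normalize((emb.sum(axis=1, keepdims=True) - emb) / (n_utt - 1))

    # Cosine similarity of every utterance to every speaker centroid: (N, M, N).
    cos = np.einsum("jid,kd->jik", emb, centroids)
    # For the true speaker (k == j), use the exclusive centroid instead.
    own = np.einsum("jid,jid->ji", emb, excl)
    for j in range(n_spk):
        cos[j, :, j] = own[j]

    sim = w * cos + b                                   # S_ji,k
    # Numerically stable log-sum-exp over all candidate speakers k.
    mx = sim.max(axis=-1, keepdims=True)
    lse = mx[..., 0] + np.log(np.exp(sim - mx).sum(axis=-1))
    pos = np.stack([sim[j, :, j] for j in range(n_spk)])  # S_ji,j, shape (N, M)
    # L(e_ji) = -S_ji,j + log sum_k exp(S_ji,k), summed over the batch.
    return float((lse - pos).sum())
```

Because the log-sum-exp runs over every speaker centroid in the batch, an utterance that sits close to some other speaker's centroid contributes a large term, which is one way to read the abstract's claim that GE2E emphasizes hard examples without a separate example-selection stage.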

Li Wan, Quan Wang, Alan Papir, Ignacio Lopez Moreno (2017)

Related benchmarks

Task	Dataset	Metric	Result
Speaker Recognition	VoxCeleb1 (test)	EER (%)	2.37
Speech Quality Assessment	Wild	MOS	0.82
Speech Quality Assessment	Kids	MOS	0.7
Speech Quality Assessment	Clean	MOS	0.46
Speech Quality Assessment	Noisy	MOS	0.4
Text-dependent speaker verification	Large speaker verification dataset, 83K speakers (test)	Average EER (%)	2.38
Text-independent speaker verification	Anonymized logs, 1000 speakers (test)	EER (%)	3.55
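
Most rows above report an equal error rate (EER), the operating point where the false acceptance rate equals the false rejection rate. As an illustration only (not the evaluation code behind these numbers), here is a minimal sketch that computes EER in percent from trial scores by sweeping every observed score as a threshold and taking the midpoint of FAR and FRR where they are closest. The function name `equal_error_rate` and the convention "label 1 = same-speaker (target) trial, accept if score >= threshold" are assumptions.

```python
import numpy as np


def equal_error_rate(scores, labels) -> float:
    """Approximate EER (%) by a threshold sweep. Hypothetical helper.

    scores: similarity score per trial (higher means more likely same speaker).
    labels: 1 for target (same-speaker) trials, 0 for impostor trials.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    n_target = labels.sum()
    n_impostor = (~labels).sum()
    if n_target == 0 or n_impostor == 0:
        raise ValueError("need both target and impostor trials")

    best_gap, best_eer = np.inf, 1.0
    for t in np.unique(scores):
        accept = scores >= t
        far = (accept & ~labels).sum() / n_impostor   # impostors wrongly accepted
        frr = (~accept & labels).sum() / n_target     # targets wrongly rejected
        gap = abs(far - frr)
        if gap < best_gap:
            # Report the midpoint where the two error curves come closest.
            best_gap, best_eer = gap, (far + frr) / 2
    return 100.0 * best_eer
```

For example, `equal_error_rate([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])` is 0.0 because a threshold of 0.8 separates the trials perfectly. Library routines such as an ROC-curve function with interpolation between points give slightly different values on small trial sets.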
