A lightweight dual-stage framework for personalized speech enhancement based on DeepFilterNet2

About

Isolating the desired speaker's voice amidst multiple speakers in a noisy acoustic context is a challenging task. Personalized speech enhancement (PSE) endeavours to achieve this by leveraging prior knowledge of the speaker's voice. Recent research efforts have yielded promising PSE models, albeit often accompanied by computationally intensive architectures, unsuitable for resource-constrained embedded devices. In this paper, we introduce a novel method to personalize a lightweight dual-stage Speech Enhancement (SE) model and implement it within DeepFilterNet2, a SE model renowned for its state-of-the-art performance. We seek an optimal integration of speaker information within the model, exploring different positions for the integration of the speaker embeddings within the dual-stage enhancement architecture. We also investigate a tailored training strategy when adapting DeepFilterNet2 to a PSE task. We show that our personalization method greatly improves the performances of DeepFilterNet2 while preserving minimal computational overhead.

Thomas Serre, Mathieu Fontaine, Éric Benhaim, Geoffroy Dutour, Slim Essid • 2024
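
The abstract mentions integrating speaker embeddings at different positions in the enhancement model but does not say how. As a rough illustration only, the sketch below shows one generic conditioning scheme: appending a fixed speaker embedding to every time frame of a feature sequence. It is an assumption made for illustration, not the authors' method and not part of the DeepFilterNet2 code; the function name `concat_speaker_embedding` and the toy dimensions are hypothetical.

```python
from typing import List, Sequence


def concat_speaker_embedding(
    features: Sequence[Sequence[float]],
    speaker_embedding: Sequence[float],
) -> List[List[float]]:
    """Append the same speaker embedding to every time frame.

    Hypothetical helper for illustration: `features` is a sequence of
    frames (each a sequence of floats) and `speaker_embedding` is one
    fixed vector describing the target speaker.
    """
    if len(speaker_embedding) == 0:
        raise ValueError("speaker_embedding must not be empty")
    # Each output frame is the original frame followed by the embedding.
    return [list(frame) + list(speaker_embedding) for frame in features]


# Toy example: 2 frames of 3 features each, and a 2-dimensional embedding.
features = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
speaker_embedding = [1.0, -1.0]

conditioned = concat_speaker_embedding(features, speaker_embedding)
print(len(conditioned), len(conditioned[0]))  # prints "2 5"
```

In a real model the same idea would normally be applied to tensors inside the network, and the point at which the embedding is injected is exactly the kind of design choice the abstract says the paper explores.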

Related benchmarks

Task	Dataset	Metric	Result	Rank
Personalized Speech Enhancement	DNS Track 1: Headset 5 (test)	SIG Score	3.77	19
Personalized Speech Enhancement	DNS Track 2: Speakerphone Blind 5 (test)	SIG Score	3.58	19
