Simultaneous Diarization and Separation of Meetings through the Integration of Statistical Mixture Models

About

We propose an approach for simultaneous diarization and separation of meeting data. It combines a complex Angular Central Gaussian Mixture Model (cACGMM) for speech source separation and a von Mises-Fisher Mixture Model (VMFMM) for diarization within a joint statistical framework. Through this integration, both spatial and spectral information are exploited for diarization and separation. We also develop a method for counting the number of active speakers in a segment of a meeting to support block-wise processing: while the total number of speakers in a meeting may be known, it is usually not known on a per-segment level. With the proposed speaker counting, joint diarization and source separation can be done segment by segment, and the permutation problem across segments is solved, which opens the way to block-online processing in the future. Experimental results on the LibriCSS meeting corpus show that the integrated approach outperforms a cascade of diarization and speech enhancement in terms of word error rate (WER), both on a per-segment and on a per-meeting level.
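To make the diarization component more concrete, here is a minimal sketch of EM for a mixture of von Mises-Fisher distributions over unit-normalized speaker embeddings, written in Python with NumPy (the page itself contains no code). It is a simplified stand-in, not the authors' joint cACGMM/VMFMM model: it assumes a single shared, fixed concentration `kappa`, so the vMF normalization constant cancels in the posteriors, and it uses a farthest-point initialization chosen here for illustration. The function name and parameters are hypothetical.

```python
import numpy as np


def vmf_mixture_em(x, num_speakers, kappa=20.0, num_iter=50, seed=0):
    """EM for a von Mises-Fisher mixture with one shared, fixed concentration.

    x: (N, D) embeddings; rows are L2-normalized inside this function.
    Returns mean directions mu (K, D), mixture weights (K,), posteriors gamma (N, K).
    """
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    rng = np.random.default_rng(seed)
    # Farthest-point initialization: start from a random point, then repeatedly
    # add the point that is least similar to all centres chosen so far.
    idx = [int(rng.integers(len(x)))]
    for _ in range(1, num_speakers):
        sim = (x @ x[idx].T).max(axis=1)
        idx.append(int(sim.argmin()))
    mu = x[idx]
    weights = np.full(num_speakers, 1.0 / num_speakers)
    for _ in range(num_iter):
        # E-step: with a shared kappa the vMF normalizer C_D(kappa) cancels,
        # so gamma_nk is proportional to pi_k * exp(kappa * mu_k^T x_n).
        logits = np.log(weights + 1e-12) + kappa * (x @ mu.T)
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        gamma = np.exp(logits)
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M-step: mean direction = normalized posterior-weighted resultant vector.
        resultant = gamma.T @ x
        norms = np.maximum(np.linalg.norm(resultant, axis=1, keepdims=True), 1e-12)
        mu = resultant / norms
        weights = gamma.mean(axis=0)
    return mu, weights, gamma
```

The row-wise `gamma.argmax(axis=1)` yields a hard speaker label per embedding. Estimating a separate `kappa` per component would additionally require a ratio of modified Bessel functions, which this sketch omits for brevity.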

Tobias Cord-Landwehr, Christoph Boeddeker, Reinhold Haeb-Umbach • 2024

Related benchmarks

Task	Dataset	Metric	Result (%)
Joint Diarization and Speech Separation	LibriCSS concatenated segments (static scenario)	cpWER (0S)	4.2	5
Joint Diarization and Speech Separation	LibriCSS concatenated segments (speaker relocation scenario)	cpWER (0S)	17.2	5
Meeting Recognition	LibriCSS individual segments	Error Rate (0S)	4.3	4
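Two of the results above are reported as cpWER, the concatenated minimum-permutation word error rate: all utterances of each reference speaker and of each output stream are concatenated, and the speaker-to-stream assignment that minimizes the total number of word errors is used. The pure-Python sketch below illustrates the metric only and is not the scoring tool behind these numbers. It pads with empty streams when the counts differ and brute-forces the assignment over a precomputed error matrix, which is adequate for a handful of speakers; a Hungarian-algorithm solver scales better. The function names are hypothetical.

```python
from itertools import permutations


def word_errors(ref, hyp):
    """Levenshtein distance between two word lists (substitutions + deletions + insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion of a reference word
                         cur[j - 1] + 1,          # insertion of a hypothesis word
                         prev[j - 1] + (r != h))  # substitution (0 if the words match)
        prev = cur
    return prev[-1]


def cp_wer(refs, hyps):
    """cpWER as a fraction.

    refs: dict speaker -> list of reference utterances (strings, in time order)
    hyps: dict output stream -> list of hypothesis utterances (strings, in time order)
    """
    # Concatenate all utterances per speaker / per stream into one word sequence.
    ref_streams = [" ".join(utts).split() for utts in refs.values()]
    hyp_streams = [" ".join(utts).split() for utts in hyps.values()]
    # Pad with empty streams so that every speaker and every stream gets a partner.
    n = max(len(ref_streams), len(hyp_streams))
    ref_streams += [[] for _ in range(n - len(ref_streams))]
    hyp_streams += [[] for _ in range(n - len(hyp_streams))]
    # Pairwise error counts; the total for an assignment is a sum of these entries.
    cost = [[word_errors(r, h) for h in hyp_streams] for r in ref_streams]
    best = min(sum(cost[i][p[i]] for i in range(n)) for p in permutations(range(n)))
    return best / sum(len(r) for r in ref_streams)
```

For example, with `refs = {"A": ["hello world", "good morning"], "B": ["how are you"]}` and `hyps = {"s1": ["how are you"], "s2": ["hello word", "good morning"]}`, the best assignment maps A to s2 and B to s1 with one substitution out of seven reference words, so `100 * cp_wer(refs, hyps)` is about 14.3, the percentage form used in the table.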

