
Simultaneous Diarization and Separation of Meetings through the Integration of Statistical Mixture Models

About

We propose an approach for simultaneous diarization and separation of meeting data. It consists of a complex Angular Central Gaussian Mixture Model (cACGMM) for speech source separation, and a von-Mises-Fisher Mixture Model (VMFMM) for diarization in a joint statistical framework. Through the integration, both spatial and spectral information are exploited for diarization and separation. We also develop a method for counting the number of active speakers in a segment of a meeting to support block-wise processing. While the total number of speakers in a meeting may be known, it is usually not known on a per-segment level. With the proposed speaker counting, joint diarization and source separation can be done segment-by-segment, and the permutation problem across segments is solved, thus allowing for block-online processing in the future. Experimental results on the LibriCSS meeting corpus show that the integrated approach outperforms a cascaded approach of diarization and speech enhancement in terms of WER, both on a per-segment and on a per-meeting level.
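The abstract names a von Mises-Fisher mixture model as the diarization component but gives no equations. As a rough illustration only, the following Python sketch shows the E-step of a generic vMF mixture over unit-norm vectors such as speaker embeddings. It assumes a single concentration kappa shared by all components, so the vMF normalizing constant cancels in the posterior. That simplification, along with every name in the code (vmf_responsibilities, mu, log_pi, kappa), is an assumption for this example and not the authors' parameterization or implementation.

```python
import numpy as np

def vmf_responsibilities(x, mu, log_pi, kappa):
    """E-step of a von Mises-Fisher mixture with a shared concentration.

    x:      (N, D) observations, each row unit-norm
    mu:     (K, D) component mean directions, each row unit-norm
    log_pi: (K,)   log mixture weights
    kappa:  scalar concentration shared by all components (assumption)

    Returns an (N, K) array of posterior class probabilities.
    """
    # log p(x_n, k) up to a constant: kappa * mu_k^T x_n + log pi_k.
    # The normalizer depends only on kappa and D, so it cancels here.
    logits = kappa * (x @ mu.T) + log_pi
    # Subtract the row-wise maximum for numerical stability.
    logits = logits - logits.max(axis=1, keepdims=True)
    gamma = np.exp(logits)
    return gamma / gamma.sum(axis=1, keepdims=True)

# Toy usage: two orthogonal "speakers" in 2-D.
x = np.array([[1.0, 0.0], [0.0, 1.0]])
mu = np.eye(2)
log_pi = np.log(np.array([0.5, 0.5]))
gamma = vmf_responsibilities(x, mu, log_pi, kappa=10.0)
```

In this toy case each observation coincides with one mean direction, so its posterior mass falls almost entirely on that component. In an integrated model of the kind the abstract describes, such posteriors would be combined with the spatial model's likelihoods rather than used on their own.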

Tobias Cord-Landwehr, Christoph Boeddeker, Reinhold Haeb-Umbach • 2024

Related benchmarks

Task | Dataset | Result | Rank
Joint Diarization and Speech Separation | LibriCSS concatenated segments static scenario | cpWER (0S): 4.2 | 5
Joint Diarization and Speech Separation | LibriCSS concatenated segments (speaker relocation scenario) | cpWER (0S): 17.2 | 5
Meeting Recognition | LibriCSS individual segments | Error Rate (0S): 4.3 | 4
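
The cpWER results listed here refer to the concatenated minimum-permutation word error rate: each speaker's utterances are concatenated, the hypothesis speakers are matched to the reference speakers by the permutation that yields the fewest word errors, and the error count is divided by the number of reference words. The Python sketch below illustrates that definition on toy data. It is not the scoring tool used for these results; the function names and the example transcripts are made up for illustration, and it assumes at least one reference word and a small number of speakers, since it enumerates every permutation.

```python
from itertools import permutations

def word_errors(ref, hyp):
    """Levenshtein distance between two lists of words."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution or match
        prev = cur
    return prev[-1]

def cp_wer(ref_by_speaker, hyp_by_speaker):
    """Concatenated minimum-permutation WER for dicts speaker -> utterances."""
    # Concatenate every speaker's utterances into one word sequence.
    refs = [" ".join(utts).split() for utts in ref_by_speaker.values()]
    hyps = [" ".join(utts).split() for utts in hyp_by_speaker.values()]
    # Pad with empty streams so both sides have the same speaker count.
    n = max(len(refs), len(hyps))
    refs += [[] for _ in range(n - len(refs))]
    hyps += [[] for _ in range(n - len(hyps))]
    total_ref_words = sum(len(r) for r in refs)
    # Try every speaker assignment and keep the one with the fewest errors.
    best = min(
        sum(word_errors(r, h) for r, h in zip(refs, perm))
        for perm in permutations(hyps)
    )
    return best / total_ref_words

# Toy usage with hypothetical transcripts and speaker labels.
ref = {"A": ["hello world", "how are you"], "B": ["good morning"]}
hyp = {"spk1": ["good morning"], "spk2": ["hello world how are"]}
score = cp_wer(ref, hyp)
```

In the toy example the best assignment pairs A with spk2 and B with spk1, leaving one deleted word out of seven reference words.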
