
Loose coupling of spectral and spatial models for multi-channel diarization and enhancement of meetings in dynamic environments

About

Sound capture by microphone arrays opens the possibility to exploit spatial, in addition to spectral, information for diarization and signal enhancement, two important tasks in meeting transcription. However, there is no one-to-one mapping of positions in space to speakers if speakers move. Here, we address this by proposing a novel joint spatial and spectral mixture model, whose two submodels are loosely coupled by modeling the relationship between speaker and position index probabilistically. Thus, spatial and spectral information can be jointly exploited, while at the same time allowing for speakers speaking from different positions. Experiments on the LibriCSS data set with simulated speaker position changes show great improvements over tightly coupled subsystems.

Adrian Meise, Tobias Cord-Landwehr, Christoph Boeddeker, Marc Delcroix, Tomohiro Nakatani, Reinhold Haeb-Umbach • 2026
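The idea of loose coupling described in the abstract can be illustrated with a toy calculation. The sketch below is not the authors' model: it assumes a single frame, scalar likelihoods, and a hypothetical coupling table `coupling[k][p]` standing in for the probability that speaker `k` is at position `p`. All names (`speaker_posteriors`, `spec_lik`, `spat_lik`, `coupling`, `prior`) and all numbers are made up for illustration. A tightly coupled system corresponds to an identity table, where each speaker is tied to exactly one position.

```python
def speaker_posteriors(spec_lik, spat_lik, coupling, prior):
    """Combine spectral and spatial evidence for one frame (toy sketch).

    spec_lik[k]    : likelihood of the frame under speaker k (spectral model)
    spat_lik[p]    : likelihood of the frame under position p (spatial model)
    coupling[k][p] : assumed P(position p | speaker k); each row sums to 1
    prior[k]       : prior probability of speaker k
    """
    # Joint weight of every (speaker, position) pair.
    joint = [
        [prior[k] * coupling[k][p] * spec_lik[k] * spat_lik[p]
         for p in range(len(spat_lik))]
        for k in range(len(spec_lik))
    ]
    total = sum(sum(row) for row in joint)
    # Marginalize over positions to get the speaker posterior.
    return [sum(row) / total for row in joint]


# Hypothetical frame: the spectral model favors speaker 0, but the spatial
# model favors position 1, as if speaker 0 had moved to speaker 1's seat.
spec_lik = [0.8, 0.2]
spat_lik = [0.1, 0.9]
prior = [0.5, 0.5]

tight = speaker_posteriors(spec_lik, spat_lik, [[1.0, 0.0], [0.0, 1.0]], prior)
loose = speaker_posteriors(spec_lik, spat_lik, [[0.6, 0.4], [0.4, 0.6]], prior)

print("tight coupling:", tight)
print("loose coupling:", loose)
```

With the identity table, the spatial evidence overrides the spectral evidence and the frame is attributed to speaker 1. With the softer table, speaker 0 remains the more probable speaker, because the model allows for speaker 0 speaking from position 1.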

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Joint Diarization and Speech Separation | LibriCSS concatenated segments (speaker relocation scenario) | cpWER (0S) | 5 | 5 |
| Joint Diarization and Speech Separation | LibriCSS concatenated segments (static scenario) | cpWER (0S) | 5 | 5 |
| Meeting Recognition | LibriCSS individual segments | Error Rate (0S) | 4.7 | 4 |
