
Speaker Targeting via Self-Speaker Adaptation for Multi-talker ASR

About

We propose a self-speaker adaptation method for streaming multi-talker automatic speech recognition (ASR) that eliminates the need for explicit speaker queries. Unlike conventional approaches requiring target speaker embeddings or enrollment audio, our technique dynamically adapts individual ASR instances through speaker-wise speech activity prediction. The key innovation involves injecting speaker-specific kernels generated via speaker supervision activations into selected ASR encoder layers. This enables instantaneous speaker adaptation to target speakers while handling fully overlapped speech even in a streaming scenario. Experiments show state-of-the-art performance in both offline and streaming scenarios, demonstrating that our self-adaptive method effectively addresses severe speech overlap through streamlined speaker-focused recognition. The results validate the proposed self-speaker adaptation approach as a robust solution for multi-talker ASR under severe overlapping speech conditions.
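The abstract does not say how the speaker-specific kernels are generated or how they are injected into the encoder. As a rough illustration only, the sketch below assumes the simplest possible form: a per-speaker vector (`kernel`, hypothetical) is added to an encoder layer's output, scaled frame by frame by that speaker's predicted speech activity. The function name, the shapes, and the additive gating are all assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def inject_speaker_kernel(hidden, activity, kernel):
    """Add an activity-gated speaker bias to encoder states (illustrative assumption).

    hidden:   (T, D) outputs of one encoder layer for T frames
    activity: (T,)   predicted speech activity of the target speaker, in [0, 1]
    kernel:   (D,)   hypothetical learned vector for that speaker slot
    """
    hidden = np.asarray(hidden, dtype=float)
    activity = np.asarray(activity, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    if hidden.ndim != 2 or activity.shape != (hidden.shape[0],) or kernel.shape != (hidden.shape[1],):
        raise ValueError("expected hidden (T, D), activity (T,), kernel (D,)")
    # Frames where the speaker is silent are left untouched; active frames
    # are shifted toward the speaker-specific direction.
    return hidden + activity[:, None] * kernel[None, :]
```

Under this assumption, each ASR instance would receive the same audio features but a different activity track, so no enrollment audio or speaker embedding is needed to select the target speaker.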

Weiqing Wang, Taejin Park, Ivan Medennikov, Jinhan Wang, Kunal Dhawan, He Huang, Nithin Rao Koluguri, Jagadeesh Balam, Boris Ginsburg • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Multi-speaker Automatic Speech Recognition | AMI | CP-WER 24.62 | 11 |
| Speaker-attributed Automatic Speech Recognition | Fisher (local setting) | -- | 4 |
| Speaker-attributed Automatic Speech Recognition | MLC local setting | -- | 4 |
| Speaker-attributed Automatic Speech Recognition | Candor (local setting) | -- | 4 |
| Speaker-attributed Automatic Speech Recognition | Fisher Global Meeting-level | -- | 4 |
| Speaker-attributed Automatic Speech Recognition | MLC Global Meeting-level | -- | 4 |
| Speaker-attributed Automatic Speech Recognition | Candor Global Meeting-level | -- | 4 |
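
The page reports CP-WER without describing how it is scored. As commonly defined, the concatenated minimum-permutation word error rate concatenates each speaker's transcript, tries every assignment of hypothesis speakers to reference speakers, and keeps the assignment with the fewest word errors. The sketch below is a minimal version of that definition; the function names are hypothetical, and text normalization and other scoring details used for the AMI result are not given here and are not modeled.

```python
from itertools import permutations

def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]

def cp_wer(refs, hyps):
    """refs, hyps: one concatenated transcript string per speaker."""
    n = max(len(refs), len(hyps))
    # Pad the shorter side with empty transcripts so speaker counts match.
    ref_tok = [r.split() for r in refs] + [[] for _ in range(n - len(refs))]
    hyp_tok = [h.split() for h in hyps] + [[] for _ in range(n - len(hyps))]
    total = sum(len(r) for r in ref_tok)
    if total == 0:
        raise ValueError("reference contains no words")
    # Exhaustive search over speaker assignments; fine for a few speakers.
    best = min(
        sum(edit_distance(r, hyp_tok[p]) for r, p in zip(ref_tok, perm))
        for perm in permutations(range(n))
    )
    return best / total
```

For example, with references `["hello there", "good morning everyone"]` and hypotheses `["good morning everyone", "hello their"]`, the best assignment swaps the two hypothesis speakers and leaves one substitution over five reference words.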
