ProKWS: Personalized Keyword Spotting via Collaborative Learning of Phonemes and Prosody

About

Current keyword spotting (KWS) systems rely primarily on phoneme-level matching to distinguish confusable words, but they ignore user-specific pronunciation traits such as prosody (intonation, stress, and rhythm). This paper presents ProKWS, a novel framework that integrates fine-grained phoneme learning with personalized prosody modeling. We design a dual-stream encoder in which one stream derives robust phonemic representations through contrastive learning, while the other extracts speaker-specific prosodic patterns. A collaborative fusion module dynamically combines phonemic and prosodic information, improving adaptability across acoustic environments. Experiments show that ProKWS delivers highly competitive performance, comparable to state-of-the-art models on standard benchmarks, and demonstrates strong robustness for personalized keywords with tone and intent variations.
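The abstract does not specify how the fusion module or the contrastive objective is built. As a rough illustration only, the sketch below assumes a common pattern: a per-dimension sigmoid gate that mixes a phoneme embedding with a prosody embedding of the same size, and an InfoNCE-style contrastive loss of the kind often used to learn phonemic representations. The names `gated_fusion` and `info_nce`, the gate parameterization, and the temperature value are all hypothetical choices for this sketch, not the authors' method.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function (avoids overflow for large |x|).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gated_fusion(phoneme: Vector, prosody: Vector,
                 gate_weights: List[Vector], gate_bias: Vector) -> Vector:
    """Hypothetical gated fusion of two equal-length embeddings.

    gate_weights[i] has length 2*d and scores the concatenation
    [phoneme; prosody] for output dimension i. A gate value near 1
    favours the phoneme stream; near 0 it favours the prosody stream.
    """
    d = len(phoneme)
    if len(prosody) != d:
        raise ValueError("streams must have the same dimension")
    concat = phoneme + prosody  # list concatenation: [p_1..p_d, q_1..q_d]
    fused = []
    for i in range(d):
        score = sum(w * x for w, x in zip(gate_weights[i], concat)) + gate_bias[i]
        g = sigmoid(score)
        fused.append(g * phoneme[i] + (1.0 - g) * prosody[i])
    return fused


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        raise ValueError("zero-norm vector")
    return dot / (na * nb)


def info_nce(anchor: Vector, positive: Vector, negatives: List[Vector],
             temperature: float = 0.1) -> float:
    """InfoNCE loss: negative log-softmax of the positive's similarity
    among the positive and all negatives (assumed objective, for illustration)."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # log-sum-exp with max subtraction for stability
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]
```

With all gate weights and biases at zero, every gate equals 0.5, so `gated_fusion([1.0, 2.0], [3.0, 0.0], [[0.0] * 4, [0.0] * 4], [0.0, 0.0])` returns the element-wise average `[2.0, 1.0]`. A learned gate lets the model lean on prosody for some dimensions and on phonemes for others, which is one plausible reading of "dynamically combines".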

Jianan Pan, Yuanming Zhang, Kejie Huang • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Keyword Spotting | LibriPhrase Easy (LPE) | EER | 0.63 | 25 |
| Keyword Spotting | LibriPhrase Hard (LPH) | EER | 0.0752 | 20 |
| Keyword Spotting | Wenet-Phrase (WPE) | AUC | 99.81 | 2 |
| Keyword Spotting | Accent-KWS (AC) | AUC | 71.45 | 2 |
| Keyword Spotting | Intent-KWS (IT) | AUC | 86.42 | 2 |
| Keyword Spotting | Wenet-Phrase (WPH) | AUC | 84.82 | 2 |
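The table reports results as equal error rate (EER, lower is better) and area under the ROC curve (AUC, higher is better). For readers unfamiliar with these metrics, the sketch below shows one standard way to compute both from per-trial detection scores and binary labels (1 = keyword present). It is not the paper's evaluation code: the EER here is approximated by sweeping the observed scores as thresholds and taking the point where false-accept and false-reject rates are closest (interpolated variants also exist), and AUC is computed as the Mann-Whitney pairwise ranking probability. Both functions return fractions in [0, 1]; the page does not state the scale of its reported values.

```python
from typing import Sequence, Tuple, List


def _split(scores: Sequence[float], labels: Sequence[int]) -> Tuple[List[float], List[float]]:
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative trial")
    return pos, neg


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos, neg = _split(scores, labels)
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def eer(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Approximate EER: threshold where false-accept rate (FAR) and
    false-reject rate (FRR) are closest; returns their mean there."""
    pos, neg = _split(scores, labels)
    best_gap, best = float("inf"), 1.0
    # A trial is accepted when score >= threshold; inf rejects everything.
    for t in sorted(set(scores)) + [float("inf")]:
        far = sum(1 for s in neg if s >= t) / len(neg)
        frr = sum(1 for s in pos if s < t) / len(pos)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, best = gap, (far + frr) / 2.0
    return best
```

For example, with scores `[0.9, 0.4, 0.6, 0.1]` and labels `[1, 1, 0, 0]`, three of the four positive/negative pairs are ranked correctly, so `auc` returns 0.75, and `eer` returns 0.5 (at threshold 0.6, one of two negatives is accepted and one of two positives is rejected). Multiply by 100 to express either value as a percentage.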