
No Word Left Behind: Mitigating Prefix Bias in Open-Vocabulary Keyword Spotting

About

Open-vocabulary keyword spotting (OV-KWS) enables personalized device control via arbitrary voice commands. Recently, researchers have explored audio-text joint embeddings, allowing users to enroll phrases with text, and have proposed techniques to disambiguate similar utterances. We find that existing OV-KWS solutions often over-weight the beginning phonemes of an enrollment, causing false triggers when negative enrollment–query pairs share a prefix ("turn the volume up" vs. "turn the volume down"). We trace this to two factors: training data bias and position-biased cross-modal scoring. To address these limitations, we introduce the Partial Overlap Benchmark (POB) with two datasets, POB-Spark and POB-LibriPhrase (POB-LP), containing mismatched audio-text pairs with shared prefixes, and propose Equal-weighting Position Scoring (EPS), a lightweight decision layer. Using EPS alone reduces EER on POB-Spark from 64.4% to 29.3% and improves POB-LP accuracy from 87.6% to 96.8%, while maintaining performance on LibriPhrase and Google Speech Commands (GSC). With POB data added in training, our method achieves the best POB benchmark results while incurring the least degradation on prior metrics among the baselines. This degradation is most pronounced on GSC, which contains only one-word commands. We leave mitigating this trade-off as future work.
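To make the prefix-bias idea concrete, here is a minimal sketch contrasting equal-weighting of per-position match scores with a prefix-biased decision rule. All function names, shapes, and the geometric-decay baseline are illustrative assumptions for exposition, not the paper's actual EPS implementation.

```python
import numpy as np

def position_scores(audio_emb, text_emb):
    """Cosine similarity at each aligned position.

    audio_emb, text_emb: (T, D) arrays of embeddings, assumed
    already aligned position-by-position (an assumption here;
    real systems need an alignment step).
    """
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    return np.sum(a * t, axis=1)  # shape (T,)

def eps_score(audio_emb, text_emb):
    """Equal-weighting: every position contributes 1/T, so a
    mismatched suffix ("...up" vs. "...down") still drags the
    overall score down."""
    return float(np.mean(position_scores(audio_emb, text_emb)))

def prefix_biased_score(audio_emb, text_emb, decay=0.5):
    """A hypothetical prefix-biased baseline for contrast:
    geometrically decaying weights favor early positions, so a
    shared prefix can dominate the decision."""
    s = position_scores(audio_emb, text_emb)
    w = decay ** np.arange(len(s))
    return float(np.sum(w * s) / np.sum(w))

# Toy demo: a query whose prefix matches the enrollment but whose
# suffix does not. The prefix-biased score stays high (false trigger
# risk); the equal-weighted score drops.
rng = np.random.default_rng(0)
T, D = 6, 8
enrollment = rng.normal(size=(T, D))
query = enrollment.copy()
query[-2:] = rng.normal(size=(2, D))  # mismatched suffix
print(eps_score(query, enrollment), prefix_biased_score(query, enrollment))
```

Under equal weighting, the two mismatched final positions contribute a full 2/T of the decision, which is exactly the behavior that separates "volume up" from "volume down".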

Yi Liu, Chuan-Che Huang, Xiao Quan • 2026

Related benchmarks

Task | Dataset | Metric | Result | Rank
Open-vocabulary keyword spotting | LibriPhrase easy | EER | 0.0182 | 6
Open-vocabulary keyword spotting | LibriPhrase hard | EER | 13.7 | 6
Open-vocabulary keyword spotting | POB-Spark | EER | 16.15 | 6
Open-vocabulary keyword spotting | Google Speech Commands (GSC) | EER | 8.87 | 6
Open-vocabulary keyword spotting | POB-LP | Accuracy | 99.42 | 6
