
StableToken: A Noise-Robust Semantic Speech Tokenizer for Resilient SpeechLLMs

About

Prevalent semantic speech tokenizers, designed to capture linguistic content, are surprisingly fragile. We find they are not robust to meaning-irrelevant acoustic perturbations; even at high Signal-to-Noise Ratios (SNRs) where speech is perfectly intelligible, their output token sequences can change drastically, increasing the learning burden for downstream LLMs. This instability stems from two flaws: a brittle single-path quantization architecture and a distant training signal indifferent to intermediate token stability. To address this, we introduce StableToken, a tokenizer that achieves stability through a consensus-driven mechanism. Its multi-branch architecture processes audio in parallel, and these representations are merged via a powerful bit-wise voting mechanism to form a single, stable token sequence. StableToken sets a new state-of-the-art in token stability, drastically reducing Unit Edit Distance (UED) under diverse noise conditions. This foundational stability translates directly to downstream benefits, significantly improving the robustness of SpeechLLMs on a variety of tasks. Our code and model are publicly available at https://github.com/Tencent/StableToken.
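The consensus idea and the stability metric described above can be illustrated with a small sketch. It is not taken from the StableToken code base. It assumes each branch emits one binary code per frame, that there is an odd number of branches, that the merged bits are packed into an integer token ID, and that UED is the Levenshtein distance between token sequences normalized by the reference length. All function names here are hypothetical.

```python
from typing import List, Sequence


def bitwise_majority_vote(branch_bits: Sequence[Sequence[int]]) -> List[int]:
    """Merge per-branch bit vectors for one frame by majority vote on each bit.

    Assumption: every branch emits a bit vector of the same length.
    """
    n_branches = len(branch_bits)
    if n_branches % 2 == 0:
        raise ValueError("use an odd number of branches to avoid ties")
    n_bits = len(branch_bits[0])
    return [
        1 if sum(bits[i] for bits in branch_bits) * 2 > n_branches else 0
        for i in range(n_bits)
    ]


def bits_to_token(bits: Sequence[int]) -> int:
    """Pack a bit vector (most significant bit first) into an integer token ID."""
    token = 0
    for b in bits:
        token = (token << 1) | b
    return token


def unit_edit_distance(ref: Sequence[int], hyp: Sequence[int]) -> float:
    """Levenshtein distance between two token sequences, divided by len(ref).

    Assumption: normalizing by the reference (clean) sequence length.
    """
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1] / max(len(ref), 1)


# Three hypothetical branches; two of them each have one bit flipped by noise.
noisy_branches = [[1, 0, 1, 1], [1, 1, 1, 1], [1, 0, 0, 1]]
voted = bitwise_majority_vote(noisy_branches)
print(voted, bits_to_token(voted))  # [1, 0, 1, 1] 11
```

In this toy case each flipped bit is outvoted by the other two branches, so the merged token matches the clean code `[1, 0, 1, 1]`. A single-path quantizer that saw only the second or third branch would have emitted a different token, which is the kind of change that `unit_edit_distance` counts when clean and noisy token sequences are compared.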

Yuhan Song, Linhao Zhang, Chuhan Wu, Aiwei Liu, Wei Jia, Houfeng Wang, Xiao Zhou • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Speech Reconstruction | Librispeech (test-clean) | UT MOS | 4.09 | 64 |
| Audio Understanding | MMSU | Perception Score | 31.98 | 37 |
| Speech Reconstruction | LibriSpeech clean (test) | WER | 3.84 | 25 |
| Audio Reconstruction | Seed EN | Word Error Rate (WER) | 3.44 | 20 |
| Audio Reconstruction | Seed-ZH | WER | 2.62 | 15 |
| Noise Robustness | FLEURS (test) | Robustness Score (Gaussian Noise) | 12.93 | 15 |
| Audio Understanding | MMAU | Overall Score | 53.2 | 14 |
| Audio Reconstruction | LibriSpeech Clean | WER | 3.84 | 11 |
| Audio Reconstruction | LibriSpeech Other | WER | 7.99 | 11 |
| Speech Reconstruction | Librispeech other (test) | WER | 7.99 | 9 |
Showing 10 of 14 rows
