TokenSE: a Mamba-based discrete token speech enhancement framework for cochlear implants

About

Speech enhancement (SE) is critical for improving speech intelligibility and quality in real-world environments, particularly for cochlear implant (CI) users, whose speech understanding degrades severely under noisy and reverberant conditions. In this study, we propose TokenSE, a discrete token-based SE framework that operates in the neural audio codec space and uses a Mamba-based model to predict clean codec token indices from degraded speech. The self-attention mechanism of Transformer architectures has a computational complexity that grows quadratically with sequence length, whereas Mamba's input-dependent selection mechanism achieves linear complexity. This makes Mamba a compelling alternative to Transformers, especially for CI and hearing-aid (HA) applications. Objective evaluations show that TokenSE consistently outperforms baseline methods on both in-domain and out-of-domain datasets. Moreover, subjective listening experiments with CI users indicate a clear benefit in speech intelligibility under adverse noisy and reverberant conditions.
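The linear-versus-quadratic contrast can be illustrated with a minimal Python sketch. This is a toy single-channel selective scan in the spirit of Mamba, not TokenSE's implementation. The parameter names (`w_B`, `w_C`, `w_delta`, `b_delta`) and the simple input-scaled forms of the step size and the B/C vectors are hypothetical choices made for illustration. Real Mamba layers use learned linear projections, many channels, and a hardware-aware parallel scan. The point is that the step size and the B/C vectors depend on the current input (the "selection"), and the whole sequence is processed in one pass whose cost grows linearly with its length.

```python
import math
from typing import List, Sequence


def softplus(x: float) -> float:
    # Numerically stable softplus; keeps the step size delta strictly positive.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def selective_scan(
    x: Sequence[float],
    A: Sequence[float],
    w_B: Sequence[float],
    w_C: Sequence[float],
    w_delta: float,
    b_delta: float,
) -> List[float]:
    """Toy single-channel selective scan (hypothetical parameterization).

    delta_t, B_t and C_t are computed from the current input x_t, which is the
    input-dependent selection idea; A holds negative diagonal state decays.
    """
    n = len(A)
    h = [0.0] * n  # recurrent state carried across time steps
    ys: List[float] = []
    for x_t in x:  # single pass over the sequence: O(len(x) * n) work
        delta = softplus(w_delta * x_t + b_delta)
        B_t = [w * x_t for w in w_B]
        C_t = [w * x_t for w in w_C]
        for i in range(n):
            # Zero-order-hold decay exp(delta * A) plus simplified delta * B input term.
            h[i] = math.exp(delta * A[i]) * h[i] + delta * B_t[i] * x_t
        ys.append(sum(c * h_i for c, h_i in zip(C_t, h)))
    return ys


def attention_score_count(seq_len: int) -> int:
    # Self-attention computes one score per (query, key) pair: quadratic in length.
    return seq_len * seq_len


def scan_update_count(seq_len: int, state_dim: int) -> int:
    # The scan updates each of state_dim states once per step: linear in length.
    return seq_len * state_dim
```

For example, `selective_scan([0.5, -1.0, 0.3], A=[-1.0, -0.5], w_B=[1.0, 0.2], w_C=[0.3, 1.0], w_delta=0.7, b_delta=0.1)` returns one output per input frame. Each output depends only on the current and earlier frames. Doubling the sequence length quadruples `attention_score_count` but only doubles `scan_update_count`.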

Hsin-Tien Chiang, John H. L. Hansen • 2026

Related benchmarks

Task	Dataset	Result
Speech Enhancement	DNS Challenge Real Recordings (test)	SIG Score: 3.49	41
Speech Enhancement	DNS Challenge Without Reverb (test)	SIG Score: 3.65	26
Speech Enhancement	DNS Challenge With Reverb (test)	SIG Score: 3.643	24
Speech Enhancement	TIMIT OOD, with reverberation, T60 = 0.7 s, 5 dB SNR	SIG Score: 3.454	3
Speech Enhancement	TIMIT OOD, noisy only, 0 dB SNR	SIG Score: 3.514	3
Speech Enhancement	TIMIT OOD, noisy only, 5 dB SNR	SIG Score: 3.486	3
Speech Enhancement	TIMIT OOD, with reverberation, T60 = 0.5 s, 5 dB SNR	SIG Score: 3.505	3
