TokenSE: a Mamba-based discrete token speech enhancement framework for cochlear implants

About

Speech enhancement (SE) is critical for improving speech intelligibility and quality in real-world environments, particularly for cochlear implant (CI) users, who experience severe degradation in speech understanding under noisy and reverberant conditions. In this study, we propose TokenSE, a discrete token-based SE framework that operates in the neural audio codec space and uses a Mamba-based model to predict clean codec token indices from degraded speech. The self-attention mechanism of the earlier Transformer architecture has a computational complexity that grows quadratically with sequence length, whereas the input-dependent selection mechanism of Mamba achieves linear complexity. This makes Mamba a compelling alternative to Transformers, especially for CI and hearing-aid (HA) applications. Objective evaluations show that TokenSE consistently outperforms baseline methods on both in-domain and out-of-domain datasets. Moreover, subjective listening experiments with CI users indicate a clear benefit in speech intelligibility under adverse noisy and reverberant conditions.

Hsin-Tien Chiang, John H. L. Hansen • 2026
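
The quadratic-versus-linear scaling mentioned in the abstract can be illustrated with a rough operation count. The sketch below is not the TokenSE implementation: the function names, the cost formulas (which ignore projections and constant factors), and the sizes `d_model=64` and `d_state=16` are all assumptions chosen only for illustration.

```python
def attention_cost(seq_len: int, d_model: int) -> int:
    """Rough multiply-add count for the core of one self-attention layer.

    Assumption: only the score computation (every position against every
    other position) and the weighted sum over values are counted.
    """
    scores = seq_len * seq_len * d_model   # pairwise query-key products
    mixing = seq_len * seq_len * d_model   # weighted sum of value vectors
    return scores + mixing


def scan_cost(seq_len: int, d_model: int, d_state: int) -> int:
    """Rough multiply-add count for one recurrent state-space scan.

    Assumption: each time step updates a (d_model x d_state) hidden state
    once and reads it out once, so the cost per step is constant.
    """
    update = seq_len * d_model * d_state   # state update at each step
    readout = seq_len * d_model * d_state  # output projection at each step
    return update + readout


if __name__ == "__main__":
    d_model, d_state = 64, 16  # hypothetical sizes, not taken from the paper
    for seq_len in (100, 1_000, 10_000):
        ratio = attention_cost(seq_len, d_model) / scan_cost(seq_len, d_model, d_state)
        print(f"seq_len={seq_len:>6}  attention/scan cost ratio = {ratio:.1f}")
```

Under these assumptions, doubling the sequence length quadruples the attention count but only doubles the scan count, which is the gap the abstract points to for long audio sequences.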

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Speech Enhancement | DNS Challenge Real Recordings (test) | SIG Score 3.49 | 32 |
| Speech Enhancement | DNS Challenge With Reverb (test) | SIG 3.643 | 24 |
| Speech Enhancement | DNS Challenge Without Reverb (test) | -- | 14 |
| Speech Enhancement | TIMIT OOD, with-reverberation, T60 = 0.7s, 5 dB SNR | SIG Score 3.454 | 3 |
| Speech Enhancement | TIMIT noisy-only, 0 dB SNR (OOD) | SIG Score 3.514 | 3 |
| Speech Enhancement | TIMIT noisy-only, 5 dB SNR (OOD) | SIG Score 3.486 | 3 |
| Speech Enhancement | TIMIT OOD with-reverberation T60 = 0.5s 5 dB SNR | SIG Score 3.505 | 3 |