
SenSE: Semantic-Aware High-Fidelity Universal Speech Enhancement

About

Generative Universal Speech Enhancement (USE) methods aim to leverage generative models to improve speech quality under various types of distortions. However, existing generative speech enhancement methods often suffer from semantic inconsistency in the generated outputs. Therefore, we propose SenSE, a novel two-stage generative universal speech enhancement framework. By modeling semantic priors with a language model, SenSE guides the flow matching-based speech enhancement process to generate semantically faithful speech, thereby improving context fidelity. In addition, we introduce a dual-path masked conditioning training strategy that enables flow matching-based enhancement to flexibly integrate multi-source conditioning signals from degraded speech, semantic tokens, and reference speech, thereby improving model flexibility and adaptability. Experimental results demonstrate that SenSE achieves state-of-the-art performance among generative speech enhancement models and exhibits a high performance ceiling, particularly under challenging distortion conditions. Code and demos are available at https://github.com/ASLP-lab/SenSE.
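To make the idea of masked conditioning in flow matching training more concrete, here is a minimal sketch in plain Python. It is not the SenSE implementation: the abstract does not specify how the dual-path scheme works, so this example assumes generic, independent dropout of each conditioning source and the common linear (rectified-flow style) interpolation path. The function names `mask_conditions` and `flow_matching_pair`, the `drop_prob` parameter, and the toy feature vectors are all hypothetical.

```python
import random

def mask_conditions(conds, drop_prob, rng):
    """Replace each conditioning source with zeros with probability drop_prob.

    Assumption: sources are dropped independently; the paper's actual
    dual-path masking scheme may differ.
    """
    masked = {}
    for name, values in conds.items():
        dropped = rng.random() < drop_prob
        masked[name] = [0.0] * len(values) if dropped else list(values)
    return masked

def flow_matching_pair(clean, rng):
    """Sample t, the interpolated point x_t, and the target velocity.

    Uses the linear path x_t = (1 - t) * noise + t * clean, whose
    velocity is clean - noise at every t.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in clean]
    t = rng.random()
    x_t = [(1.0 - t) * n + t * c for n, c in zip(noise, clean)]
    velocity = [c - n for n, c in zip(noise, clean)]
    return t, x_t, velocity

rng = random.Random(0)
clean = [rng.gauss(0.0, 1.0) for _ in range(8)]  # stand-in for clean speech features
conds = {
    "degraded": [c + rng.gauss(0.0, 0.5) for c in clean],  # noisy copy of the target
    "semantic": [1.0] * 8,   # placeholder for semantic token embeddings
    "reference": [0.5] * 8,  # placeholder for reference-speech features
}

# One training example: masked conditions plus the regression target.
masked = mask_conditions(conds, drop_prob=0.3, rng=rng)
t, x_t, velocity = flow_matching_pair(clean, rng)
```

In a real training loop, a network would take `x_t`, `t`, and the masked conditions as input and be trained to regress `velocity`. Seeing zeroed-out sources during training is what would let such a model run at inference time with any subset of the three conditioning signals.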

Xingchen Li, Hanke Xie, Ziqian Wang, Zihan Zhang, Longshuai Xiao, Shuai Wang, Lei Xie • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Speech Enhancement | DNS1 With-Reverb (test) | DNSMOS 3.37 | 19 |
| Speech Enhancement | DNS No-Reverb 1 (test) | DNSMOS 3.38 | 19 |
| Speech Enhancement | Librispeech simulated general-SNR (test) | DNSMOS 3.42 | 11 |
| Speech Enhancement | Librispeech simulated low-SNR (test) | DNSMOS 3.42 | 11 |
| Speech Enhancement | DNS Challenge no-reverb | DNSMOS 3.376 | 9 |
| Speech Enhancement | Simulated (test) | DNSMOS 3.39 | 8 |
| Speech Enhancement | DNS Challenge HardSet | DNSMOS 3.408 | 8 |
| Speech Enhancement | DNS Challenge GSR | DNSMOS 3.388 | 6 |
| Speech Enhancement | VCTK GSR | DNSMOS 3.109 | 6 |
