
MAGE: A Coarse-to-Fine Speech Enhancer with Masked Generative Model

About

Speech enhancement remains challenging due to the trade-off between efficiency and perceptual quality. In this paper, we introduce MAGE, a Masked Audio Generative Enhancer that advances generative speech enhancement through a compact and robust design. Unlike prior masked generative models with random masking, MAGE employs a scarcity-aware coarse-to-fine masking strategy that prioritizes frequent tokens in early steps and rare tokens in later refinements, improving efficiency and generalization. We also propose a lightweight corrector module that further stabilizes inference by detecting low-confidence predictions and re-masking them for refinement. Built on BigCodec and finetuned from Qwen2.5-0.5B, MAGE is reduced to 200M parameters through selective layer retention. Experiments on DNS Challenge and noisy LibriSpeech show that MAGE achieves state-of-the-art perceptual quality and significantly reduces word error rate for downstream recognition, outperforming larger baselines. Audio examples are available at https://hieugiaosu.github.io/MAGE/.
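The abstract describes MAGE's decoding only at a high level, so the Python sketch below is a hypothetical illustration of the two ideas it names. It is not the authors' code. It assumes that "frequent" and "rare" refer to corpus counts of discrete codec tokens (the `token_freq` table), and that masked positions are filled over a fixed number of steps. At each step, the positions whose predicted token is most frequent are committed first, so rare tokens are left for later refinements. An optional `corrector` callable stands in for the lightweight corrector module: it returns a confidence per position, and committed tokens below `threshold` are re-masked for another pass. The names `predict`, `corrector`, `token_freq`, `threshold`, and `MASK` are all illustrative assumptions.

```python
from typing import Callable, Dict, List, Optional, Sequence

MASK = -1  # hypothetical sentinel for a masked codec-token position


def select_to_commit(masked: Sequence[int], predicted: Sequence[int],
                     token_freq: Dict[int, int], k: int) -> List[int]:
    """Pick k masked positions to fill, preferring frequent predicted tokens."""
    # Sort by descending corpus frequency of the predicted token; break ties by position.
    ranked = sorted(masked, key=lambda p: (-token_freq.get(predicted[p], 0), p))
    return ranked[:k]


def remask_low_confidence(tokens: Sequence[int], confidence: Sequence[float],
                          threshold: float) -> List[int]:
    """Corrector-style step: re-mask committed tokens whose confidence is below threshold."""
    return [MASK if (t != MASK and c < threshold) else t
            for t, c in zip(tokens, confidence)]


def coarse_to_fine_decode(predict: Callable[[List[int]], List[int]], length: int,
                          token_freq: Dict[int, int], num_steps: int,
                          corrector: Optional[Callable[[List[int]], List[float]]] = None,
                          threshold: float = 0.5) -> List[int]:
    """Iteratively unmask tokens, committing frequent tokens early and rare ones late."""
    tokens = [MASK] * length
    for step in range(num_steps):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        predicted = predict(tokens)  # model's guess for every position (hypothetical stub)
        remaining = num_steps - step
        k = -(-len(masked) // remaining)  # ceil division: the last step fills everything left
        for p in select_to_commit(masked, predicted, token_freq, k):
            tokens[p] = predicted[p]
        # Skip the corrector on the final step so the output contains no masks.
        if corrector is not None and step < num_steps - 1:
            tokens = remask_low_confidence(tokens, corrector(tokens), threshold)
    return tokens
```

For example, with `token_freq = {5: 100, 7: 10, 9: 1}` and a stub `predict` that always returns `[5, 7, 5, 9]`, a two-step decode fills positions 0 and 2 (token 5, the most frequent) in the first step and the rarer tokens 7 and 9 in the second. Skipping the corrector on the last step is one simple way to guarantee termination in this sketch. The paper's actual scheduling and stopping rules may differ.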

The Hieu Pham, Tan Dat Nguyen, Phuong Thanh Tran, Joon Son Chung, Duc Dung Nguyen• 2025

Related benchmarks

Task | Dataset | Metric | Result | Rank
Speech Enhancement | DNS Challenge Real Recordings (test) | SIG | 4.206 | 32
Speech Enhancement | DNS Challenge With Reverb (test) | SIG | 3.876 | 24
Automatic Speech Recognition | LibriSpeech noisy (test) | WER | 0.2345 | 5
Speech Enhancement | LibriSpeech noisy (test) | SIG | 4.517 | 5
