ZipEnhancer: Dual-Path Down-Up Sampling-based Zipformer for Monaural Speech Enhancement

About

Unlike other sequence-modeling tasks, whose hidden-layer features have three axes, dual-path time-domain and time-frequency-domain speech enhancement models use hidden-layer features with four axes. These models are effective and have few parameters, but the extra axis makes them computationally demanding. We propose ZipEnhancer, a Dual-Path Down-Up Sampling-based Zipformer for Monaural Speech Enhancement, which incorporates time- and frequency-domain Down-Up sampling to reduce computational cost. We introduce the ZipformerBlock as the core block and propose Dual-Path DownSampleStacks that symmetrically scale down and scale up. We also adopt the ScaleAdam optimizer and the Eden learning rate scheduler to further improve performance. Our model achieves new state-of-the-art results on the DNS 2020 Challenge and Voicebank+DEMAND datasets, with perceptual evaluation of speech quality (PESQ) scores of 3.69 and 3.63, respectively, using 2.04M parameters and 62.41G FLOPs, outperforming other methods of similar complexity.
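
To make the cost argument concrete, here is a minimal sketch of symmetric down-up sampling and its effect on a rough attention cost estimate. It is not the authors' implementation: the function names, the averaging and repetition operators, and the cost model (time-path attention scaling as F * T^2, frequency-path attention as T * F^2) are all assumptions chosen for illustration.

```python
def dual_path_attention_cost(num_frames, num_bins, time_factor=1, freq_factor=1):
    """Rough relative cost of one dual-path attention block (assumed cost model).

    Time-path attention runs over T frames for each of F bins: F * T^2.
    Frequency-path attention runs over F bins for each of T frames: T * F^2.
    """
    t = -(-num_frames // time_factor)  # ceiling division
    f = -(-num_bins // freq_factor)
    return f * t * t + t * f * f


def downsample(seq, factor):
    """Average non-overlapping groups of `factor` items (last group may be shorter)."""
    groups = [seq[i:i + factor] for i in range(0, len(seq), factor)]
    return [sum(g) / len(g) for g in groups]


def upsample(seq, factor, length):
    """Repeat each item `factor` times, then trim back to the original length."""
    return [x for x in seq for _ in range(factor)][:length]


# Scale down, (process at low resolution), then scale back up symmetrically.
frames = [1.0, 3.0, 5.0, 7.0, 9.0]
low = downsample(frames, 2)
restored = upsample(low, 2, len(frames))

# Halving both axes under the assumed cost model.
full = dual_path_attention_cost(100, 64)
reduced = dual_path_attention_cost(100, 64, time_factor=2, freq_factor=2)
print(low, restored, full, reduced)
```

Under this assumed cost model, halving both the time and frequency axes cuts the estimate by a factor of eight, which is the kind of saving that motivates processing the inner stacks at lower resolution.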

Haoxu Wang, Biao Tian • 2025

Related benchmarks

Task | Dataset | Result | Rank
Composite Denoising and Dereverberation | WSJ0+WHAMR! (test) | WB-PESQ 2.401 | 5
Composite Denoising, Dereverberation, and Bandwidth Extension | WSJ0+WHAMR! (test) | WB-PESQ 2.169 | 5
Speech Bandwidth Extension | WSJ0+WHAMR! (test) | WB-PESQ 3.486 | 5
Speech Denoising | WSJ0+WHAMR! (test) | WB-PESQ 2.717 | 5
Speech Dereverberation | WSJ0+WHAMR! (test) | WB-PESQ 3.501 | 5