HiFi-GAN: High-Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks

About

Real-world audio recordings are often degraded by factors such as noise, reverberation, and equalization distortion. This paper introduces HiFi-GAN, a deep learning method to transform recorded speech to sound as though it had been recorded in a studio. We use an end-to-end feed-forward WaveNet architecture, trained with multi-scale adversarial discriminators in both the time domain and the time-frequency domain. It relies on the deep feature matching losses of the discriminators to improve the perceptual quality of enhanced speech. The proposed model generalizes well to new speakers, new speech content, and new environments. It significantly outperforms state-of-the-art baseline methods in both objective and subjective experiments.

Jiaqi Su, Zeyu Jin, Adam Finkelstein • 2020
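
The abstract says the model relies on deep feature matching losses computed from the discriminators. As a rough illustration only, the sketch below assumes each discriminator exposes its intermediate activations as flat lists of floats and compares real and generated activations with a per-layer mean L1 distance, averaged over all layers and discriminators. The function names, the data layout, and the equal weighting are assumptions made for this example, not the paper's exact formulation.

```python
from typing import List, Sequence

# Hypothetical layout: features[d][l] is the flattened activation of
# layer l in discriminator d, stored as a sequence of floats.
Features = List[List[Sequence[float]]]


def layer_l1(real: Sequence[float], fake: Sequence[float]) -> float:
    """Mean absolute difference between two equally sized activations."""
    if len(real) != len(fake):
        raise ValueError("activation sizes must match")
    return sum(abs(r - f) for r, f in zip(real, fake)) / len(real)


def feature_matching_loss(real_feats: Features, fake_feats: Features) -> float:
    """Average the per-layer L1 distances over all discriminators and layers.

    Equal weighting of layers is an assumption of this sketch.
    """
    distances = [
        layer_l1(real_layer, fake_layer)
        for d_real, d_fake in zip(real_feats, fake_feats)
        for real_layer, fake_layer in zip(d_real, d_fake)
    ]
    return sum(distances) / len(distances)


# Toy example: one discriminator with two layers.
real = [[[0.0, 0.0], [1.0, 1.0]]]
fake = [[[1.0, 1.0], [1.0, 1.0]]]
print(feature_matching_loss(real, fake))  # 0.5
```

In training, a term like this would typically be added to the generator's adversarial loss so that enhanced speech matches clean speech in the discriminators' feature space rather than only at the waveform level.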

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Speech Enhancement | VoiceBank + DEMAND (VB-DMD) (test) | PESQ | 2.94 | 105 |
| Analysis-synthesis | Music Academic | FAD | 0.044 | 24 |
| Analysis-synthesis | Audio Industrial | FAD | 0.037 | 12 |
| Analysis-synthesis | Music Industrial | FAD | 0.085 | 12 |
| Singing Voice Synthesis | Singing Voice Industrial setting | MOS Prediction | 3.93 | 11 |
| Singing Voice Synthesis | Singing Voice Academic setting | MOS Prediction Score | 3.84 | 11 |
| Speech Synthesis | Speech Industrial Setting | MOS Prediction | 4.11 | 11 |
| Speech Synthesis | Speech Academic Setting | MOS Prediction | 3.29 | 11 |
| Speech Denoising | VCTK-DEMAND (test) | PESQ | 2.94 | 8 |
