
NU-Wave 2: A General Neural Audio Upsampling Model for Various Sampling Rates

About

Conventional audio super-resolution models fix the initial and target sampling rates, so a separate model must be trained for each pair of sampling rates. We introduce NU-Wave 2, a diffusion model for neural audio upsampling that generates 48 kHz audio signals from inputs of various sampling rates with a single model. Based on the architecture of NU-Wave, NU-Wave 2 uses short-time Fourier convolution (STFC) to generate harmonics, resolving the main failure modes of NU-Wave, and incorporates bandwidth spectral feature transform (BSFT) to condition on the bandwidths of inputs in the frequency domain. We experimentally demonstrate that NU-Wave 2 produces high-resolution audio regardless of the sampling rate of the input while requiring fewer parameters than other models. The official code and the audio samples are available at https://mindslab-ai.github.io/nuwave2.
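To make the idea of input bandwidth concrete, here is a minimal sketch in plain Python. It is not the paper's BSFT and is not taken from the official code. It only marks which one-sided STFT bins of a 48 kHz signal can carry content when the input came from a lower sampling rate, since nothing above the input's Nyquist frequency is present. The names `band_mask` and `missing_fraction` and the default `n_fft=1024` are assumptions chosen for illustration.

```python
TARGET_SR = 48_000  # output sampling rate stated in the abstract


def band_mask(input_sr, n_fft=1024, target_sr=TARGET_SR):
    """Return a 0/1 list over the n_fft // 2 + 1 one-sided STFT bins of a
    target_sr signal: 1 where the bin's centre frequency is at or below the
    input's Nyquist frequency (input_sr / 2), 0 where content is absent and
    would have to be generated by an upsampling model."""
    if not 0 < input_sr <= target_sr:
        raise ValueError("input_sr must be in (0, target_sr]")
    n_bins = n_fft // 2 + 1
    nyquist_hz = input_sr / 2
    bin_hz = target_sr / n_fft  # frequency spacing between adjacent bins
    return [1 if k * bin_hz <= nyquist_hz else 0 for k in range(n_bins)]


def missing_fraction(input_sr, n_fft=1024, target_sr=TARGET_SR):
    """Fraction of the target spectrum's bins that lie above the input band."""
    mask = band_mask(input_sr, n_fft, target_sr)
    return 1 - sum(mask) / len(mask)


if __name__ == "__main__":
    # Hypothetical input rates, chosen to match those in the benchmark listing.
    for sr in (8_000, 12_000, 16_000, 24_000, 48_000):
        print(sr, sum(band_mask(sr)), round(missing_fraction(sr), 3))
```

Under these assumptions, a single model handling several input rates sees a different mask for each rate. Frequency-domain bandwidth conditioning is aimed at this kind of per-input information.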

Seungu Han, Junhyeok Lee • 2022

Related benchmarks

Task                    Dataset                          Result        Rank
Audio Super-Resolution  VCTK In-domain                   LSD 0.73      34
Bandwidth extension     TIMIT 8 kHz to 16 kHz (test)     VISQOL 2.6    10
Bandwidth extension     VCTK 8 kHz to 44.1 kHz (test)    VISQOL 2.19   10
Bandwidth extension     VCTK-BWE BW=4K (test)            WVMOS 4.169   7
Bandwidth extension     VCTK-BWE BW=2K (test)            WVMOS 3.208   7
Bandwidth extension     VCTK-BWE BW=1K (test)            WVMOS 1.895   6
Audio Super-Resolution  VCTK 16 kHz (test)               SNR 23.31     5
Audio Super-Resolution  VCTK 24 kHz (test)               SNR 27.68     5
Audio Super-Resolution  VCTK 8 kHz (test)                SNR 18.43     5
Audio Super-Resolution  VCTK 12 kHz (test)               SNR 20.95     5
Showing 10 of 11 rows
