FLowHigh: Towards Efficient and High-Quality Audio Super-Resolution with Single-Step Flow Matching

About

Audio super-resolution is challenging owing to its ill-posed nature. Recently, the application of diffusion models in audio super-resolution has shown promising results in alleviating this challenge. However, diffusion-based models have limitations, primarily the need for numerous sampling steps, which significantly increases latency when synthesizing high-quality audio samples. In this paper, we propose FLowHigh, a novel approach that integrates flow matching, a highly efficient generative model, into audio super-resolution. We also explore probability paths specially tailored for audio super-resolution, which effectively capture high-resolution audio distributions, thereby enhancing reconstruction quality. The proposed method generates high-fidelity, high-resolution audio through a single-step sampling process across various input sampling rates. The experimental results on the VCTK benchmark dataset demonstrate that FLowHigh achieves state-of-the-art performance in audio super-resolution, as evaluated by log-spectral distance and ViSQOL, while maintaining computational efficiency with only a single-step sampling process.
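To make the single-step sampling idea concrete, here is a minimal sketch of fixed-step Euler integration of a flow-matching ODE. It is a toy illustration, not the FLowHigh implementation: the function names `euler_sample` and `oracle_field`, the straight-line probability path, and the random arrays standing in for audio features are all assumptions made for this example. It shows why a single step can suffice when the velocity along the path is close to constant.

```python
import numpy as np


def euler_sample(vector_field, x0, num_steps=1):
    """Integrate dx/dt = vector_field(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = np.asarray(x0, dtype=np.float64)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        x = x + dt * vector_field(x, t)
    return x


# Toy stand-ins (assumed for illustration): random arrays instead of audio features.
rng = np.random.default_rng(0)
target = rng.normal(size=(4, 8))  # plays the role of the high-resolution sample
x0 = rng.normal(size=(4, 8))      # plays the role of the starting sample at t=0


def oracle_field(x, t):
    # On the straight path x_t = (1 - t) * x0 + t * target the velocity is
    # constant, so a single Euler step lands on the target. A trained model
    # would only approximate this field.
    return target - x0


x1 = euler_sample(oracle_field, x0, num_steps=1)
print(np.allclose(x1, target))
```

In practice the vector field is a learned network, and one step is only as accurate as the learned path is straight; increasing `num_steps` trades latency for integration accuracy.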

Jun-Hak Yun, Seung-Bin Kim, Seong-Whan Lee • 2025

Related benchmarks

Task	Dataset	Result
Audio Super-Resolution	VCTK In-domain	LSD 1.17	34
Audio Super-Resolution	ESC-50 Out-of-domain	LSD 1.63	16
Audio Super-Resolution	Internal Music In-domain	LSD 1.43	16
Audio Super-Resolution	MUSDB18-HQ Out-of-domain	LSD 1.77	16
Audio Super-Resolution	VCTK 24 kHz (test)	LSD 0.74	11
Noise-Robust Bandwidth Expansion	Valentini-Botinhao noisy 8 kHz to 16 kHz (test)	LSD 1.12	11
Bandwidth Extension	VCTK 8 kHz to 44.1 kHz (test)	ViSQOL 3.49	10
Bandwidth Extension	TIMIT 8 kHz to 16 kHz (test)	ViSQOL 2.59	10
Audio Super-Resolution	VCTK (test)	LSD 3.9	7
Audio Super-Resolution	ESC-50 (test)	MOS 3.18	6

Showing 10 of 19 rows
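
Most rows above report log-spectral distance (LSD), where lower is better. The sketch below shows one common way to compute it: the root-mean-square difference of log power spectra over frequency, averaged over frames. The STFT settings (`n_fft`, `hop`), the `eps` floor, and the synthetic test signals are assumptions for illustration; the benchmarks listed here may use different conventions, so values from this sketch are not directly comparable to the table.

```python
import numpy as np


def power_spectrogram(signal, n_fft=2048, hop=512):
    """Hann-windowed power spectrogram, shape (frames, n_fft // 2 + 1)."""
    signal = np.asarray(signal, dtype=np.float64)
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop  # requires len(signal) >= n_fft
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2


def log_spectral_distance(reference, estimate, eps=1e-8, **stft_kwargs):
    """Mean over frames of the RMS log10 power-spectrum difference over frequency."""
    ref = np.log10(power_spectrogram(reference, **stft_kwargs) + eps)
    est = np.log10(power_spectrogram(estimate, **stft_kwargs) + eps)
    per_frame = np.sqrt(np.mean((ref - est) ** 2, axis=1))
    return float(np.mean(per_frame))


# Synthetic example (assumed): the estimate is missing the high-frequency tone.
sr = 16000
t = np.arange(sr) / sr
reference = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 6000 * t)
estimate = np.sin(2 * np.pi * 440 * t)

lsd_same = log_spectral_distance(reference, reference)
lsd_diff = log_spectral_distance(reference, estimate)
print(lsd_same, lsd_diff)
```

An identical pair gives a distance of zero, and the distance grows as the estimate loses high-band energy, which is the failure mode super-resolution models are meant to repair.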
