CodecFlow: Efficient Bandwidth Extension via Conditional Flow Matching in Neural Codec Latent Space

About

Speech bandwidth extension (BWE) improves clarity and intelligibility by restoring or inferring appropriate high-frequency content for low-bandwidth speech. Existing methods often model spectrograms or waveforms directly, which can incur higher computational cost and limit high-frequency fidelity. Neural audio codecs offer compact latent representations that better preserve acoustic detail, yet accurately recovering high-resolution latent information remains challenging because of representation mismatch. We present CodecFlow, a neural codec-based BWE framework that performs efficient speech reconstruction in a compact latent space. CodecFlow applies a voicing-aware conditional flow converter to continuous codec embeddings and uses a structure-constrained residual vector quantizer to improve the stability of latent alignment. Optimized end-to-end, CodecFlow achieves strong spectral fidelity and improved perceptual quality on 8 kHz to 16 kHz and 8 kHz to 44.1 kHz speech BWE tasks.
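The abstract relies on two generic building blocks: conditional flow matching on continuous embeddings and residual vector quantization (RVQ). The NumPy sketch below illustrates both under stated assumptions. It is not CodecFlow's implementation. It assumes a straight (rectified-flow) path from Gaussian noise to the target embedding, a fixed-step Euler sampler, and plain nearest-neighbour RVQ. The voicing-aware conditioning and the structure constraints on the quantizer are not modeled. All function names (`cfm_training_pair`, `oracle_velocity`, `euler_sample`, `rvq_encode`) are hypothetical, and `oracle_velocity` stands in for a trained velocity network.

```python
import numpy as np

# Hypothetical sketch: generic conditional flow matching (CFM) + residual vector
# quantization (RVQ) on codec-style embeddings of shape (frames, dims).
# Not CodecFlow's implementation; voicing-aware conditioning and the
# structure constraints on the quantizer are NOT modeled here.

def cfm_training_pair(x_hi, rng):
    """One CFM training example on the straight (rectified-flow) path.

    x_hi: target high-band embedding, shape (T, D).
    Returns (x_t, t, v_target); a network v(x_t, t, cond) would be trained
    to regress v_target with a mean-squared-error loss.
    """
    x0 = rng.standard_normal(x_hi.shape)   # Gaussian noise endpoint (t = 0)
    t = rng.uniform()                      # random time in [0, 1)
    x_t = (1.0 - t) * x0 + t * x_hi        # point on the straight path
    v_target = x_hi - x0                   # the path's constant velocity
    return x_t, t, v_target


def oracle_velocity(x, t, target):
    """Stand-in for a trained network: exact velocity of the straight path
    through x that reaches `target` at t = 1 (assumes t < 1)."""
    return (target - x) / (1.0 - t)


def euler_sample(velocity_fn, cond, shape, steps, rng):
    """Integrate dx/dt = velocity_fn(x, t, cond) from noise at t = 0 to t = 1."""
    x = rng.standard_normal(shape)
    dt = 1.0 / steps
    for k in range(steps):
        x = x + dt * velocity_fn(x, k * dt, cond)
    return x


def rvq_encode(z, codebooks):
    """Plain RVQ: each stage quantizes the residual left by earlier stages.

    z: (T, D) embeddings; codebooks: list of (K, D) arrays.
    Returns codes of shape (n_stages, T) and the summed quantized vectors.
    """
    residual = z.astype(float).copy()
    quantized = np.zeros_like(residual)
    codes = []
    for cb in codebooks:
        # Squared distance from every residual frame to every codeword: (T, K).
        dist = ((residual[:, None, :] - cb[None, :, :]) ** 2).sum(axis=-1)
        idx = dist.argmin(axis=1)
        quantized += cb[idx]
        residual -= cb[idx]
        codes.append(idx)
    return np.stack(codes), quantized


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x_hi = rng.standard_normal((50, 16))   # toy: 50 frames, 16-dim latents
    x_t, t, v = cfm_training_pair(x_hi, rng)
    # Following the oracle velocity, Euler integration lands on x_hi.
    x_gen = euler_sample(oracle_velocity, x_hi, x_hi.shape, steps=8, rng=rng)
    print(np.allclose(x_gen, x_hi))        # True (up to float rounding)
    codes, z_q = rvq_encode(x_gen, [rng.standard_normal((256, 16)) for _ in range(4)])
    print(codes.shape)                     # (4, 50)
```

With a learned velocity network the Euler result would only approximate the target. The oracle is used here so that the sampler's behaviour on a straight path is easy to check. Straight paths are a common choice in flow matching because the velocity is constant along each path, which tends to make few-step sampling accurate.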

Bowen Zhang, Junchuan Zhao, Ian McLoughlin, Ye Wang, A S Madhukumar • 2026

Related benchmarks

Task | Dataset | Metric | Result | Rank
Bandwidth extension | TIMIT 8 kHz to 16 kHz (test) | ViSQOL | 2.72 | 10
Bandwidth extension | VCTK 8 kHz to 44.1 kHz (test) | ViSQOL | 3.3 | 10
