PURE Codec: Progressive Unfolding of Residual Entropy for Speech Codec Learning

About

Neural speech codecs have achieved strong performance in low-bitrate compression, but residual vector quantization (RVQ) often suffers from unstable training and ineffective decomposition, limiting reconstruction quality and efficiency. We propose PURE Codec (Progressive Unfolding of Residual Entropy), a novel framework that guides multi-stage quantization using a pre-trained speech enhancement model. The first quantization stage reconstructs low-entropy, denoised speech embeddings, while subsequent stages encode the residual high-entropy components. This design significantly improves training stability. Experiments demonstrate that PURE consistently outperforms conventional RVQ-based codecs in reconstruction and in downstream speech-language-model-based text-to-speech, particularly under noisy training conditions.
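The core idea above, a first quantization stage that targets a denoised (low-entropy) embedding while later stages encode the remaining residual, can be sketched in a few lines of NumPy. This is a toy illustration only: the codebooks are random placeholders rather than trained ones, the "denoised" embedding is simulated, and all names and shapes are assumptions, not the paper's implementation.

```python
import numpy as np

def quantize(x, codebook):
    """Nearest-neighbor vector quantization of each row of x."""
    # Squared distances between every frame and every code: (frames, codes)
    d = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return codebook[d.argmin(axis=1)]

def pure_style_rvq(noisy_emb, denoised_emb, codebooks):
    """Multi-stage residual quantization in the spirit of PURE Codec:
    stage 0 approximates the denoised embedding; each later stage
    quantizes the residual between the full (noisy) embedding and the
    reconstruction so far."""
    # Stage 0: reconstruct the low-entropy, denoised target.
    recon = quantize(denoised_emb, codebooks[0])
    # Residual stages: encode the high-entropy remainder.
    for cb in codebooks[1:]:
        residual = noisy_emb - recon
        recon = recon + quantize(residual, cb)
    return recon

# Toy data: 4 frames of 8-dim embeddings, 3 codebooks of 16 codes each.
rng = np.random.default_rng(0)
frames, dim = 4, 8
noisy = rng.normal(size=(frames, dim))
denoised = noisy + 0.1 * rng.normal(size=(frames, dim))  # stand-in for an enhanced embedding
books = [rng.normal(size=(16, dim)) for _ in range(3)]
out = pure_style_rvq(noisy, denoised, books)
```

In a real codec the codebooks would be learned (e.g. with a commitment loss and straight-through gradients); the sketch only shows how the denoised target changes what the first stage is asked to model.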

Jiatong Shi, Haoran Wang, William Chen, Chenda Li, Wangyou Zhang, Jinchuan Tian, Shinji Watanabe • 2025

Related benchmarks

Task                           Dataset                    Result      Rank
Text-to-Speech                 LibriSpeech clean (test)   WER 2.05    50
SpeechLM-based Text-to-Speech  LibriSpeech 960h           WER 10.5    5
