Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

Entropy-Guided GRVQ for Ultra-Low Bitrate Neural Speech Codec

About

Neural audio codec (NAC) is essential for reconstructing high-quality speech signals and generating discrete representations for downstream speech language models. However, ensuring accurate semantic modeling while maintaining high-fidelity reconstruction under ultra-low bitrate constraints remains challenging. We propose an entropy-guided group residual vector quantization (EG-GRVQ) for an ultra-low bitrate neural speech codec, which retains a semantic branch for linguistic information and incorporates an entropy-guided grouping strategy in the acoustic branch. Assuming that channel activations follow approximately Gaussian statistics, the variance of each channel can serve as a principled proxy for its information content. Based on this assumption, we partition the encoder output such that each group carries an equal share of the total information. This balanced allocation improves codebook efficiency and reduces redundancy. Trained on LibriTTS and VCTK, our model shows improvements in perceptual quality and intelligibility metrics under ultra-low bitrate conditions, with a focus on codec-level fidelity for communication-oriented scenarios.

Yanzhou Ren, Noboru Harada, Daiki Takeuchi, Siyu Chen, Wei Liu, Xiao Zhang, Liyuan Zhang, Takehiro Moriya, Shoji Makino• 2026

Related benchmarks

TaskDatasetResultRank
Speech ReconstructionLibriTTS clean (test)
PESQ1.881
63
Showing 1 of 1 rows

Other info

Follow for update