
BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec

About

We present BigCodec, a low-bitrate neural speech codec. While recent neural speech codecs have shown impressive progress, their performance deteriorates significantly at low bitrates (around 1 kbps). Although a low bitrate inherently restricts performance, other factors, such as model capacity, also hinder further improvements. To address this problem, we scale up the model size to 159M parameters, more than 10 times larger than popular codecs with about 10M parameters. In addition, we integrate sequential models into traditional convolutional architectures to better capture temporal dependencies, and adopt low-dimensional vector quantization to ensure high code utilization. Comprehensive objective and subjective evaluations show that BigCodec, with a bitrate of 1.04 kbps, significantly outperforms several existing low-bitrate codecs. Furthermore, BigCodec achieves objective performance comparable to popular codecs operating at 4-6 times higher bitrates, and even delivers better subjective perceptual quality than the ground truth.
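To make the 1.04 kbps figure concrete, here is a minimal sketch of how the bitrate of a token-based codec is usually computed: frames per second times bits per frame. The abstract does not state the frame rate or codebook size, so the values below (80 frames per second, a single codebook of 8192 entries) are assumptions chosen for illustration because they reproduce the quoted bitrate. The function name `codec_bitrate_bps` is hypothetical.

```python
import math


def codec_bitrate_bps(frames_per_second: float,
                      codebook_size: int,
                      num_codebooks: int = 1) -> float:
    """Bitrate in bits per second of a codec emitting discrete codes.

    Each code drawn from a codebook of size K carries log2(K) bits, so
    the bitrate is frames/s * codebooks per frame * log2(K).
    """
    bits_per_code = math.log2(codebook_size)
    return frames_per_second * num_codebooks * bits_per_code


# Assumed configuration (not given in the abstract): one codebook with
# 8192 entries (13 bits per code) at 80 frames per second.
bitrate = codec_bitrate_bps(frames_per_second=80, codebook_size=8192)
print(f"{bitrate / 1000:.2f} kbps")  # prints "1.04 kbps"
```

Under this formula, a lower bitrate comes from fewer frames per second, fewer codebooks per frame, or a smaller codebook. Each of those leaves fewer bits to describe the signal, which is the performance restriction the abstract refers to.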

Detai Xin, Xu Tan, Shinnosuke Takamichi, Hiroshi Saruwatari • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Speech Reconstruction | LibriTTS clean (test) | PESQ 2.7 | 50 |
| Speech Reconstruction | Librispeech (test-clean) | STOI 0.93 | 49 |
| Audio Reconstruction | MusicDB (test) | -- | 28 |
| Speech Reconstruction | LibriSpeech English (test-clean) | SIM 0.84 | 27 |
| Speech Reconstruction | AISHELL-2 Chinese | SIM 0.69 | 27 |
| Analysis-synthesis | Music Academic | FAD 0.033 | 24 |
| Audio Synthesis | Singing Voice Academic setting | MOS Prediction Score 4.17 | 21 |
| Audio Synthesis | Singing Voice Industrial setting | MOS Prediction 4.32 | 21 |
| Audio Synthesis | Singing Voice MUSHRA (evaluation) | MUSHRA Score 81.14 | 21 |
| Speech Reconstruction | SeedTTS en (test) | WER 0.0325 | 18 |
