
DualCodec: A Low-Frame-Rate, Semantically-Enhanced Neural Audio Codec for Speech Generation

About

Neural audio codecs are foundational building blocks for language model (LM)-based speech generation. Typically, lowering a codec's frame rate comes at the cost of audio quality. This study introduces a low-frame-rate, semantically enhanced codec model. Existing approaches distill semantically rich self-supervised learning (SSL) representations into the first-layer codec tokens. This work instead proposes DualCodec, a dual-stream encoding approach that integrates SSL and waveform representations within an end-to-end codec framework. With this design, DualCodec enhances the semantic information in the first-layer codec tokens and lets the codec maintain high audio quality while operating at a low frame rate. A low frame rate matters because it improves the efficiency of speech generation: the LM has fewer tokens to generate per second of audio. Experimental results on audio codec and speech generation tasks confirm the effectiveness of DualCodec compared with state-of-the-art codec systems such as Mimi Codec, SpeechTokenizer, DAC, and Encodec. Demos are available at https://dualcodec.github.io, and code is available at https://github.com/jiaqili3/DualCodec.
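To make the efficiency argument concrete, the sketch below counts how many discrete tokens a language model would have to generate for a clip, assuming a residual-VQ codec that emits one token per codebook per frame. The frame rates (75, 25 and 12.5 Hz) and the 8-codebook setting are hypothetical illustration values, not DualCodec's published configuration, and the helper names are invented for this example.

```python
# Illustrative arithmetic only: token counts for a residual-VQ codec that
# emits one token per codebook per frame. Frame rates and codebook counts
# below are hypothetical example values, not DualCodec's settings.


def tokens_per_second(frame_rate_hz: float, num_codebooks: int) -> float:
    """Tokens per second of audio (frames/s times tokens per frame)."""
    return frame_rate_hz * num_codebooks


def sequence_length(duration_s: float, frame_rate_hz: float, num_codebooks: int) -> int:
    """Total tokens for a clip of `duration_s` seconds, flattened across codebooks."""
    return round(duration_s * tokens_per_second(frame_rate_hz, num_codebooks))


if __name__ == "__main__":
    clip_s = 10.0
    for rate in (75.0, 25.0, 12.5):  # hypothetical frame rates in Hz
        n = sequence_length(clip_s, rate, num_codebooks=8)
        print(f"{rate:>5} Hz x 8 codebooks -> {n} tokens for {clip_s:.0f} s of audio")
```

Under these assumptions a 10 s clip needs 6000 tokens at 75 Hz but only 1000 at 12.5 Hz, a six-fold shorter sequence for the LM to generate.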

Jiaqi Li, Xiaolong Lin, Zhekai Li, Shixi Huang, Yuancheng Wang, Chaoren Wang, Zhenpeng Zhan, Zhizheng Wu (2025)

Related benchmarks

| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Automatic Speech Recognition | LibriSpeech clean (test) | WER | 9.8 | 833 |
| Text-to-Speech | Seed-TTS (eval) | WER | 5.5 | 39 |
| Voice Conversion | VCTK | WER | 21.5 | 21 |
| Speech Reconstruction | SeedTTS en (test) | WER | 0.0263 | 18 |
| Speech Reconstruction | Salmon Sentiment Consistency emotional 2025b (OOD) | WER | 3.6 | 18 |
| Speech Recognition | Switchboard | WER | 28.2 | 18 |
| Speech Reconstruction | LibriSpeech clean (test) | WER | 2.1 | 15 |
| Text-to-Speech | LibriTTS clean (test) | WER | 0.1 | 15 |
| Audio Encoding and Decoding Efficiency | NVIDIA A6000 Efficiency Benchmark | RTF (Encoding) | 0.0078 | 12 |
| Speech Reconstruction | Japanese Versatile Speech unseen language speech 2019 (OOD) | WER | 5 | 9 |

10 of 13 benchmark rows shown.
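Most rows in the benchmark table report word error rate (WER), and the efficiency row reports a real-time factor (RTF). The sketch below uses the standard definitions: WER as the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words, and RTF as processing time divided by audio duration. The page does not say how each benchmark normalizes text or measures timing, so this is an illustration under those assumptions, not the evaluation code behind these numbers; the function names are hypothetical.

```python
# Minimal sketch of the table's two metrics using their standard definitions.
# Text normalization (casing, punctuation) is assumed to be done beforehand.


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the current ref prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution, or free match
            )
        prev = curr
    return prev[-1] / len(ref)


def real_time_factor(processing_s: float, audio_s: float) -> float:
    """RTF = processing time / audio duration; below 1.0 is faster than real time."""
    if audio_s <= 0:
        raise ValueError("audio duration must be positive")
    return processing_s / audio_s
```

For example, `word_error_rate("the cat sat on the mat", "the cat sat on mat")` is 1/6 (one deleted word out of six reference words), and an encoding RTF of 0.0078 corresponds to roughly 7.8 ms of processing per second of audio.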
