FlexiCodec: A Dynamic Neural Audio Codec for Low Frame Rates

About

Neural audio codecs are foundational to speech language models. They are expected to operate at a low frame rate and to decouple semantic from acoustic information. A lower-frame-rate codec reduces the computational cost of speech language models by shortening the sequence length. Recent studies have developed 12.5Hz low-frame-rate audio codecs, but codecs with even lower frame rates remain underexplored. We find that a major challenge for very low frame rate tokens is the loss of semantic information. This paper introduces FlexiCodec to address this limitation. FlexiCodec improves semantic preservation with a dynamic frame rate approach and introduces a novel architecture featuring ASR feature-assisted dual-stream encoding and Transformer bottlenecks. With dynamic frame rates, it uses fewer frames in information-sparse regions by adaptively merging semantically similar frames. A dynamic frame rate also allows FlexiCodec to support inference-time controllable frame rates between 3Hz and 12.5Hz. Experiments at average frame rates of 6.25Hz, 8.3Hz, and 12.5Hz confirm that FlexiCodec outperforms baseline systems in semantic information preservation while delivering high audio reconstruction quality. We also validate the effectiveness of FlexiCodec in language model-based TTS. Demos are available at: https://flexicodec.github.io. Code is available at: https://github.com/amphionteam/flexicodec.

Jiaqi Li, Yao Qian, Yuxuan Hu, Leying Zhang, Xiaofei Wang, Heng Lu, Manthan Thakker, Jinyu Li, Sheng Zhao, Zhizheng Wu • 2025
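
The abstract describes merging semantically similar frames so that fewer tokens cover information-sparse regions. The following is a minimal sketch of that general idea, not the authors' implementation: it greedily groups consecutive frames whose cosine similarity to the first frame of the group stays above a threshold, then reports the resulting average frame rate. The function names, the greedy rule, the `threshold` value, and the `max_span` cap are all assumptions made for illustration; the toy 2-dimensional vectors stand in for real semantic features.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def merge_similar_frames(frames, threshold=0.9, max_span=4):
    """Greedily group consecutive frames similar to the frame that opens the group.

    Hypothetical illustration only. Returns (merged, spans): the mean vector
    of each group and the number of original frames each group covers.
    """
    merged, spans, group = [], [], []
    for frame in frames:
        # Close the current group when it is full or the new frame differs too much.
        if group and (len(group) >= max_span or cosine(group[0], frame) < threshold):
            merged.append([sum(col) / len(group) for col in zip(*group)])
            spans.append(len(group))
            group = []
        group.append(frame)
    if group:
        merged.append([sum(col) / len(group) for col in zip(*group)])
        spans.append(len(group))
    return merged, spans


def average_frame_rate(base_rate_hz, spans):
    """Average token rate after merging, given the rate of the unmerged frames."""
    return base_rate_hz * len(spans) / sum(spans)


# Six toy frames at an assumed 12.5Hz base rate.
frames = [[1, 0], [1, 0.01], [0, 1], [0, 1], [0, 1], [1, 1]]
merged, spans = merge_similar_frames(frames, threshold=0.9)
print(spans)                            # [2, 3, 1]
print(average_frame_rate(12.5, spans))  # 6.25
```

In this sketch, raising `threshold` merges fewer frames and pushes the average rate back toward the base rate, which loosely mirrors how a dynamic scheme can expose a frame rate that is controllable at inference time.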

Related benchmarks

Task	Dataset	Result
Speech Reconstruction	Librispeech (test-clean)	UTMOS 4.12	64
Text-to-Speech	Seed-TTS (eval)	WER 2.63	39
Speech Reconstruction	Seed-ZH	PESQ 1.88	29
Audio Understanding	X-Ares	ASV2015 93	21
Speech Understanding	X-Ares	CREMA-D Score 58	19
Music Understanding	X-Ares	FMA Score 0.00e+0	19
Speech Reconstruction	Seed EN	PESQ 2.11	12
Neural Speech Coding	LibriSpeech clean (test)	STOI 90	11
Neural Speech Coding	LibriSpeech (test-other)	STOI 0.88	11
