UniCodec: Unified Audio Codec with Single Domain-Adaptive Codebook

About

The emergence of audio language models has been enabled by neural audio codecs, which establish critical mappings between continuous waveforms and discrete tokens compatible with language-model paradigms. The evolutionary trend from multi-layer residual vector quantizers to single-layer quantizers benefits autoregressive decoding by language models. However, the ability to handle multi-domain audio signals with a single codebook remains constrained by inter-domain distribution discrepancies. In this work, we introduce UniCodec, a unified audio codec with a single codebook that supports multi-domain audio data, including speech, music, and sound. To achieve this, we propose a partitioned domain-adaptive codebook method and a domain Mixture-of-Experts strategy to capture the distinct characteristics of each audio domain. Furthermore, to enrich the semantic density of the codec without auxiliary modules, we propose a self-supervised mask prediction modeling approach. Comprehensive objective and subjective evaluations demonstrate that UniCodec achieves excellent audio reconstruction performance across the three audio domains, outperforms existing unified neural codecs with a single codebook, and even surpasses state-of-the-art domain-specific codecs in both acoustic and semantic representation capabilities.
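The partitioned domain-adaptive codebook can be pictured as one shared token vocabulary whose indices are divided among the audio domains. The Python sketch below illustrates that general idea under stated assumptions: it assumes contiguous per-domain index ranges and a nearest-neighbour lookup restricted to the active domain's partition. The partition sizes, the `DOMAIN_PARTITIONS` table and the helper names are hypothetical, and this is not UniCodec's actual implementation, whose details are not given here.

```python
# Hypothetical layout: a single codebook of 12 codewords whose index space is
# split into contiguous per-domain partitions. The sizes are illustrative only.
DOMAIN_PARTITIONS = {
    "speech": range(0, 4),
    "music": range(4, 8),
    "sound": range(8, 12),
}


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize_frame(frame, codebook, domain):
    """Return the index of the nearest codeword, searching only the
    partition assigned to `domain` (assumed domain-restricted lookup)."""
    candidates = DOMAIN_PARTITIONS[domain]
    return min(candidates, key=lambda i: squared_distance(frame, codebook[i]))


def quantize_sequence(frames, codebook, domain):
    """Map a sequence of encoder frames to token indices in the shared space."""
    return [quantize_frame(f, codebook, domain) for f in frames]


if __name__ == "__main__":
    # Toy 2-D codebook: codeword i sits at (i, i).
    codebook = [[float(i), float(i)] for i in range(12)]
    frames = [[5.2, 5.2], [0.4, 0.1]]
    print(quantize_sequence(frames, codebook, "speech"))  # [3, 0]
    print(quantize_sequence(frames, codebook, "music"))   # [5, 4]
```

Because every partition lives in one index space, a downstream language model in this sketch sees a single 12-token vocabulary regardless of domain, while the same encoder frame maps to different tokens depending on which partition is searched. How UniCodec actually sizes its partitions or determines the domain at inference time is not stated in this summary.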

Yidi Jiang, Qian Chen, Shengpeng Ji, Yu Xi, Wen Wang, Chong Zhang, Xianghu Yue, ShiLiang Zhang, Haizhou Li • 2025

Related benchmarks

Task	Dataset	Metric	Result
Speech Reconstruction	LibriTTS clean (test)	PESQ	3.0266	67
Acoustic Consistency	SALMon	Speaker Consistency	49	66
Speech Reconstruction	LibriSpeech (test-clean)	UTMOS	4	64
Audio Reconstruction	AudioSet (eval)	Mel Distance	0.382	63
Speech Reconstruction	LibriTTS (test-other)	UTMOS	3.58	57
Audio Reconstruction	MusicDB (test)	Mel Distance	0.3959	28
Audio Reconstruction	LibriSpeech clean (test)	STOI	0.92	25
Audio Reconstruction	AudioSet (test)	Mel Distance (16 kHz)	0.903	23
Audio Reconstruction	Codec-SUPERB tiny (Speech)	Mel	1.337	14
Speech Quality Evaluation	Common Voice 17	Quality Score (NL)	2.256	14

