SACodec: Asymmetric Quantization with Semantic Anchoring for Low-Bitrate High-Fidelity Neural Speech Codecs

About

Neural speech codecs face a fundamental trade-off at low bitrates: preserving acoustic fidelity often compromises semantic richness. To address this, we introduce SACodec, a novel codec built on an asymmetric dual quantizer that employs our proposed Semantic Anchoring mechanism. This design decouples the quantization of semantic and acoustic details. Semantic anchoring is achieved via a lightweight projector that aligns acoustic features with a frozen, large-scale mHuBERT codebook, injecting linguistic priors while guaranteeing full codebook utilization. Then, for acoustic details, a residual activation module with SimVQ enables a single-layer quantizer (the acoustic path) to faithfully recover fine-grained information. At just 1.5 kbps, SACodec establishes a new state of the art in both fidelity and semantics: subjective listening tests confirm that its reconstruction quality is perceptually comparable to ground-truth audio, while its tokens demonstrate substantially improved semantic richness in downstream tasks.
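The two-path quantization described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the tiny dimensions, the random untrained weights, and the plain "feature minus semantic code" residual are all assumptions made for illustration, and the paper's residual activation module is not reproduced here. The SimVQ step reflects our reading of that technique (a frozen base codebook passed through a single shared linear map).

```python
import random


def matvec(weights, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def nearest(codebook, x):
    """Return the index of the codebook entry closest to x (squared L2)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], x)),
    )


def simvq_codebook(frozen_base, linear):
    """SimVQ-style codebook (assumed form): frozen base entries
    mapped through one shared linear layer."""
    return [matvec(linear, entry) for entry in frozen_base]


def encode_frame(feat, projector, semantic_cb, acoustic_cb):
    """Encode one frame into (semantic index, acoustic index)."""
    # Semantic path: project the feature, then snap to the frozen codebook.
    anchored = matvec(projector, feat)
    s_idx = nearest(semantic_cb, anchored)
    # Acoustic path: quantize what the semantic code left unexplained.
    residual = [f - c for f, c in zip(feat, semantic_cb[s_idx])]
    a_idx = nearest(acoustic_cb, residual)
    return s_idx, a_idx


def decode_frame(s_idx, a_idx, semantic_cb, acoustic_cb):
    """Reconstruct a frame as semantic code plus acoustic code."""
    return [s + a for s, a in zip(semantic_cb[s_idx], acoustic_cb[a_idx])]


def rand_matrix(rows, cols, rng):
    """Random Gaussian matrix standing in for trained or pretrained weights."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Hypothetical toy sizes; real codebooks are far larger.
rng = random.Random(0)
dim, n_semantic, n_acoustic = 4, 8, 8
projector = rand_matrix(dim, dim, rng)          # stands in for the learned projector
semantic_cb = rand_matrix(n_semantic, dim, rng)  # stands in for the frozen mHuBERT codebook
acoustic_cb = simvq_codebook(rand_matrix(n_acoustic, dim, rng),
                             rand_matrix(dim, dim, rng))

feat = [rng.gauss(0.0, 1.0) for _ in range(dim)]
s_idx, a_idx = encode_frame(feat, projector, semantic_cb, acoustic_cb)
recon = decode_frame(s_idx, a_idx, semantic_cb, acoustic_cb)
print(s_idx, a_idx, recon)
```

In this sketch each frame costs one index per path, which is the sense in which the quantizer is asymmetric: the semantic codebook is fixed and only the projector would be trained, while the acoustic codebook is shaped entirely by its linear layer.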

Zhongren Dong, Bin Wang, Jing Han, Haotian Guo, Xiaojun Mo, Yimin Cao, Zixing Zhang • 2025

Related benchmarks

Task | Dataset | Result | Rank
Speech Reconstruction | LibriTTS clean (test) | PESQ 2.6937 | 50
Speech Reconstruction | LibriTTS (test-other) | UTMOS 3.4786 | 44
Audio Reconstruction | LJSpeech | UTMOS 3.9912 | 26
Semantic Representation Evaluation | ARCH (test) | RAVDESS 42.65 | 13
Semantic Representation Classification | ARCH Reconstruction Domain | RAVDESS Accuracy 75.69 | 10
