
AUV: Teaching Audio Universal Vector Quantization with Single Nested Codebook

About

We propose AUV, a unified neural audio codec with a single codebook, which enables a favourable reconstruction of speech and further extends to general audio, including vocal, music, and sound. AUV is capable of tackling any 16 kHz mixed-domain audio segment at bit rates around 700 bps. To accomplish this, we guide the matryoshka codebook with nested domain-specific partitions, assigned with corresponding teacher models to perform distillation, all in a single-stage training. A conformer-style encoder-decoder architecture with STFT features as audio representation is employed, yielding better audio quality. Comprehensive evaluations demonstrate that AUV exhibits comparable audio reconstruction ability to state-of-the-art domain-specific single-layer quantizer codecs, showcasing the potential of audio universal vector quantization with a single codebook. The pre-trained model and demo samples are available at https://swivid.github.io/AUV/.
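As a rough illustration of the two ideas above (nested partitions of one codebook, and the bit rate of a single-codebook codec), here is a minimal sketch. It is not the authors' implementation. The partition sizes, the ordering of domains, the toy codebook, the 50 frames-per-second rate, and the 2^14 codebook size are all hypothetical values chosen for illustration, and the abstract does not say whether such a restriction applies during training only or also at inference.

```python
import math

# Hypothetical nested ("matryoshka") partition sizes: each domain may use
# only the first N entries of one shared codebook. The sizes and the domain
# ordering are illustrative assumptions, not values from the paper.
PREFIX_SIZE = {"speech": 4, "vocal": 6, "music": 7, "sound": 8}

# Toy 2-D codebook with 8 entries (assumption, for illustration only).
CODEBOOK = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0),
            (2.0, 2.0), (3.0, 3.0), (4.0, 4.0), (5.0, 5.0)]


def quantize(vector, codebook, domain):
    """Return the index of the nearest entry within the domain's prefix."""
    limit = PREFIX_SIZE[domain]
    best_index, best_dist = 0, float("inf")
    for index, entry in enumerate(codebook[:limit]):
        # Squared Euclidean distance between the input and this entry.
        dist = sum((v - e) ** 2 for v, e in zip(vector, entry))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def bitrate_bps(frames_per_second, codebook_size):
    """Bit rate when one codebook index is emitted per frame."""
    return frames_per_second * math.log2(codebook_size)


if __name__ == "__main__":
    x = (4.1, 3.9)
    print(quantize(x, CODEBOOK, "speech"))  # search limited to entries 0..3
    print(quantize(x, CODEBOOK, "music"))   # search limited to entries 0..6
    # One hypothetical combination that lands near the quoted ~700 bps.
    print(bitrate_bps(50, 2 ** 14))
```

Because every partition is a prefix of the same codebook, an index chosen under a smaller partition remains valid under any larger one, which is what lets a single codebook serve several domains in this sketch.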

Yushen Chen, Kai Hu, Long Zhou, Shulin Feng, Xusheng Yang, Hangting Chen, Xie Chen • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Audio Reconstruction | AudioSet (eval) | Mel Distance | 1.26 | 63 |
| Audio Reconstruction | LibriSpeech clean (test) | STOI | 0.91 | 17 |
