
MDCTCodec: A Lightweight MDCT-based Neural Audio Codec towards High Sampling Rate and Low Bitrate Scenarios

About

In this paper, we propose MDCTCodec, an efficient, lightweight end-to-end neural audio codec based on the modified discrete cosine transform (MDCT). The encoder takes the MDCT spectrum of the audio as input and encodes it into a continuous latent code, which is then discretized by a residual vector quantizer (RVQ). The decoder then decodes the MDCT spectrum from the quantized latent code and reconstructs the audio via the inverse MDCT. During training, a novel multi-resolution MDCT-based discriminator (MR-MDCTD) is adopted to distinguish natural from decoded MDCT spectra for adversarial training. Experimental results confirm that, at high sampling rates and low bitrates, MDCTCodec achieves high decoded audio quality, improved training and generation efficiency, and a compact model size compared to baseline codecs. Specifically, MDCTCodec achieves a ViSQOL score of 4.18 at a sampling rate of 48 kHz and a bitrate of 6 kbps on the public VCTK corpus.

Xiao-Hang Jiang, Yang Ai, Rui-Chen Zheng, Hui-Peng Du, Ye-Xin Lu, Zhen-Hua Ling • 2024
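
The abstract relies on the MDCT being an invertible, real-valued representation: the encoder consumes the MDCT spectrum and the decoder's output is turned back into a waveform by the inverse MDCT. The following is a minimal sketch of that analysis/synthesis round trip only, not the authors' implementation. The frame size, the sine window, the zero padding, and the helper names (`sine_window`, `mdct`, `imdct`, `analyze`, `synthesize`) are all assumptions made for illustration; the neural encoder, RVQ, and discriminator are not modeled.

```python
import math

def sine_window(n_bins):
    """Sine window of length 2*n_bins (assumed choice; satisfies w[n]^2 + w[n+N]^2 = 1)."""
    return [math.sin(math.pi * (n + 0.5) / (2 * n_bins)) for n in range(2 * n_bins)]

def mdct(frame, window):
    """Map one windowed frame of 2N samples to N real MDCT coefficients."""
    n_bins = len(frame) // 2
    return [
        sum(window[n] * frame[n]
            * math.cos(math.pi / n_bins * (n + 0.5 + n_bins / 2) * (k + 0.5))
            for n in range(2 * n_bins))
        for k in range(n_bins)
    ]

def imdct(coeffs, window):
    """Map N coefficients back to a windowed, time-aliased frame of 2N samples."""
    n_bins = len(coeffs)
    return [
        window[n] * (2.0 / n_bins)
        * sum(coeffs[k]
              * math.cos(math.pi / n_bins * (n + 0.5 + n_bins / 2) * (k + 0.5))
              for k in range(n_bins))
        for n in range(2 * n_bins)
    ]

def analyze(signal, n_bins):
    """Split a signal into 50%-overlapping frames and return their MDCT spectra."""
    if len(signal) % n_bins != 0:
        raise ValueError("this sketch assumes len(signal) is a multiple of n_bins")
    window = sine_window(n_bins)
    # Pad half a frame of zeros on each side so every sample is covered by two frames.
    padded = [0.0] * n_bins + list(signal) + [0.0] * n_bins
    return [mdct(padded[start:start + 2 * n_bins], window)
            for start in range(0, len(padded) - n_bins, n_bins)]

def synthesize(spectra, n_bins):
    """Overlap-add the inverse-transformed frames; the aliasing terms cancel."""
    window = sine_window(n_bins)
    out = [0.0] * ((len(spectra) + 1) * n_bins)
    for i, coeffs in enumerate(spectra):
        for n, value in enumerate(imdct(coeffs, window)):
            out[i * n_bins + n] += value
    return out[n_bins:-n_bins]  # drop the zero padding added in analyze()

if __name__ == "__main__":
    signal = [math.sin(0.3 * t) + 0.5 * math.cos(1.1 * t) for t in range(32)]
    spectra = analyze(signal, n_bins=8)
    restored = synthesize(spectra, n_bins=8)
    print("max reconstruction error:", max(abs(a - b) for a, b in zip(signal, restored)))
```

Each frame of 2N samples yields only N coefficients, so with a hop of N the transform is critically sampled, and the reconstruction error in this sketch should sit at floating-point rounding level. In a codec, the spectra would pass through the encoder, quantizer, and decoder between `analyze` and `synthesize`, so the round trip would no longer be exact.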

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Speech Coding | LibriTTS 16 kHz (test) | GFLOPs: 2.28 | 19 |
| Speech Coding | VCTK 48 kHz (test) | RTF (CPU): 0.142 | 12 |
| Neural Speech Coding | LibriTTS 16 kHz (test) | STOI: 0.912 | 12 |
| Speech Quality Evaluation | VCTK 48 kHz (test) | STOI: 0.866 | 12 |
| Speech Reconstruction | LibriTTS 16 kHz (test) | ViSQOL: 3.45 | 7 |
| Speech Reconstruction | VCTK 48 kHz (test) | ViSQOL: 3.48 | 6 |