PhoenixCodec: Taming Neural Speech Coding for Extreme Low-Resource Scenarios
About
This paper presents PhoenixCodec, a neural speech coding and decoding framework designed for extremely low-resource conditions. The system integrates an optimized asymmetric frequency-time architecture, a Cyclical Calibration and Refinement (CCR) training strategy, and a noise-invariant fine-tuning procedure. Under stringent constraints (computation below 700 MFLOPs, latency under 30 ms, and dual-rate support at 1 kbps and 6 kbps), existing methods face a trade-off between efficiency and quality. PhoenixCodec addresses these challenges by alleviating the resource scattering of conventional decoders, stabilizing optimization with CCR, and improving robustness through fine-tuning on noisy samples. In the LRAC 2025 Challenge Track 1, the proposed system ranked third overall and achieved the best 1 kbps performance both under real-world noise and reverberation and in clean-speech intelligibility, confirming its effectiveness.
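The dual-rate constraint fixes the bit budget a quantizer can spend on each coded frame. A minimal sketch of that arithmetic, assuming a hypothetical 50 Hz frame rate (20 ms frames, compatible with the stated sub-30 ms latency bound; the paper does not specify the actual frame rate):

```python
def bits_per_frame(bitrate_bps: int, frame_rate_hz: int) -> int:
    """Bit budget available to the quantizer for each coded frame."""
    return bitrate_bps // frame_rate_hz

# Assumed 50 Hz frame rate for illustration only (not from the paper).
print(bits_per_frame(1000, 50))  # 1 kbps -> 20 bits per frame
print(bits_per_frame(6000, 50))  # 6 kbps -> 120 bits per frame
```

At 1 kbps the quantizer has only 20 bits per frame under this assumption, which illustrates why the low-rate setting is the harder of the two operating points.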
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Speech Reconstruction | LRAC Challenge Track 1 (Clean) 2025 (test) | MUSHRA Score | 80.69 | 6 |
| Speech Reconstruction | LRAC Challenge Track 1 (Noisy) 2025 (test) | DMOS | 4.16 | 6 |
| Speech Reconstruction | LRAC Challenge Track 1 (Multi-talkers) 2025 (test) | DMOS | 2.08 | 6 |
| Speech Intelligibility Assessment | LRAC Challenge Track 1 (Clean) 2025 (test) | DRT Score | 85.57 | 3 |