SoundSpring: Loss-Resilient Audio Transceiver with Dual-Functional Masked Language Modeling

About

In this paper, we propose "SoundSpring", an error-resilient audio transceiver that combines the robustness benefits of joint source-channel coding (JSCC) with compatibility with current digital communication systems. Unlike recent deep JSCC transceivers, which learn to map audio signals directly to analog channel-input symbols via neural networks, SoundSpring adopts a layered architecture that separates audio compression from digital coded transmission, while fully exploiting the in-context predictive capabilities of large language (foundation) models. Integrated with a causal-order mask learning strategy, our single model operates in the latent feature domain and serves two functions: as an efficient audio compressor at the transmitter and as an effective packet loss concealment mechanism at the receiver. By jointly optimizing for both audio compression efficiency and transmission error resiliency, we show that mask-learned language models are powerful contextual predictors, and that our dual-functional compression and concealment framework offers fresh perspectives on the application of foundation language models in audio communication. Through extensive experimental evaluations, we establish that SoundSpring clearly outperforms contemporary audio transmission systems in terms of signal fidelity metrics and perceptual quality scores. These findings not only advocate for the practical deployment of SoundSpring in learning-based audio communication systems but also inspire the development of future audio semantic transceivers.
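The dual role described in the abstract can be illustrated with a toy sketch: one context predictor supplies probabilities that set the ideal entropy-coding cost at the transmitter, and the same predictor fills in lost tokens at the receiver. This is not the SoundSpring model. A smoothed bigram counter stands in for the masked language model, it looks only at the previous token, and every name here (`fit_counts`, `context_model`, `ideal_code_length_bits`, `conceal`) is hypothetical.

```python
import math
from collections import Counter

def fit_counts(tokens):
    """Count (previous token -> current token) pairs; None marks the sequence start."""
    counts = {}
    for prev, cur in zip([None] + tokens[:-1], tokens):
        counts.setdefault(prev, Counter())[cur] += 1
    return counts

def context_model(context, vocab_size, counts):
    """Toy stand-in for the language model (an assumption, not the paper's network):
    Laplace-smoothed probabilities of the next token given the previous one."""
    prev = context[-1] if context else None
    row = counts.get(prev, Counter())
    total = sum(row.values()) + vocab_size
    return [(row[t] + 1) / total for t in range(vocab_size)]

def ideal_code_length_bits(tokens, vocab_size, counts):
    """Transmitter role: ideal entropy-coding cost, the sum of -log2 p(token | context)."""
    bits = 0.0
    for i, tok in enumerate(tokens):
        probs = context_model(tokens[:i], vocab_size, counts)
        bits += -math.log2(probs[tok])
    return bits

def conceal(received, vocab_size, counts):
    """Receiver role: replace each lost token (marked None) with the most probable one."""
    out = []
    for tok in received:
        if tok is None:
            probs = context_model(out, vocab_size, counts)
            tok = max(range(vocab_size), key=probs.__getitem__)
        out.append(tok)
    return out

# Usage: a highly predictable token stream with two simulated losses.
tokens = [0, 1, 2, 3] * 8
counts = fit_counts(tokens)
model_bits = ideal_code_length_bits(tokens, 4, counts)
uniform_bits = 2 * len(tokens)  # log2(4) bits per token without a predictor
received = list(tokens)
received[5] = None
received[10] = None
recovered = conceal(received, 4, counts)
```

The better the predictor matches the data, the fewer bits the entropy coder needs and the more often the concealed token equals the one that was lost, which is why a single model can plausibly serve both ends of the link.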

Shengshi Yao, Jincheng Dai, Xiaoqi Qin, Sixian Wang, Siye Wang, Kai Niu, Ping Zhang • 2025

Related benchmarks

Task	Dataset	Result
Speech Quality Assessment	LibriSpeech 30% packet loss (test)	OVRL 3.98	16
Speech Quality Assessment	LibriSpeech 5% packet loss (test)	P.808 MOS 3.87	8
Word Error Rate	LibriSpeech	WER (0% Loss) 7.2	8
Speech Reconstruction	LibriSpeech 10% packet loss (test)	PESQ 3.01	5
Speech Reconstruction	LibriSpeech 20% packet loss (test)	PESQ 2.53	5
Speech Reconstruction	LibriSpeech 30% packet loss (test)	PESQ 2.06	5
