SoundSpring: Loss-Resilient Audio Transceiver with Dual-Functional Masked Language Modeling

About

In this paper, we propose "SoundSpring", an error-resilient audio transceiver that combines the robustness benefits of joint source-channel coding (JSCC) with compatibility with current digital communication systems. Unlike recent deep JSCC transceivers, which learn to map audio signals directly to analog channel-input symbols via neural networks, SoundSpring adopts a layered architecture that separates audio compression from digital coded transmission while fully exploiting the in-context predictive capabilities of large language (foundation) models. Integrated with a causal-order mask learning strategy, our single model operates in the latent feature domain and serves two functions: as an efficient audio compressor at the transmitter and as an effective packet loss concealment mechanism at the receiver. By jointly optimizing for both audio compression efficiency and transmission error resiliency, we show that mask-learned language models are powerful contextual predictors, and that our dual-functional compression and concealment framework offers fresh perspectives on the application of foundation language models in audio communication. Through extensive experimental evaluations, we establish that SoundSpring clearly outperforms contemporary audio transmission systems in terms of signal fidelity metrics and perceptual quality scores. These findings not only support the practical deployment of SoundSpring in learning-based audio communication systems but also inspire the development of future audio semantic transceivers.

Shengshi Yao, Jincheng Dai, Xiaoqi Qin, Sixian Wang, Siye Wang, Kai Niu, Ping Zhang • 2025
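
The dual role described in the abstract, one contextual predictor used both for compression and for loss concealment, can be illustrated with a toy sketch. This is not the SoundSpring model: a smoothed bigram count table stands in for the masked language model, small integers stand in for latent audio tokens, and every name here (fit_bigram, code_length_bits, conceal) is hypothetical. The only point is that the same conditional distribution p(token | context) gives an ideal entropy-coding cost at the transmitter and a fill-in rule for lost tokens at the receiver.

```python
import math
from collections import Counter, defaultdict

def fit_bigram(tokens, vocab_size):
    """Toy context model: p(next | prev) from counts with add-one smoothing.
    Assumption: a stand-in for a learned predictor, not the paper's model."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1

    def prob(prev, nxt):
        total = sum(counts[prev].values())
        return (counts[prev][nxt] + 1) / (total + vocab_size)

    return prob

def code_length_bits(tokens, prob):
    """Transmitter side: ideal entropy-coding cost, sum of -log2 p(token | prev)."""
    return sum(-math.log2(prob(p, n)) for p, n in zip(tokens, tokens[1:]))

def conceal(received, prob, vocab_size):
    """Receiver side: replace each lost token (None) with the most probable
    token given the previous one. A lost first token is left as None."""
    out = list(received)
    for i, tok in enumerate(out):
        if tok is None and i > 0:
            out[i] = max(range(vocab_size), key=lambda v: prob(out[i - 1], v))
    return out

VOCAB = 4
prob = fit_bigram([0, 1, 2, 3] * 50, VOCAB)    # highly predictable training stream

message = [0, 1, 2, 3, 0, 1, 2, 3]
bits = code_length_bits(message, prob)          # cost under the context model
uniform_bits = (len(message) - 1) * math.log2(VOCAB)  # cost with no context model

received = [0, 1, None, 3, 0, None, 2, 3]       # two tokens lost in transit
restored = conceal(received, prob, VOCAB)
```

In this sketch the better the predictor, the fewer bits the coder needs and the more often the concealment guess matches the lost token, which is the intuition behind optimizing one model for both objectives.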

Related benchmarks

Task | Dataset | Metric | Result | Rank
Speech Quality Assessment | LibriSpeech 30% packet loss (test) | OVRL | 3.98 | 16
Speech Quality Assessment | LibriSpeech 5% packet loss (test) | P.808 MOS | 3.87 | 8
Word Error Rate | LibriSpeech | WER (0% Loss) | 7.2 | 8
Speech Reconstruction | LibriSpeech 10% packet loss (test) | PESQ | 3.01 | 5
Speech Reconstruction | LibriSpeech 20% packet loss (test) | PESQ | 2.53 | 5
Speech Reconstruction | LibriSpeech 30% packet loss (test) | PESQ | 2.06 | 5