Error-Resilient Semantic Communication for Speech Transmission over Packet-Loss Networks
About
Real-time speech communication over wireless networks remains challenging because conventional channel protection mechanisms cannot effectively counter packet loss under stringent bandwidth and latency constraints. Semantic communication has emerged as a promising paradigm for enhancing the robustness of speech transmission by means of joint source-channel coding (JSCC). However, its cross-layer design is incompatible with existing digital communication systems, which hinders practical deployment. Within such systems, the robustness of speech communication is therefore determined primarily by resilience to packet loss over wireless networks. To address these challenges, we propose Glaris, a generative latent-prior-based resilient speech semantic communication framework that performs resilient speech coding in the generative latent space. Generative latent priors enable high-quality packet loss concealment (PLC) at the receiver side, balancing semantic consistency and reconstruction fidelity. Additionally, an integrated error resilience mechanism is designed to mitigate error propagation and improve the effectiveness of PLC. Compared with traditional packet-level forward error correction (FEC) strategies, Glaris achieves enhanced robustness over dynamic wireless networks while significantly reducing redundancy overhead. Experimental results on the LibriSpeech dataset demonstrate that Glaris consistently outperforms existing error-resilient codecs, achieving JSCC-level robustness while maintaining seamless compatibility with existing systems and striking a favorable balance between transmission efficiency and speech reconstruction quality.
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Speech Quality Assessment | LibriSpeech 30% packet loss (test) | OVRL | 4.09 | 16 |
| Word Error Rate | LibriSpeech | WER (0% Loss) | 6.7 | 8 |
| Speech Quality Assessment | LibriSpeech 5% packet loss (test) | P.808 MOS | 3.85 | 8 |
| Neural Speech Coding | LibriSpeech (test) | RTF | 1.17 | 7 |
| Speech Reconstruction | LibriSpeech 10% packet loss (test) | PESQ | 3.54 | 5 |
| Speech Reconstruction | LibriSpeech 20% packet loss (test) | PESQ | 3.12 | 5 |
| Speech Reconstruction | LibriSpeech 30% packet loss (test) | PESQ | 2.63 | 5 |
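
The benchmarks above are reported at fixed packet-loss rates (5%, 10%, 20%, 30%). This page does not say how the loss was simulated, so the following is only a minimal sketch of one common setup, assuming independent random loss per packet. The helper names `split_into_packets` and `drop_packets`, the 16 kHz sample rate, and the 20 ms packet size are illustrative assumptions and are not taken from the paper.

```python
import random


def split_into_packets(samples, packet_size):
    """Split a sample sequence into consecutive fixed-size packets."""
    return [samples[i:i + packet_size] for i in range(0, len(samples), packet_size)]


def drop_packets(packets, loss_rate, seed=0):
    """Simulate independent random packet loss (an assumed loss model).

    Returns (received, mask): lost packets are replaced by None and
    mask[i] is True when packet i arrived.
    """
    if not 0.0 <= loss_rate <= 1.0:
        raise ValueError("loss_rate must be in [0, 1]")
    rng = random.Random(seed)  # seeded so that a run is reproducible
    mask = [rng.random() >= loss_rate for _ in packets]
    received = [p if keep else None for p, keep in zip(packets, mask)]
    return received, mask


# Placeholder signal: 1 s of audio at an assumed 16 kHz sample rate.
samples = list(range(16000))
# 320 samples correspond to 20 ms at 16 kHz (assumed packet duration).
packets = split_into_packets(samples, 320)
received, mask = drop_packets(packets, loss_rate=0.10)
lost = mask.count(False)
print(f"{lost} of {len(packets)} packets lost")
```

A receiver-side concealment method would then fill in the `None` entries before a metric such as PESQ is computed. Bursty channels are often modeled differently (for example with a two-state Markov model), so the independent-loss assumption here should be treated as a simplification.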