Perception-Aware Video Semantic Communication
About
Ultra-high-resolution streaming and emerging immersive services are driving rapidly increasing wireless video traffic. However, perceptually pleasing video transmission over bandwidth-limited and latency-constrained wireless links remains challenging for conventional separated source-channel systems, which primarily target bit-level reliability and often suffer performance degradation under short-blocklength transmission. In addition, pixel-level distortion optimization does not necessarily align with human perception, while existing learned video codecs may incur high complexity and raise deployment issues. This paper proposes PVSC, a perception-aware video semantic communication framework for real-time wireless video transmission. PVSC eliminates explicit motion-vector transmission and exploits spatio-temporal feature coding to generate compact and channel-robust symbol streams. It also specifies side-information formatting, reference-buffer management, and lightweight rate control, enabling stable receiver-side reconstruction and bandwidth-adaptive inference with a single model. Extensive experiments demonstrate that PVSC achieves superior performance across diverse datasets, resolutions, GOP configurations, and channel conditions. Compared with the engineered "VTM + 5G LDPC" baseline, PVSC saves up to about 75% and 87% bandwidth at comparable LPIPS and DISTS, respectively, while enabling real-time inference on a single NVIDIA RTX 4090 GPU.
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Video Compression | MCL-JCV | -- | 79 |
| Video Compression | UVG | BD-CBR (LPIPS): -95.1 | 28 |
| Video Transmission | HEVC Class A | BD-CBR (LPIPS): -74.7 | 25 |
| Video Transmission | HEVC Class B | BD-CBR (LPIPS): -87.6 | 25 |
| Video Transmission | HEVC Class C | BD-CBR (LPIPS): -76.6 | 25 |
| Video Transmission | HEVC Class D | BD-CBR (LPIPS): -73.6 | 25 |
| Video Transmission | HEVC Class E | BD-CBR (LPIPS): -94.9 | 25 |
| Video Transmission | Average All Datasets | BD-CBR (LPIPS): -77.3 | 25 |
| Perceptual Video Coding | HEVC Class A | BD-Rate (LPIPS): -92.5 | 3 |
| Perceptual Video Coding | HEVC Class B | BD-CBR (LPIPS): -96.5 | 3 |
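
The page does not define BD-CBR. As a rough illustration, the sketch below assumes it follows the usual Bjøntegaard-delta construction, with a channel bandwidth ratio in place of bitrate and LPIPS as the quality axis: it averages the gap between two log-rate curves over their shared quality range. It is not the paper's evaluation code. The function names and the sample numbers are hypothetical, and piecewise-linear interpolation stands in for the cubic or PCHIP fit that BD tools normally use.

```python
import math


def _interp(x, xs, ys):
    """Linearly interpolate y at x; xs must be sorted and distinct."""
    for i in range(len(xs) - 1):
        x0, x1 = xs[i], xs[i + 1]
        if x0 <= x <= x1:
            return ys[i] + (ys[i + 1] - ys[i]) * (x - x0) / (x1 - x0)
    raise ValueError("x lies outside the sampled quality range")


def _mean_log_rate(quality, rate, lo, hi):
    """Mean of log(rate) over [lo, hi] on a piecewise-linear curve."""
    pts = sorted(zip(quality, rate))
    qs = [q for q, _ in pts]
    logs = [math.log(r) for _, r in pts]
    # Integration knots: the interval ends plus every sample strictly inside.
    knots = [lo] + [q for q in qs if lo < q < hi] + [hi]
    vals = [_interp(k, qs, logs) for k in knots]
    # Trapezoidal rule is exact for a piecewise-linear curve.
    area = sum(
        (vals[i] + vals[i + 1]) / 2 * (knots[i + 1] - knots[i])
        for i in range(len(knots) - 1)
    )
    return area / (hi - lo)


def bd_delta(anchor_q, anchor_r, test_q, test_r):
    """Average rate change (%) of test vs. anchor at equal quality.

    Negative values mean the test system needs less rate (or bandwidth).
    """
    lo = max(min(anchor_q), min(test_q))
    hi = min(max(anchor_q), max(test_q))
    if hi <= lo:
        raise ValueError("the two curves share no quality range")
    diff = _mean_log_rate(test_q, test_r, lo, hi) - _mean_log_rate(
        anchor_q, anchor_r, lo, hi
    )
    return (math.exp(diff) - 1.0) * 100.0


# Hypothetical operating points (not taken from the paper).
lpips = [0.40, 0.30, 0.22, 0.16]
anchor_cbr = [0.01, 0.02, 0.04, 0.08]
test_cbr = [r / 2 for r in anchor_cbr]  # half the bandwidth at each LPIPS
print(round(bd_delta(lpips, anchor_cbr, lpips, test_cbr), 1))  # about -50
```

Under this reading, a table entry such as -77.3 would mean roughly 77.3% less channel bandwidth than the anchor at matched LPIPS, averaged over the overlapping quality range.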