
Augmented Deep Contexts for Spatially Embedded Video Coding

About

Most Neural Video Codecs (NVCs) employ only temporal references to generate temporal-only contexts and latent priors. These temporal-only NVCs fail to handle large motions or emerging objects because of limited contexts and misaligned latent priors. To relieve these limitations, we propose a Spatially Embedded Video Codec (SEVC), in which a low-resolution video is compressed to provide spatial references. First, SEVC leverages both spatial and temporal references to generate augmented motion vectors and hybrid spatial-temporal contexts. Second, to address the misalignment in the latent prior and enrich the prior information, we introduce a spatial-guided latent prior augmented by multiple temporal latent representations. Finally, we design a joint spatial-temporal optimization to learn quality-adaptive bit allocation for spatial references, further boosting rate-distortion performance. Experimental results show that SEVC effectively alleviates the limitations in handling large motions or emerging objects, and reduces bitrate by 11.9% more than the previous state-of-the-art NVC while providing an additional low-resolution bitstream. Our code and model are available at https://github.com/EsakaK/SEVC.

Yifan Bian, Chuanbo Tang, Li Li, Dong Liu • 2025
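
As a rough illustration of why a spatial reference can help with emerging objects, the toy sketch below compares two predictions of the current frame: the previous frame (a temporal reference) and an upsampled low-resolution copy of the current frame (a stand-in for a spatial reference). This is not the SEVC architecture. The function names, the 4x downsampling factor, the average-pooling and nearest-neighbour resampling, and the synthetic frames are all assumptions made for the example, and the real codec works with decoded low-resolution frames and learned contexts rather than raw pixels.

```python
import numpy as np

def downsample(frame, factor):
    """Average-pool a 2D frame by an integer factor (hypothetical helper)."""
    h, w = frame.shape
    return frame.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def upsample(frame, factor):
    """Nearest-neighbour upsampling by an integer factor (hypothetical helper)."""
    return np.repeat(np.repeat(frame, factor, axis=0), factor, axis=1)

# Synthetic frames: an object appears in the current frame only.
prev_frame = np.zeros((16, 16))
cur_frame = prev_frame.copy()
cur_frame[5:13, 5:13] = 1.0  # emerging object, absent from prev_frame

# Temporal reference: the previous frame, which cannot contain the new object.
temporal_ref = prev_frame

# Spatial reference: a coarse version of the current frame itself.
# The factor of 4 is an assumption for this toy example.
spatial_ref = upsample(downsample(cur_frame, 4), 4)

mse_temporal = np.mean((cur_frame - temporal_ref) ** 2)
mse_spatial = np.mean((cur_frame - spatial_ref) ** 2)

print(f"temporal-only prediction MSE: {mse_temporal:.4f}")
print(f"spatial reference prediction MSE: {mse_spatial:.4f}")
```

In this toy setting the spatial reference leaves a smaller residual because it already carries a blurred version of the new object, whereas the temporal reference has no information about it.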

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Video Compression | MCL-JCV | BD-Rate (PSNR) | -24.5 | 79 |
| Video Compression | HEVC Class D | BD-Rate | -30 | 74 |
| Video Compression | HEVC Class B | BD-Rate (%) | -16.4 | 63 |
| Video Compression | HEVC Class C | BD-Rate (%) | -15.8 | 61 |
| Video Compression | HEVC Class E | BD-Rate (%) | -28.5 | 58 |
| Video Compression | UVG | BD-Rate (PSNR) | -30.2 | 55 |
| Video Compression | XIPH | BD-rate (DISTS) | 87.7 | 9 |
| Video Compression | USTC-TD | BD-Rate (PSNR) | -13.4 | 7 |
| Video Compression | HEVC Class B | BD-Rate vs PSNR | 62.8 | 6 |
| Video Compression | 1080p sequences (test) | Encoding Time (ms) | 775 | 5 |
Showing 10 of 18 rows
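
For readers unfamiliar with the BD-Rate metric used in most rows, the sketch below shows the commonly used Bjøntegaard delta-rate calculation: fit log-bitrate as a cubic polynomial of quality (PSNR here) for each codec, integrate both fits over the overlapping quality range, and convert the average log-rate gap into a percentage. A negative value means the tested codec needs less bitrate than the anchor at equal quality. The function name `bd_rate` and the sample rate-distortion points are made up for illustration. The listing does not state which anchor codec or which exact BD-Rate variant produced each number, so this should not be read as the evaluation script behind the table.

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Average bitrate change (%) of the test codec versus the anchor at equal PSNR."""
    log_ra = np.log(np.asarray(rate_anchor, dtype=float))
    log_rt = np.log(np.asarray(rate_test, dtype=float))
    pa = np.asarray(psnr_anchor, dtype=float)
    pt = np.asarray(psnr_test, dtype=float)

    # Fit log-rate as a cubic polynomial of PSNR for each codec.
    poly_a = np.polyfit(pa, log_ra, 3)
    poly_t = np.polyfit(pt, log_rt, 3)

    # Only the PSNR interval covered by both curves is compared.
    lo = max(pa.min(), pt.min())
    hi = min(pa.max(), pt.max())

    # Definite integrals of both fitted curves over [lo, hi].
    int_a = np.polyval(np.polyint(poly_a), hi) - np.polyval(np.polyint(poly_a), lo)
    int_t = np.polyval(np.polyint(poly_t), hi) - np.polyval(np.polyint(poly_t), lo)

    # Mean log-rate difference, converted to a percentage.
    avg_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_diff) - 1.0) * 100.0

# Made-up rate-distortion points (kbps, dB): the test codec uses 20% less
# bitrate than the anchor at every quality level.
anchor_rates = [1000, 2000, 4000, 8000]
test_rates = [800, 1600, 3200, 6400]
psnrs = [32.0, 35.0, 38.0, 40.0]

print(bd_rate(anchor_rates, psnrs, test_rates, psnrs))  # approximately -20.0
```

Four rate points per codec is the usual minimum for the cubic fit. Rows that report other quality measures (for example DISTS) would substitute that measure for PSNR on the quality axis.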
