
Enhancing Perceptual Quality in Video Super-Resolution through Temporally-Consistent Detail Synthesis using Diffusion Models

About

In this paper, we address the problem of enhancing perceptual quality in video super-resolution (VSR) using Diffusion Models (DMs) while ensuring temporal consistency among frames. We present StableVSR, a VSR method based on DMs that can significantly enhance the perceptual quality of upscaled videos by synthesizing realistic and temporally-consistent details. We introduce the Temporal Conditioning Module (TCM) into a pre-trained DM for single image super-resolution to turn it into a VSR method. TCM uses the novel Temporal Texture Guidance, which provides it with spatially-aligned and detail-rich texture information synthesized in adjacent frames. This guides the generative process of the current frame toward high-quality and temporally-consistent results. In addition, we introduce the novel Frame-wise Bidirectional Sampling strategy to encourage the use of information from past to future and vice versa. This strategy improves the perceptual quality of the results and the temporal consistency across frames. We demonstrate the effectiveness of StableVSR in enhancing the perceptual quality of upscaled videos while achieving better temporal consistency compared to existing state-of-the-art methods for VSR. The project page is available at https://github.com/claudiom4sir/StableVSR.
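The Frame-wise Bidirectional Sampling idea can be illustrated with a minimal sketch: at each diffusion timestep, frames are updated in alternating order (past-to-future on one step, future-to-past on the next), so every frame can condition on texture already refined in a neighbouring frame. Note that the function names and the toy blending update below are illustrative assumptions for exposition, not the authors' actual denoising network or conditioning mechanism.

```python
def denoise_step(frame, neighbour, t, total_steps):
    """Toy per-frame update: blend the frame toward its already-updated
    neighbour to mimic temporal conditioning. Purely illustrative; the real
    method conditions a diffusion UNet on aligned neighbouring textures."""
    if neighbour is None:
        return frame
    w = 0.5 * (1 - t / total_steps)  # conditioning weight decays over steps
    return (1 - w) * frame + w * neighbour

def bidirectional_sampling(frames, total_steps=4):
    """Run total_steps sweeps over the sequence, alternating direction so
    information flows both past-to-future and future-to-past."""
    frames = list(frames)
    n = len(frames)
    for t in range(total_steps):
        # Even timesteps sweep forward, odd timesteps sweep backward.
        order = range(n) if t % 2 == 0 else range(n - 1, -1, -1)
        prev_idx = None
        for i in order:
            neighbour = frames[prev_idx] if prev_idx is not None else None
            frames[i] = denoise_step(frames[i], neighbour, t, total_steps)
            prev_idx = i
    return frames
```

With scalar stand-ins for frames, e.g. `bidirectional_sampling([0.0, 4.0])`, the two values are pulled toward each other over the sweeps, which is the intuition behind the improved temporal consistency: each frame's generative trajectory is steered by its neighbours in both directions.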

Claudio Rota, Marco Buzzelli, Joost van de Weijer • 2023

Related benchmarks

Task                     Dataset          Metric       Result   Rank
Video Super-Resolution   Vid4 (test)      PSNR         22.213   173
Video Super-Resolution   REDS4 (test)     PSNR (Avg)   27.928   117
Video Restoration        REDS30           PSNR         23.19    17
Video Super-Resolution   VideoLQ (test)   NRQM         6.154    17
Video Restoration        VideoLQ          MUSIQ        31.85    7
Video Super-Resolution   VideoLQ          NIQE         3.982    5
