RealViformer: Investigating Attention for Real-World Video Super-Resolution

About

In real-world video super-resolution (VSR), videos suffer from in-the-wild degradations and artifacts. VSR methods, especially recurrent ones, tend to propagate artifacts over time in the real-world setting and are therefore more vulnerable than image super-resolution methods. This paper investigates the influence of artifacts on the covariance-based attention mechanisms commonly used in VSR. Comparing the widely used spatial attention, which computes covariance over space, with channel attention, we observe that the latter is less sensitive to artifacts. However, channel attention leads to feature redundancy, as evidenced by higher covariance among output channels. We therefore explore simple techniques such as the squeeze-excite mechanism and covariance-based rescaling to counter the effects of high channel covariance. Based on our findings, we propose RealViformer, a channel-attention-based real-world VSR framework that surpasses the state of the art on two real-world VSR datasets with fewer parameters and faster runtimes. The source code is available at https://github.com/Yuehan717/RealViformer.
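
To make the distinction in the abstract concrete, the following is a minimal NumPy sketch of the two attention variants and of a squeeze-excite gate. It is not the RealViformer implementation: the function names, the scaling factors, the flattened `(C, N)` feature layout, and the weights `w1` and `w2` are all assumptions chosen for illustration. Spatial attention builds an `N x N` map over positions, channel attention builds a `C x C` map over channels, and `channel_redundancy` is one hypothetical way to measure how correlated the output channels are.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def spatial_attention(q, k, v):
    """q, k, v have shape (C, N), with N = H * W flattened positions.

    The attention map is (N, N): similarity between spatial positions.
    """
    c = q.shape[0]
    attn = softmax(q.T @ k / np.sqrt(c), axis=-1)  # (N, N)
    return v @ attn.T  # (C, N): each position mixes values from all positions


def channel_attention(q, k, v):
    """Same inputs, but the attention map is (C, C): similarity between channels."""
    n = q.shape[1]
    attn = softmax(q @ k.T / np.sqrt(n), axis=-1)  # (C, C)
    return attn @ v  # (C, N): each channel mixes values from all channels


def channel_redundancy(x):
    """Mean absolute off-diagonal correlation between the channels of x (C, N)."""
    corr = np.corrcoef(x)  # (C, C), rows are treated as variables
    off_diagonal = corr[~np.eye(x.shape[0], dtype=bool)]
    return float(np.abs(off_diagonal).mean())


def squeeze_excite(x, w1, w2):
    """Squeeze-excite gate on x (C, N); w1 is (C // r, C) and w2 is (C, C // r)."""
    squeezed = x.mean(axis=1)  # squeeze: global average per channel
    hidden = np.maximum(w1 @ squeezed, 0.0)  # ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))  # sigmoid, one weight per channel
    return x * gate[:, None]  # excite: rescale each channel


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q, k, v = rng.standard_normal((3, 8, 64))  # C = 8 channels, N = 64 positions
    print("spatial output:", spatial_attention(q, k, v).shape)
    print("channel output:", channel_attention(q, k, v).shape)
    print("channel redundancy:", channel_redundancy(channel_attention(q, k, v)))
```

With random inputs the redundancy value carries no meaning; the sketch only shows where the two covariance computations differ and where a per-channel gate would be applied.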

Yuehan Zhang, Angela Yao • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Video Super-Resolution | Vid4 (test) | PSNR 21.963 | 173 |
| Video Super-Resolution | UDM10 (test) | PSNR 26.7 | 51 |
| Video Super-Resolution | SPMCS (test) | Avg. PSNR 24.19 | 36 |
| Video Restoration | REDS30 | PSNR 25.86 | 17 |
| Video Restoration | REDS30 Spatial Downsampling | PSNR 26.03 | 10 |
| Video Restoration | YouHQ40 Spatio-Temporal Downsampling | PSNR 25.51 | 10 |
| Video Restoration | UDM10 (test) | PSNR 29.561 | 10 |
| Video Restoration | REDS30 (test) | PSNR 26.146 | 10 |
| Video Restoration | YouHQ40 Spatio-Temporal Light | PSNR 21.65 | 10 |
| Video Restoration | YouHQ40 Spatio-Temporal Strong | PSNR 21.75 | 10 |
Showing 10 of 24 rows
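
For readers unfamiliar with the metric, PSNR (peak signal-to-noise ratio, in dB) is conventionally defined as 10 · log10(MAX² / MSE). The sketch below shows that textbook definition in NumPy. The function name `psnr` and the `max_value` default are assumptions, and the listing does not say how each benchmark evaluates PSNR (for example on RGB versus the luminance channel, or with border cropping), so this should not be read as the exact protocol behind the numbers above.

```python
import numpy as np


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two arrays of equal shape."""
    reference = np.asarray(reference, dtype=np.float64)
    restored = np.asarray(restored, dtype=np.float64)
    mse = np.mean((reference - restored) ** 2)
    if mse == 0:
        return float("inf")  # identical inputs: PSNR is unbounded
    return float(10.0 * np.log10(max_value ** 2 / mse))
```

Higher values indicate a restored frame closer to the reference; for video, a per-frame score is typically averaged over the sequence.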
