RealViformer: Investigating Attention for Real-World Video Super-Resolution

About

In real-world video super-resolution (VSR), videos suffer from in-the-wild degradations and artifacts. VSR methods, especially recurrent ones, tend to propagate artifacts over time in the real-world setting and are more vulnerable to them than image super-resolution methods. This paper investigates the influence of artifacts on the covariance-based attention mechanisms commonly used in VSR. Comparing the widely used spatial attention, which computes covariance over space, with channel attention, which computes covariance over channels, we observe that the latter is less sensitive to artifacts. However, channel attention leads to feature redundancy, as evidenced by higher covariance among output channels. We therefore explore simple techniques, such as the squeeze-excite mechanism and covariance-based rescaling, to counter the effects of high channel covariance. Based on our findings, we propose RealViformer, a channel-attention-based real-world VSR framework that surpasses the state of the art on two real-world VSR datasets with fewer parameters and faster runtimes. The source code is available at https://github.com/Yuehan717/RealViformer.
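The contrast between the two attention forms can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the RealViformer implementation: projection weights are random, and the L2 normalization of query/key channels plus the fixed `temperature` are assumptions borrowed from transposed (channel-wise) attention designs. `mean_offdiag_correlation` is a hypothetical redundancy proxy, and `squeeze_excite` is the standard squeeze-and-excitation gate, not the paper's exact variant. Spatial attention builds an N x N affinity over positions; channel attention builds a C x C affinity over channels.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_attention(x, wq, wk, wv):
    # x: (N, C) tokens, where N = H*W spatial positions.
    q, k, v = x @ wq, x @ wk, x @ wv
    # N x N affinity: similarity between spatial positions.
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=-1)
    return attn @ v, attn

def channel_attention(x, wq, wk, wv, temperature=1.0):
    q, k, v = x @ wq, x @ wk, x @ wv
    # Assumption: L2-normalize each channel over the spatial axis.
    q = q / (np.linalg.norm(q, axis=0, keepdims=True) + 1e-6)
    k = k / (np.linalg.norm(k, axis=0, keepdims=True) + 1e-6)
    # C x C affinity: similarity between channels (temperature is usually learned; fixed here).
    attn = softmax(q.T @ k * temperature, axis=-1)
    # Each output channel is a convex combination of the value channels.
    return v @ attn.T, attn

def mean_offdiag_correlation(y):
    # Hypothetical redundancy proxy: mean |Pearson correlation| between distinct channels.
    corr = np.corrcoef(y, rowvar=False)
    c = corr.shape[0]
    return np.abs(corr[~np.eye(c, dtype=bool)]).mean()

def squeeze_excite(y, w1, w2):
    # Standard SE gating: squeeze over positions, excite per channel.
    s = y.mean(axis=0)                    # (C,)
    z = np.maximum(s @ w1, 0.0)           # (C // r,) after ReLU
    g = 1.0 / (1.0 + np.exp(-(z @ w2)))   # (C,) sigmoid gates in (0, 1)
    return y * g, g

rng = np.random.default_rng(0)
N, C, r = 64, 16, 4
x = rng.standard_normal((N, C))
wq, wk, wv = (rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(3))
ys, a_s = spatial_attention(x, wq, wk, wv)
yc, a_c = channel_attention(x, wq, wk, wv)
w1 = rng.standard_normal((C, C // r)) / np.sqrt(C)
w2 = rng.standard_normal((C // r, C)) / np.sqrt(C // r)
yse, gates = squeeze_excite(yc, w1, w2)
print(a_s.shape, a_c.shape)  # (64, 64) (16, 16)
print(mean_offdiag_correlation(ys), mean_offdiag_correlation(yc))
```

Note that SE rescales each channel by a positive scalar, so it changes channel covariance but leaves Pearson correlation unchanged; any redundancy measure used to judge such rescaling should be chosen with that in mind.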

Yuehan Zhang, Angela Yao · 2024

Related benchmarks

Task	Dataset	Result
Video Super-Resolution	Vid4 (test)	PSNR 21.963	206
Video Super-Resolution	UDM10	PSNR 26.78	111
Video Super-Resolution	SPMCS	PSNR 24.18	68
Video Super-Resolution	UDM10 (test)	PSNR 26.7	51
Video Super-Resolution	MVSR4x	PSNR 22.44	49
Video Super-Resolution	SPMCS (test)	Avg. PSNR 24.19	45
Video Restoration	UDM10 (test)	PSNR 29.561	19
Video Super-Resolution	YouHQ40	PSNR 24.44	18
Video Restoration	REDS30	PSNR 25.86	17
Video Super-Resolution	VideoLQ	MUSIQ 52.18	17
