
Towards Redundancy Reduction in Diffusion Models for Efficient Video Super-Resolution

About

Diffusion models have recently shown promising results for video super-resolution (VSR). However, directly adapting generative diffusion models to VSR can introduce redundancy, since low-quality videos already preserve substantial content information. This redundancy increases both computational overhead and the learning burden, as the model performs superfluous operations and must learn to filter out irrelevant information. To address this problem, we propose OASIS, an efficient One-step diffusion model with Attention Specialization for real-world vIdeo Super-resolution. OASIS incorporates an attention specialization routing scheme that assigns attention heads to different patterns according to their intrinsic behaviors. This routing mitigates redundancy while preserving pretrained knowledge, allowing the diffusion model to adapt better to VSR and achieve stronger performance. Moreover, we propose a simple yet effective progressive training strategy that starts with temporally consistent degradations and then shifts to temporally inconsistent ones, which facilitates learning under complex degradations. Extensive experiments demonstrate that OASIS achieves state-of-the-art performance on both synthetic and real-world datasets. OASIS also offers faster inference, with a 6.2× speedup over one-step diffusion baselines such as SeedVR2. The code will be available at https://github.com/jp-guo/OASIS.
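The progressive training strategy can be illustrated with a minimal sketch of per-frame degradation sampling. This is not the authors' implementation: the degradation knobs (blur, noise, JPEG quality), their ranges, and the hard switch at a hypothetical `switch_step` are assumptions chosen for illustration. In the consistent phase one parameter set is shared by every frame of a clip; after the switch, each frame is degraded independently.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class DegradationParams:
    # Hypothetical degradation knobs and ranges, for illustration only.
    blur_sigma: float
    noise_std: float
    jpeg_quality: int


def sample_params(rng: random.Random) -> DegradationParams:
    """Draw one random set of (hypothetical) degradation parameters."""
    return DegradationParams(
        blur_sigma=rng.uniform(0.2, 3.0),
        noise_std=rng.uniform(0.0, 25.0),
        jpeg_quality=rng.randint(30, 95),
    )


def frame_degradations(num_frames: int, step: int, switch_step: int,
                       rng: random.Random) -> list[DegradationParams]:
    """Return one parameter set per frame for the current training step.

    Before the hypothetical switch_step, all frames share a single parameter
    set, so the degradation is temporally consistent. From switch_step on,
    each frame gets its own independently sampled set (temporally inconsistent).
    """
    if step < switch_step:
        shared = sample_params(rng)
        return [shared] * num_frames
    return [sample_params(rng) for _ in range(num_frames)]


rng = random.Random(0)
early = frame_degradations(8, step=1_000, switch_step=10_000, rng=rng)
late = frame_degradations(8, step=20_000, switch_step=10_000, rng=rng)
print(len(set(early)), "distinct parameter set(s) early in training")
print(len(set(late)), "distinct parameter set(s) after the switch")
```

Each returned parameter set would then be applied to its corresponding frame by whatever degradation pipeline is in use. A gradual schedule, for example raising the probability of per-frame sampling over the course of training, is another plausible reading of "shifts to inconsistent settings".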

Jinpei Guo, Yifei Ji, Shengwei Wang, Zheng Chen, Yufei Wang, Sizhuo Ma, Yong Guo, Baiang Li, Jusheng Zhang, Yulun Zhang, Jian Wang • 2025

Related benchmarks

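| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Video Super-Resolution | UDM10 | PSNR (dB) | 25.63 | 88 |
| Video Super-Resolution | SPMCS | PSNR (dB) | 22.75 | 61 |
| Video Super-Resolution | MVSR4x | PSNR (dB) | 22.66 | 49 |
| Video Super-Resolution | RealVSR | PSNR (dB) | 21.14 | 28 |
| Video Super-Resolution | Video (33 frames, 720×1280) | Inference time (s) | 4.97 | 13 |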
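The inference-time row can be converted into per-frame latency and throughput with simple arithmetic. This sketch assumes the reported 4.97 s covers the whole 33-frame 720×1280 clip end to end; the page does not state the hardware or whether I/O is included, and `per_frame_stats` is a hypothetical helper.

```python
def per_frame_stats(total_seconds: float, num_frames: int) -> tuple[float, float]:
    """Return (seconds per frame, frames per second) for one clip.

    Assumes total_seconds is the wall-clock time for all num_frames frames.
    """
    seconds_per_frame = total_seconds / num_frames
    fps = num_frames / total_seconds
    return seconds_per_frame, fps


spf, fps = per_frame_stats(4.97, 33)
print(f"{spf:.3f} s/frame, {fps:.2f} frames/s")  # prints "0.151 s/frame, 6.64 frames/s"
```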
