
Video Frame Interpolation Transformer

About

Existing methods for video interpolation heavily rely on deep convolutional neural networks, and thus suffer from their intrinsic limitations, such as content-agnostic kernel weights and a restricted receptive field. To address these issues, we propose a Transformer-based video interpolation framework that allows content-aware aggregation weights and considers long-range dependencies with self-attention operations. To avoid the high computational cost of global self-attention, we introduce the concept of local attention into video interpolation and extend it to the spatial-temporal domain. Furthermore, we propose a space-time separation strategy to save memory usage, which also improves performance. In addition, we develop a multi-scale frame synthesis scheme to fully realize the potential of Transformers. Extensive experiments demonstrate that the proposed model performs favorably against state-of-the-art methods both quantitatively and qualitatively on a variety of benchmark datasets.
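The following is a minimal sketch of the space-time-separated local attention idea described above, assuming PyTorch. The class name, window size, head count, and the use of nn.MultiheadAttention are illustrative assumptions, not the authors' released implementation: attention is first computed within local spatial windows of each frame, then along the temporal axis at each spatial location, rather than over one joint (and far more memory-hungry) 3D window.

import torch
import torch.nn as nn

class SepSTWindowAttention(nn.Module):
    """Illustrative sketch of space-time-separated local window attention.

    Spatial step: self-attention inside non-overlapping w x w windows
    of each frame. Temporal step: self-attention across the T frames
    at every spatial position. Separating the two axes avoids
    materializing attention over T*w*w tokens at once.
    """
    def __init__(self, dim, num_heads=4, window=8):
        super().__init__()
        self.window = window
        self.spatial_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.temporal_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):
        # x: (B, T, H, W, C) feature maps from T input frames;
        # H and W are assumed divisible by the window size.
        B, T, H, W, C = x.shape
        w = self.window
        # --- spatial local attention within each frame ---
        xs = x.reshape(B * T, H // w, w, W // w, w, C)
        xs = xs.permute(0, 1, 3, 2, 4, 5).reshape(-1, w * w, C)  # each window is a token sequence
        xs, _ = self.spatial_attn(xs, xs, xs)
        xs = xs.reshape(B * T, H // w, W // w, w, w, C)
        xs = xs.permute(0, 1, 3, 2, 4, 5).reshape(B, T, H, W, C)
        # --- temporal attention across frames at each position ---
        xt = xs.permute(0, 2, 3, 1, 4).reshape(B * H * W, T, C)
        xt, _ = self.temporal_attn(xt, xt, xt)
        return xt.reshape(B, H, W, T, C).permute(0, 3, 1, 2, 4)

# Usage: 4 input frames of 32x32 features with 64 channels.
x = torch.randn(2, 4, 32, 32, 64)
y = SepSTWindowAttention(dim=64)(x)  # same shape as x

In this separated form the spatial step attends over w*w tokens and the temporal step over T tokens, instead of T*w*w tokens jointly, which is the memory saving the space-time separation strategy targets.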

Zhihao Shi, Xiangyu Xu, Xiaohong Liu, Jun Chen, Ming-Hsuan Yang • 2021

Related benchmarks

Task                         Dataset                       Metric      Value    Rank
Video Frame Interpolation    UCF101                        PSNR (dB)   33.44    117
Video Frame Interpolation    DAVIS                         PSNR (dB)   28.09    33
Video Frame Interpolation    Vimeo-90K septuplet           PSNR (dB)   36.96    20
Video Frame Interpolation    Vimeo-90k                     PSNR (dB)   36.963   18
Video Frame Interpolation    UCF101                        PSNR (dB)   33.837   12
Video Frame Interpolation    GDM                           PSNR (dB)   30.217   12
Video Interpolation          Vimeo-90K septuplet (test)    Run-time    0.08     5

Other info

Code
