
Video Frame Interpolation with Transformer

About

Video frame interpolation (VFI), which aims to synthesize intermediate frames of a video, has made remarkable progress with the development of deep convolutional networks over the past years. Existing methods built upon convolutional networks generally struggle to handle large motion because convolution operations are local. To overcome this limitation, we introduce a novel framework that takes advantage of the Transformer to model long-range pixel correlation among video frames. Further, our network is equipped with a novel cross-scale window-based attention mechanism, in which windows at different scales interact with each other. This design effectively enlarges the receptive field and aggregates multi-scale information. Extensive quantitative and qualitative experiments demonstrate that our method achieves new state-of-the-art results on various benchmarks.
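To make the idea of windows at different scales interacting more concrete, here is a minimal 1D toy sketch. It is not the paper's architecture: the function names, the average-pooling downsampler, the window size, and the use of raw tokens as queries, keys, and values (no learned projections, a single head) are all assumptions made for illustration. Each fine-scale window attends to its own tokens plus a same-sized window of coarse tokens, which covers a wider span of the original sequence.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def avg_pool(tokens, factor):
    """Downsample a token sequence by averaging groups of `factor` tokens."""
    pooled = []
    for start in range(0, len(tokens), factor):
        group = tokens[start:start + factor]
        pooled.append([sum(col) / len(group) for col in zip(*group)])
    return pooled


def cross_scale_window_attention(tokens, window=4, factor=2):
    """Toy cross-scale window attention over a 1D token sequence.

    Hypothetical simplification: queries come from a fine-scale window;
    keys and values are that window plus a coarse-scale window of the same
    token count, which spans roughly `window * factor` fine positions.
    """
    coarse = avg_pool(tokens, factor)
    output = []
    for w_start in range(0, len(tokens), window):
        fine_win = tokens[w_start:w_start + window]
        # Centre the coarse window on the fine window, clamped to the sequence.
        center = (w_start + len(fine_win) / 2) / factor
        c_start = int(center - window / 2)
        c_start = max(0, min(c_start, max(0, len(coarse) - window)))
        coarse_win = coarse[c_start:c_start + window]
        context = fine_win + coarse_win
        for query in fine_win:
            output.append(attend(query, context, context))
    return output


if __name__ == "__main__":
    sequence = [[float(i), float(-i)] for i in range(16)]
    mixed = cross_scale_window_attention(sequence, window=4, factor=2)
    print(len(mixed), mixed[0])
```

With `window=4` and `factor=2`, each query sees 4 fine tokens and 4 coarse tokens summarising up to 8 fine positions, so the effective receptive field is wider than the fine window alone without attending over the whole sequence.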

Liying Lu, Ruizheng Wu, Huaijia Lin, Jiangbo Lu, Jiaya Jia • 2022

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Video Frame Interpolation | Vimeo90K (test) | PSNR 36.5 | 131 |
| Video Frame Interpolation | UCF101 | PSNR 35.43 | 117 |
| Video Frame Interpolation | Vimeo90K | PSNR 36.5 | 62 |
| Video Frame Interpolation | SNU-FILM Easy | PSNR 40.13 | 59 |
| Video Frame Interpolation | SNU-FILM Medium | PSNR 36.09 | 59 |
| Video Frame Interpolation | SNU-FILM Extreme | PSNR 25.43 | 59 |
| Video Frame Interpolation | SNU-FILM Hard | PSNR 30.67 | 59 |
| Video Frame Interpolation | Middlebury | Average IE Error 1.82 | 42 |
| Video Frame Interpolation | UCF101 (test) | PSNR 35.43 | 41 |
| Video Frame Interpolation | Xiph 4K (test) | PSNR 33.69 | 25 |
Showing 10 of 20 rows
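Most results above are reported as PSNR (peak signal-to-noise ratio, in dB; higher is better). As a reminder of what that number measures, here is a minimal sketch that computes PSNR between a ground-truth frame and an interpolated frame, both given as flat lists of pixel intensities. The function name `psnr` and the default 8-bit peak value of 255 are assumptions for illustration; the evaluation protocols behind the listed numbers (colour space, cropping, value range) are not stated here and may differ.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists.

    PSNR = 10 * log10(max_value^2 / MSE), where MSE is the mean squared
    difference between corresponding pixels.
    """
    if not reference or len(reference) != len(estimate):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    ground_truth = [52, 55, 61, 66, 70, 61, 64, 73]
    interpolated = [54, 55, 60, 67, 69, 62, 64, 71]
    print(f"PSNR: {psnr(ground_truth, interpolated):.2f} dB")
```

Because PSNR is logarithmic in the mean squared error, a gain of about 3 dB corresponds to roughly halving the MSE.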
