
Generative Inbetweening: Adapting Image-to-Video Models for Keyframe Interpolation

About

We present a method for generating video sequences with coherent motion between a pair of input keyframes. We adapt a pretrained large-scale image-to-video diffusion model (originally trained to generate videos moving forward in time from a single input image) for keyframe interpolation, i.e., to produce a video in between two input frames. We accomplish this adaptation through a lightweight fine-tuning technique that produces a version of the model that instead predicts videos moving backward in time from a single input image. This model (along with the original forward-moving model) is then used in a dual-directional diffusion sampling process that combines the overlapping model estimates starting from each of the two keyframes. Our experiments show that our method outperforms both existing diffusion-based methods and traditional frame interpolation techniques.
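The dual-directional sampling idea can be sketched in a few lines: at each denoising step, a forward model conditioned on the first keyframe and a time-reversed backward model conditioned on the second keyframe each produce an estimate of the video, and the two overlapping estimates are fused. The sketch below is a minimal illustration with hypothetical placeholder denoisers standing in for the actual diffusion models; it is not the authors' implementation.

```python
import numpy as np

# Hypothetical stand-ins for the pretrained forward model and the
# fine-tuned backward model described in the abstract (placeholders,
# not the paper's actual networks).
def forward_denoise(x, keyframe_a, t):
    # Placeholder dynamics: nudge the sample toward the start keyframe.
    return x + 0.1 * (keyframe_a - x)

def backward_denoise(x, keyframe_b, t):
    # Placeholder dynamics: nudge the time-reversed sample toward the
    # end keyframe.
    return x + 0.1 * (keyframe_b - x)

def dual_directional_sample(keyframe_a, keyframe_b, num_frames=8, steps=20):
    """Sketch of dual-directional diffusion sampling: at each step, fuse
    the forward model's estimate (conditioned on keyframe A) with the
    time-reversed backward model's estimate (conditioned on keyframe B)."""
    rng = np.random.default_rng(0)
    x = rng.standard_normal((num_frames,) + keyframe_a.shape)
    for t in range(steps):
        fwd = forward_denoise(x, keyframe_a, t)               # forward in time from A
        bwd = backward_denoise(x[::-1], keyframe_b, t)[::-1]  # backward in time from B
        x = 0.5 * (fwd + bwd)                                 # combine overlapping estimates
    return x
```

With real diffusion models the fusion would happen on the models' noise (or denoised-sample) predictions at every sampler step, but the control flow is the same: one trajectory anchored at each keyframe, merged per step.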

Xiaojuan Wang, Boyang Zhou, Brian Curless, Ira Kemelmacher-Shlizerman, Aleksander Holynski, Steven M. Seitz • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Video Frame Interpolation | MultiInterpBench | FID | 53.4 | 24 |
| Video Frame Interpolation | VidGen 1M (test) | FVD | 698 | 11 |
| Video Frame Interpolation | Pexels 45 video-keyframe pairs | LPIPS | 0.1114 | 8 |
| Video Inbetweening | DAVIS | Alignment | 0.1179 | 8 |
| Video Frame Interpolation | DAVIS 100 video-keyframe pairs 2017 | LPIPS | 0.2432 | 8 |
| Video Frame Interpolation | VFI 1024 x 576 (test) | PSNR | 21.05 | 8 |
| Video Generation | Video Generation 25-frame | PSNR | 17.418 | 6 |
| Video Generation | TGI-Bench 81-frame | PSNR | 15.59 | 6 |
| Generative Inbetweening | TGI-Bench 65-frame | X-CLIP Score | 0.2169 | 6 |
| Generative Inbetweening | TGI-Bench 81-frame | X-CLIP Score | 0.2082 | 6 |

Showing 10 of 17 rows.
