Generative Inbetweening: Adapting Image-to-Video Models for Keyframe Interpolation

About

We present a method for generating video sequences with coherent motion between a pair of input keyframes. We adapt a pretrained large-scale image-to-video diffusion model (originally trained to generate videos moving forward in time from a single input image) for keyframe interpolation, i.e., to produce a video in between two input frames. We accomplish this adaptation through a lightweight fine-tuning technique that produces a version of the model that instead predicts videos moving backward in time from a single input image. This backward-moving model, together with the original forward-moving model, is then used in a dual-directional diffusion sampling process that combines the overlapping model estimates starting from each of the two keyframes. Our experiments show that our method outperforms both existing diffusion-based methods and traditional frame interpolation techniques.
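The dual-directional sampling idea can be illustrated with a toy sketch. This is not the authors' implementation: `toy_denoiser` is a hypothetical stand-in for a video diffusion model, the update rule is a simplified placeholder for a real sampler, and fusing the two estimates by a plain average is an assumption made here for illustration. It only shows the data flow: one estimate anchored at the first keyframe, one estimate computed on the time-reversed video anchored at the last keyframe, and a fusion step after flipping the backward estimate into forward time.

```python
import numpy as np

def toy_denoiser(noisy_video, keyframe):
    """Hypothetical stand-in for a video diffusion model.

    Returns a pretend 'noise' estimate that points away from the
    conditioning keyframe. Shapes: noisy_video (T, H, W), keyframe (H, W).
    """
    return noisy_video - keyframe[None]

def dual_directional_sample(first, last, num_frames=5, num_steps=10, seed=0):
    """Toy dual-directional sampling loop between two keyframes."""
    rng = np.random.default_rng(seed)
    video = rng.standard_normal((num_frames, *first.shape))
    for step in range(num_steps):
        # Forward estimate: video in natural order, anchored at the first keyframe.
        eps_fwd = toy_denoiser(video, first)
        # Backward estimate: time-reversed video, anchored at the last keyframe.
        eps_bwd = toy_denoiser(video[::-1], last)
        # Flip the backward estimate into forward time and fuse.
        # Assumption: a plain average of the two estimates.
        eps = 0.5 * (eps_fwd + eps_bwd[::-1])
        # Simplified update (placeholder for a real diffusion sampler step).
        video = video - eps / (num_steps - step)
    return video

first = np.zeros((2, 2))
last = np.ones((2, 2))
frames = dual_directional_sample(first, last, num_frames=4, num_steps=8)
print(frames.shape)
```

With this stand-in denoiser every frame simply settles at the mean of the two keyframes, which confirms that both estimates contribute equally. A real pair of forward and backward models would instead produce motion between the keyframes.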

Xiaojuan Wang, Boyang Zhou, Brian Curless, Ira Kemelmacher-Shlizerman, Aleksander Holynski, Steven M. Seitz • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Video Frame Interpolation | MultiInterpBench | FID | 53.4 | 24 |
| Video Frame Interpolation | VidGen 1M (test) | FVD | 698 | 11 |
| Temporal Super-Resolution | SloMo-44K (test) | FloLPIPS | 0.123 | 10 |
| Video Frame Interpolation | Pexels 45 video-keyframe pairs | LPIPS | 0.1114 | 8 |
| Video Inbetweening | DAVIS | Alignment | 0.1179 | 8 |
| Video Frame Interpolation | DAVIS 100 video-keyframe pairs 2017 | LPIPS | 0.2432 | 8 |
| Video Frame Interpolation | VFI 1024 x 576 (test) | PSNR | 21.05 | 8 |
| Video Generation | Epic100 | FVD | 325.5 | 6 |
| First-Person Video Generation | Epic100 | Reasonability | 4.13 | 6 |
| Video Generation | Video Generation 25-frame | PSNR | 17.418 | 6 |
Showing 10 of 24 rows
