
Repurposing Pre-trained Video Diffusion Models for Event-based Video Interpolation

About

Video Frame Interpolation aims to recover realistic missing frames between observed frames, generating a high-frame-rate video from a low-frame-rate video. However, without additional guidance, the large motion between frames makes this problem ill-posed. Event-based Video Frame Interpolation (EVFI) addresses this challenge by using sparse, high-temporal-resolution event measurements as motion guidance. This guidance allows EVFI methods to significantly outperform frame-only methods. However, to date, EVFI methods have relied on a limited set of paired event-frame training data, severely limiting their performance and generalization capabilities. In this work, we overcome the limited-data challenge by adapting pre-trained video diffusion models, trained on internet-scale datasets, to EVFI. We experimentally validate our approach on real-world EVFI datasets, including a new one that we introduce. Our method outperforms existing approaches and generalizes across cameras far better than they do.
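A common way to feed sparse event measurements into a frame-based network is to accumulate them into a temporal voxel grid. The sketch below shows this standard representation; it is illustrative only, and the paper's actual conditioning scheme for the diffusion model is not specified here (the function name and layout are assumptions).

```python
import numpy as np

def events_to_voxel_grid(events, num_bins, height, width):
    """Accumulate events (t, x, y, polarity) into a (num_bins, H, W) grid.

    A common event representation for learning-based methods; the paper's
    exact conditioning input may differ (assumption, for illustration).
    """
    grid = np.zeros((num_bins, height, width), dtype=np.float32)
    t = events[:, 0]
    x = events[:, 1].astype(int)
    y = events[:, 2].astype(int)
    p = events[:, 3]
    t0, t1 = t.min(), t.max()
    # Normalize timestamps to the continuous bin axis [0, num_bins - 1].
    tn = (num_bins - 1) * (t - t0) / max(t1 - t0, 1e-9)
    lo = np.floor(tn).astype(int)
    hi = np.clip(lo + 1, 0, num_bins - 1)
    w_hi = tn - lo
    # Split each event's polarity bilinearly between adjacent time bins,
    # so total mass per event is preserved.
    np.add.at(grid, (lo, y, x), p * (1.0 - w_hi))
    np.add.at(grid, (hi, y, x), p * w_hi)
    return grid
```

Each event contributes its full polarity, split between the two nearest time bins, so summing the grid recovers the net polarity of the event stream.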

Jingxi Chen, Brandon Y. Feng, Haoming Cai, Tianfu Wang, Levi Burner, Dehao Yuan, Cornelia Fermuller, Christopher A. Metzler, Yiannis Aloimonos • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Video Frame Interpolation | BS-ERGB 3 skips | PSNR 27.74 | 15 |
| Video Frame Prediction | GoPro 7 frames | PSNR 19.02 | 10 |
| Video Frame Prediction | GoPro 15 frames | PSNR 18.56 | 10 |
| Video Frame Prediction | BS-ERGB 1 frame (test) | PSNR 21.22 | 10 |
| Video Frame Prediction | BS-ERGB 3 frames (test) | PSNR 18.81 | 10 |
| Video Frame Prediction | HS-ERGB 7 frames (test) | PSNR 20.12 | 10 |
| Video Frame Interpolation | HQF 3 skips | PSNR 29.04 | 9 |
| Video Frame Interpolation | Clear-Motion 15 skips | PSNR 22.94 | 9 |
| Video Frame Interpolation (11x) | Real-world | MSE 0.0057 | 4 |
| Video Frame Interpolation (11x) | Synthetic | MSE 0.0503 | 4 |

(10 of 11 rows shown)
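The table mixes two reconstruction metrics: PSNR (in dB, higher is better) and MSE (lower is better). They are related by PSNR = 10 log10(MAX² / MSE), where MAX is the peak pixel value. The sketch below assumes frames normalized to [0, 1]; whether the MSE rows above use that normalization is not stated, so the conversion is illustrative only.

```python
import math

def psnr_from_mse(mse, max_val=1.0):
    """Peak signal-to-noise ratio (dB) from mean squared error.

    Assumes pixel values span [0, max_val]; max_val=1.0 corresponds
    to frames normalized to [0, 1] (assumption for this example).
    """
    return 10.0 * math.log10(max_val ** 2 / mse)

# Under that normalization, MSE 0.0057 corresponds to about 22.4 dB.
print(round(psnr_from_mse(0.0057), 2))
```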

Other info

Code
