UniE2F: A Unified Diffusion Framework for Event-to-Frame Reconstruction with Video Foundation Models

About

Event cameras excel at high-speed, low-power, and high-dynamic-range scene perception. However, because they fundamentally record only relative intensity changes rather than absolute intensity, the resulting data streams suffer from a significant loss of spatial information and static texture details. In this paper, we address this limitation by leveraging the generative prior of a pre-trained video diffusion model to reconstruct high-fidelity video frames from sparse event data. Specifically, we first establish a baseline model by directly applying event data as a condition to synthesize videos. Then, based on the physical correlation between the event stream and video frames, we further introduce event-based inter-frame residual guidance to enhance the accuracy of video frame reconstruction. Furthermore, we extend our method to video frame interpolation and prediction in a zero-shot manner by modulating the reverse diffusion sampling process, thereby creating a unified event-to-frame reconstruction framework. Experimental results on real-world and synthetic datasets demonstrate that our method significantly outperforms previous approaches both quantitatively and qualitatively. We also refer reviewers to the video demo in the supplementary material for video results. The code will be publicly available at https://github.com/CS-GangXu/UniE2F.
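The physical correlation between events and frames can be illustrated with the standard idealized event generation model: a pixel emits an event of polarity +1 or -1 whenever its log intensity changes by a contrast threshold C, so summing polarities between two frame timestamps approximates the log-intensity residual between those frames. The sketch below is a minimal illustration under that model, not the authors' residual-guidance module. The function names, the threshold C = 0.2, the `eps` offset, and the assumption that intensities lie in [0, 1] are all hypothetical choices.

```python
import numpy as np


def accumulate_events(xs, ys, ps, height, width):
    """Sum event polarities (+1/-1) per pixel over one inter-frame window.

    xs are column indices, ys are row indices, ps are polarities.
    """
    counts = np.zeros((height, width), dtype=np.float64)
    # np.add.at accumulates correctly when the same (y, x) pixel fires
    # several times; counts[ys, xs] += ps would count it only once.
    np.add.at(counts, (ys, xs), ps)
    return counts


def propagate_frame(prev_frame, xs, ys, ps, contrast_threshold=0.2, eps=1e-3):
    """Predict the next frame from the previous one plus the event residual.

    Hypothetical sketch of the ideal event model:
        log I_next - log I_prev ~= C * sum(polarities at that pixel)
    contrast_threshold (C) and eps are assumed values, and intensities are
    assumed to be normalized to [0, 1].
    """
    h, w = prev_frame.shape
    residual = contrast_threshold * accumulate_events(xs, ys, ps, h, w)
    # eps keeps log() finite for black pixels; it is removed again after exp().
    log_next = np.log(prev_frame + eps) + residual
    return np.clip(np.exp(log_next) - eps, 0.0, 1.0)


if __name__ == "__main__":
    prev = np.full((2, 2), 0.5)
    # Two positive events at pixel (0, 0), one negative event at pixel (1, 1).
    xs = np.array([0, 0, 1])
    ys = np.array([0, 0, 1])
    ps = np.array([1, 1, -1])
    print(propagate_frame(prev, xs, ys, ps))
```

In this toy run, pixel (0, 0) brightens by roughly a factor of exp(0.4), pixel (1, 1) darkens, and pixels without events keep their previous value. Real sensors have noisy and pixel-dependent thresholds, so this residual is only an approximation of the true inter-frame change.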

Gang Xu, Zhiyu Zhu, Junhui Hou · 2026

Related benchmarks

Task                                    Dataset            Metric  Result   Rank
Event-based video frame reconstruction  Real-world         MSE     0.0612   10
Event-based video frame reconstruction  Synthetic          MSE     0.0167   10
Video Frame Reconstruction              Real-world         FID     184.3    10
Video Frame Reconstruction              Synthetic          FID     57.1092  10
Video Frame Interpolation (4x)          Synthetic          MSE     0.0063   5
Video Frame Interpolation (4x)          Real-world         MSE     0.0041   5
Long-sequence generation                Synthetic dataset  MSE     0.0271   4
Video Frame Interpolation (11x)         Synthetic          MSE     0.0072   4
Video Frame Interpolation (11x)         Real-world         MSE     0.0058   4
Video Frame Prediction                  Synthetic          MSE     0.0093   2
(10 of 11 rows shown)
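Most rows above report MSE, where lower is better. The sketch below shows the usual per-pixel definition, averaged over every pixel and frame. It assumes predicted and ground-truth frames are arrays of the same shape with intensities normalized to [0, 1]; the benchmark's exact normalization and averaging protocol are not stated on this page, and the function name `frame_mse` is hypothetical.

```python
import numpy as np


def frame_mse(pred, target):
    """Mean squared error between predicted and ground-truth frames.

    Assumes both inputs share a shape such as (frames, height, width) and
    hold intensities normalized to [0, 1]; the average runs over every
    element. Lower is better.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    if pred.shape != target.shape:
        raise ValueError(f"shape mismatch: {pred.shape} vs {target.shape}")
    return float(np.mean((pred - target) ** 2))


if __name__ == "__main__":
    # One 1x2 "frame": a single pixel differs by 0.1.
    print(frame_mse([[0.0, 0.5]], [[0.1, 0.5]]))
```

Because the error is squared before averaging, a few badly reconstructed pixels can raise the score more than many slightly off ones.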
