UniE2F: A Unified Diffusion Framework for Event-to-Frame Reconstruction with Video Foundation Models

About

Event cameras excel at high-speed, low-power, and high-dynamic-range scene perception. However, because they record only relative intensity changes rather than absolute intensity, the resulting data streams lose much of the scene's spatial information and static texture detail. In this paper, we address this limitation by leveraging the generative prior of a pre-trained video diffusion model to reconstruct high-fidelity video frames from sparse event data. Specifically, we first establish a baseline model that uses the event data directly as a condition for video synthesis. Then, based on the physical correlation between the event stream and video frames, we introduce event-based inter-frame residual guidance to improve the accuracy of video frame reconstruction. Furthermore, we extend our method to video frame interpolation and prediction in a zero-shot manner by modulating the reverse diffusion sampling process, thereby creating a unified event-to-frame reconstruction framework. Experimental results on real-world and synthetic datasets demonstrate that our method significantly outperforms previous approaches both quantitatively and qualitatively. Video results are provided in the demo included in the supplementary material. The code will be publicly available at https://github.com/CS-GangXu/UniE2F.
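The physical correlation between events and frames mentioned above can be illustrated with the standard idealized event-generation model, in which a pixel fires an event each time its log intensity changes by a contrast threshold C. The following is a minimal sketch under that assumption and is not the paper's guidance term: the function names, the (x, y, polarity) event layout, the threshold value of 0.2, and the small eps offset are all hypothetical choices made for illustration.

```python
import math


def accumulate_events(events, height, width):
    """Sum event polarities per pixel over one inter-frame interval.

    events: iterable of (x, y, polarity) tuples with polarity in {+1, -1}.
    This tuple layout is an assumption made for this sketch.
    """
    acc = [[0 for _ in range(width)] for _ in range(height)]
    for x, y, polarity in events:
        acc[y][x] += polarity
    return acc


def predict_next_frame(prev_frame, events, contrast_threshold=0.2, eps=1e-3):
    """Estimate the next frame from the previous frame and the events between them.

    Idealized model (assumed here, not taken from the paper):
        log I_next - log I_prev ~= C * (sum of polarities at that pixel)
    """
    height, width = len(prev_frame), len(prev_frame[0])
    acc = accumulate_events(events, height, width)
    next_frame = []
    for y in range(height):
        row = []
        for x in range(width):
            # Work in log intensity; eps avoids log(0) for black pixels.
            log_prev = math.log(prev_frame[y][x] + eps)
            log_next = log_prev + contrast_threshold * acc[y][x]
            row.append(math.exp(log_next) - eps)
        next_frame.append(row)
    return next_frame


# Toy 2x2 frame: two positive events at pixel (0, 0), one negative at (1, 1).
prev = [[0.5, 0.5], [0.5, 0.5]]
events = [(0, 0, +1), (0, 0, +1), (1, 1, -1)]
nxt = predict_next_frame(prev, events)
```

In this toy example the pixel with positive events brightens, the pixel with a negative event darkens, and pixels without events keep their previous value. The last case shows why events alone carry no information about static regions, which is the gap the generative prior is meant to fill.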

Gang Xu, Zhiyu Zhu, Junhui Hou • 2026

Related benchmarks

Task	Dataset	Result
Event-based video frame reconstruction	Real-world	MSE 0.0612	10
Event-based video frame reconstruction	Synthetic	MSE 0.0167	10
Video Frame Reconstruction	Real-world	FID 184.3	10
Video Frame Reconstruction	Synthetic	FID 57.1092	10
Video Frame Interpolation (4x)	Synthetic	MSE 0.0063	5
Video Frame Interpolation (4x)	Real-world	MSE 0.0041	5
Long-sequence generation	Synthetic dataset	MSE 0.0271	4
Video Frame Interpolation (11x)	Synthetic	MSE 0.0072	4
Video Frame Interpolation (11x)	Real-world	MSE 0.0058	4
Video Frame Prediction	Synthetic	MSE 0.0093	2

Showing 10 of 11 rows
