TETO: Tracking Events with Teacher Observation for Motion Estimation and Frame Interpolation

About

Event cameras capture per-pixel brightness changes with microsecond resolution, offering continuous motion information lost between RGB frames. However, existing event-based motion estimators depend on large-scale synthetic data that often suffers from a significant sim-to-real gap. We propose TETO (Tracking Events with Teacher Observation), a teacher-student framework that learns event motion estimation from only about 25 minutes of unannotated real-world recordings through knowledge distillation from a pretrained RGB tracker. Our motion-aware data curation and query sampling strategy maximizes learning from limited data by disentangling object motion from dominant ego-motion. The resulting estimator jointly predicts point trajectories and dense optical flow, which we leverage as explicit motion priors to condition a pretrained video diffusion transformer for frame interpolation. We achieve state-of-the-art point tracking on EVIMO2 and optical flow on DSEC using orders of magnitude less training data, and demonstrate that accurate motion estimation translates directly to superior frame interpolation quality on BS-ERGB and HQ-EVFI.

Jini Yang, Eunbeen Hong, Soowon Son, Hyunkoo Lee, Sunghwan Hong, Sunok Kim, Seungryong Kim • 2026
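
The teacher-student idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the loss, so everything here is an assumption made for illustration: the hypothetical function `distillation_loss`, the L1 distance, and the choice to supervise only points the teacher marks as visible. It treats tracks from a pretrained RGB tracker as pseudo-labels for an event-based student.

```python
def distillation_loss(student_tracks, teacher_tracks, teacher_visible):
    """Mean L1 distance between student and teacher point tracks.

    Hypothetical sketch, not the paper's loss. Each *_tracks argument is a
    list over time steps, each holding a list of (x, y) positions, one per
    query point. teacher_visible has the same shape with booleans; points
    the teacher marks as not visible are skipped (an assumed design choice).
    """
    total, count = 0.0, 0
    for s_frame, t_frame, v_frame in zip(student_tracks, teacher_tracks, teacher_visible):
        for (sx, sy), (tx, ty), visible in zip(s_frame, t_frame, v_frame):
            if visible:
                total += abs(sx - tx) + abs(sy - ty)
                count += 1
    # With no supervised points there is nothing to penalize.
    return total / count if count else 0.0


# Two time steps, two query points. The last teacher point is occluded,
# so the student's wild guess there does not contribute.
teacher = [[(10.0, 10.0), (20.0, 20.0)], [(11.0, 10.0), (21.0, 20.0)]]
student = [[(10.0, 10.0), (20.0, 22.0)], [(12.0, 10.0), (0.0, 0.0)]]
visible = [[True, True], [True, False]]
print(distillation_loss(student, teacher, visible))  # 1.0
```

Masking by teacher visibility is one common way to avoid training on unreliable pseudo-labels; the actual framework may weight or filter them differently.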

Related benchmarks

Task	Dataset	Metric	Value
Video Frame Interpolation	BS-ERGB	LPIPS	0.0684	17
Optical Flow	DSEC All	EPE	1.39	12
Optical Flow	DSEC interlaken_00_b	EPE	2.13	12
Optical Flow	DSEC interlaken_01_a	EPE	1.51	12
Optical Flow	DSEC thun_01_a	EPE	1.04	12
Optical Flow	DSEC thun_01_b	EPE	1.12	12
Optical Flow	DSEC zurich_city_12_a	EPE	1.06	12
Optical Flow	DSEC zurich_city_14_c	EPE	1.24	12
Optical Flow	DSEC zurich_city_15_a	EPE	1.37	12
Video Frame Interpolation	HQ-EVFI dynamic motion subset	FID	17.39	8
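
For readers unfamiliar with the EPE column, endpoint error is the Euclidean distance between a predicted flow vector and its ground-truth vector, averaged over the evaluated pixels. The sketch below is a minimal illustration on plain Python lists; the function name `endpoint_error` and the optional `valid` mask are illustrative choices, not code from the paper or from the DSEC toolkit.

```python
import math


def endpoint_error(pred, gt, valid=None):
    """Average endpoint error (EPE) over flow vectors.

    pred, gt: equal-length sequences of (u, v) flow vectors in pixels.
    valid: optional sequence of booleans selecting which vectors to score
    (benchmarks typically score only pixels with ground truth).
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same length")
    if valid is None:
        valid = [True] * len(pred)
    errors = [
        math.hypot(pu - gu, pv - gv)
        for (pu, pv), (gu, gv), keep in zip(pred, gt, valid)
        if keep
    ]
    if not errors:
        raise ValueError("no valid flow vectors to evaluate")
    return sum(errors) / len(errors)


# One vector is off by (3, 4), i.e. 5 pixels; the other three are exact.
gt_flow = [(0.0, 0.0)] * 4
pred_flow = [(3.0, 4.0), (0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]
print(endpoint_error(pred_flow, gt_flow))                               # 1.25
print(endpoint_error(pred_flow, gt_flow, [True, False, False, False]))  # 5.0
```

Lower is better: an EPE of 1.39 means predicted flow vectors are off by 1.39 pixels on average.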

