Event-Based Video Frame Interpolation With Cross-Modal Asymmetric Bidirectional Motion Fields
About
Video Frame Interpolation (VFI) aims to generate intermediate video frames between consecutive input frames. Because event cameras are bio-inspired sensors that encode only brightness changes, with microsecond temporal resolution, several works have used them to improve VFI performance. However, existing methods estimate bidirectional inter-frame motion fields from events alone or through approximations, and therefore cannot account for the complex motion found in real-world scenarios. In this paper, we propose a novel event-based VFI framework with cross-modal asymmetric bidirectional motion field estimation. Specifically, our EIF-BiOFNet exploits the valuable characteristics of both events and images to estimate inter-frame motion fields directly, without any approximation. Moreover, we develop an interactive attention-based frame synthesis network to efficiently leverage the complementary warping-based and synthesis-based features. Finally, we build a large-scale event-based VFI dataset, ERF-X170FPS, with a high frame rate, extreme motion, and dynamic textures to overcome the limitations of previous event-based VFI datasets. Extensive experiments show that our method significantly outperforms state-of-the-art VFI methods on various datasets. Our project page is available at: https://github.com/intelpro/CBMNet
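To make the notion of a "warping-based" feature concrete, the following is a minimal sketch of backward warping a frame with a per-pixel motion field. It is not the authors' EIF-BiOFNet or their synthesis network; the function name `backward_warp`, the grayscale list-of-lists layout, and the bilinear sampling with border clamping are all assumptions chosen for illustration. In a bidirectional setting, one such field would be applied per input frame (target to frame 0, and target to frame 1), and "asymmetric" means the two fields are not assumed to mirror each other.

```python
import math

def backward_warp(frame, flow):
    """Warp `frame` to the target time using a per-pixel motion field.

    frame: H x W list of lists of floats (grayscale intensities).
    flow:  H x W list of lists of (dx, dy) tuples; flow[y][x] points from
           target pixel (x, y) to its source location in `frame`.
    Hypothetical helper for illustration only.
    """
    h, w = len(frame), len(frame[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            # Source coordinates, clamped to the image border.
            sx = min(max(x + dx, 0.0), w - 1.0)
            sy = min(max(y + dy, 0.0), h - 1.0)
            x0, y0 = int(math.floor(sx)), int(math.floor(sy))
            x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
            ax, ay = sx - x0, sy - y0
            # Bilinear interpolation of the four neighbouring pixels.
            top = (1 - ax) * frame[y0][x0] + ax * frame[y0][x1]
            bottom = (1 - ax) * frame[y1][x0] + ax * frame[y1][x1]
            out[y][x] = (1 - ay) * top + ay * bottom
    return out

# Toy example: a bright pixel sits at column 2 in frame 0 and should
# appear at column 1 in the target frame, so every target pixel looks
# one column to the right to find its source.
frame0 = [[0.0, 0.0, 1.0, 0.0]]
flow_t_to_0 = [[(1.0, 0.0)] * 4]
warped = backward_warp(frame0, flow_t_to_0)
```

In practice this sampling is done on tensors with a differentiable operator so that the motion estimator can be trained end to end; the loop form above only shows the per-pixel arithmetic.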
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Video Frame Interpolation | BS-ERGB 3 skips | PSNR | 26.24 | 15 |
| Video Frame Prediction | BS-ERGB 1 frame (test) | PSNR | 25.12 | 10 |
| Video Frame Prediction | GoPro 15 frames | PSNR | 13.38 | 10 |
| Video Frame Prediction | BS-ERGB 3 frames (test) | PSNR | 22.08 | 10 |
| Video Frame Prediction | HS-ERGB 7 frames (test) | PSNR | 27.76 | 10 |
| Video Frame Prediction | GoPro 7 frames | PSNR | 14.38 | 10 |
| Video Frame Interpolation | HQF 3 skips | PSNR | 28.73 | 9 |
| Video Frame Interpolation | Clear-Motion 15 skips | PSNR | 22.26 | 9 |
| Video Frame Interpolation (4x) | Real-world | MSE | 0.0023 | 5 |
| Video Frame Interpolation (4x) | Synthetic | MSE | 0.0174 | 5 |
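
The benchmarks above report PSNR and MSE. As a reminder of how the two metrics relate, here is a minimal sketch using their standard definitions. It assumes grayscale images stored as lists of lists with intensities in [0, 1] (so `max_value=1.0`); the function names are illustrative, and the exact evaluation protocol behind each benchmark row (color handling, value range, per-frame averaging) is not stated here and may differ.

```python
import math

def mse(pred, target):
    """Mean squared error between two equally sized images."""
    flat_p = [v for row in pred for v in row]
    flat_t = [v for row in target for v in row]
    if len(flat_p) != len(flat_t):
        raise ValueError("images must have the same number of pixels")
    return sum((p - t) ** 2 for p, t in zip(flat_p, flat_t)) / len(flat_p)

def psnr(pred, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_value^2 / MSE)."""
    err = mse(pred, target)
    if err == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / err)

# Toy example: every pixel is off by 0.1, so MSE is about 0.01.
pred = [[0.1, 0.1], [0.1, 0.1]]
target = [[0.0, 0.0], [0.0, 0.0]]
score = psnr(pred, target)
```

Higher PSNR and lower MSE both indicate a closer match to the ground-truth frame, which is why the two kinds of rows in the table are ranked in opposite directions.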