BF-STVSR: B-Splines and Fourier-Best Friends for High Fidelity Spatial-Temporal Video Super-Resolution
About
While prior methods in Continuous Spatial-Temporal Video Super-Resolution (C-STVSR) employ Implicit Neural Representation (INR) for continuous encoding, they often struggle to capture the complexity of video data, relying on simple coordinate concatenation and pre-trained optical flow networks for motion representation. Interestingly, we find that adding position encoding, contrary to common observations, does not improve performance and can even degrade it. This issue becomes particularly pronounced when combined with pre-trained optical flow networks, which can limit the model's flexibility. To address these issues, we propose BF-STVSR, a C-STVSR framework with two key modules tailored to better represent the spatial and temporal characteristics of video: 1) a B-spline Mapper for smooth temporal interpolation, and 2) a Fourier Mapper for capturing dominant spatial frequencies. Our approach achieves state-of-the-art results on various metrics, including PSNR and SSIM, showing enhanced spatial details and natural temporal consistency. Our code is available at https://github.com/Eunjnnn/bfstvsr.
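
To give a rough sense of the two ideas named in the abstract, the sketch below shows (a) blending four values with uniform cubic B-spline basis weights to obtain a smooth in-between value, and (b) picking the strongest frequency bins of a 1-D signal with a naive DFT. This is only an illustration of the underlying math, not the B-spline Mapper or Fourier Mapper from the paper, which operate on learned features; the function names `cubic_bspline_weights`, `bspline_interpolate`, and `dominant_frequencies` are hypothetical and chosen for this example.

```python
import math


def cubic_bspline_weights(u: float) -> list[float]:
    """Uniform cubic B-spline basis weights for a local parameter u in [0, 1].

    The four weights are non-negative and always sum to 1.
    """
    return [
        (1 - u) ** 3 / 6,
        (3 * u**3 - 6 * u**2 + 4) / 6,
        (-3 * u**3 + 3 * u**2 + 3 * u + 1) / 6,
        u**3 / 6,
    ]


def bspline_interpolate(control: list[float], u: float) -> float:
    """Blend four consecutive control values into one smooth in-between value."""
    if len(control) != 4:
        raise ValueError("expected exactly four control values")
    weights = cubic_bspline_weights(u)
    return sum(w * c for w, c in zip(weights, control))


def dominant_frequencies(signal: list[float], k: int) -> list[int]:
    """Return the k DFT bin indices (DC excluded) with the largest magnitude.

    Uses a naive O(n^2) DFT to stay dependency-free; a real pipeline
    would use an FFT library instead.
    """
    n = len(signal)
    magnitudes = []
    for f in range(1, n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * f * t / n) for t, x in enumerate(signal))
        im = -sum(x * math.sin(2 * math.pi * f * t / n) for t, x in enumerate(signal))
        magnitudes.append((math.hypot(re, im), f))
    magnitudes.sort(reverse=True)
    return [f for _, f in magnitudes[:k]]
```

In this toy setting, `u` plays the role of an arbitrary intermediate time between known samples, which is the sense in which a B-spline basis supports continuous temporal queries.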
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Video Super-Resolution | REDS (val) | PSNR | 34.72 | 89 |
| Video Super-Resolution | UDM10 (test) | PSNR | 25.09 | 51 |
| Space-Time Video Super-Resolution | Vid4 | PSNR | 25.85 | 41 |
| Space-Time Video Super-Resolution | GoPro Average (test) | PSNR | 30.22 | 31 |
| Spatiotemporal Video Super-Resolution | GoPro Center | PSNR | 31.17 | 23 |
| Spatio-temporal Super-resolution | Adobe Center | PSNR | 30.83 | 8 |
| Spatio-temporal Super-resolution | Adobe Average | PSNR | 30.12 | 7 |
| Spatio-Temporal Video Super-Resolution | Vid4 | tOF | 0.323 | 6 |
| Continuous Space-Time Video Super-Resolution | C-STVSR | Inference Time (s) | 1.9 | 4 |
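
Most rows above report PSNR. As a reminder of what that number measures, here is a minimal sketch of the standard definition, PSNR = 10 log10(MAX^2 / MSE), applied to flat lists of pixel values. It assumes 8-bit intensities (`max_value=255.0`); the exact evaluation protocol behind the table (for example, which color channel or crop is used) is not stated here, so this should not be read as the benchmarks' scoring code.

```python
import math


def psnr(reference: list[float], estimate: list[float], max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    if not reference or len(reference) != len(estimate):
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs: PSNR is unbounded
    return 10 * math.log10(max_value**2 / mse)
```

Higher values indicate a reconstruction closer to the reference, and because the scale is logarithmic, small differences in dB correspond to multiplicative changes in mean squared error.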