SimVPv2: Towards Simple yet Powerful Spatiotemporal Predictive Learning

About

Recent years have witnessed remarkable advances in spatiotemporal predictive learning, with methods incorporating auxiliary inputs, complex neural architectures, and sophisticated training strategies. While SimVP has introduced a simpler, CNN-based baseline for this task, it still relies on heavy Unet-like architectures for spatial and temporal modeling, which suffer from high complexity and computational overhead. In this paper, we propose SimVPv2, a streamlined model that eliminates the need for Unet architectures and demonstrates that plain stacks of convolutional layers, enhanced with an efficient Gated Spatiotemporal Attention mechanism, can deliver state-of-the-art performance. SimVPv2 not only simplifies the model architecture but also improves both performance and computational efficiency. On the standard Moving MNIST benchmark, SimVPv2 outperforms SimVP with fewer FLOPs, about half the training time, and 60% faster inference. Extensive experiments across eight diverse datasets, including real-world tasks such as traffic forecasting and climate prediction, further demonstrate that SimVPv2 offers a powerful yet straightforward solution that generalizes robustly across various spatiotemporal learning scenarios. We believe the proposed SimVPv2 can serve as a solid baseline to benefit the spatiotemporal predictive learning community.
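The abstract names a gated attention mechanism but does not spell out how it works. As a rough illustration of the general idea of sigmoid gating only, the sketch below splits a feature tensor with 2C channels into C gate channels and C value channels and multiplies them elementwise. The function names, the channel ordering of the split, and the flat-list data layout are assumptions made for this example; the convolutions that would produce the 2C-channel tensor are omitted, and this is not taken from the SimVPv2 code.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_attention(features):
    """Apply sigmoid gating to a feature tensor.

    `features` is a list of 2*C channels, each a flat list of floats
    (spatial positions flattened). ASSUMPTION: the first C channels are
    gate logits and the last C channels are values; a real model may
    order the split differently.
    """
    if len(features) % 2 != 0:
        raise ValueError("expected an even number of channels (2*C)")
    c = len(features) // 2
    gates, values = features[:c], features[c:]
    # Each output channel is sigmoid(gate) * value, position by position.
    return [
        [sigmoid(g) * v for g, v in zip(gate_ch, value_ch)]
        for gate_ch, value_ch in zip(gates, values)
    ]
```

With a gate logit of 0 the sigmoid is 0.5, so `gated_attention([[0.0, 0.0], [2.0, 4.0]])` halves each value. Large positive logits pass the value through almost unchanged, and large negative logits suppress it.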

Cheng Tan, Zhangyang Gao, Siyuan Li, Stan Z. Li • 2022

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Video Prediction | KTH 10 -> 20 steps (test) | PSNR 34.24 | 88 |
| Video Prediction | KTH 10 -> 40 steps (test) | PSNR 33.35 | 77 |
| Video Prediction | Moving-MNIST 10 → 10 (test) | MSE 15.05 | 39 |
| Video Prediction | KTH | PSNR 27.46 | 35 |
| Precipitation forecasting | SEVIR (test) | -- | 34 |
| Video Prediction | Caltech Pedestrian 10 -> 1 (test) | SSIM 0.949 | 31 |
| Spatio-temporal forecasting | TaxiBJ | MSE 0.3246 | 30 |
| Traffic Forecasting | TaxiBJ (test) | MAE 15.6 | 29 |
| Spatiotemporal Prediction | Moving FMNIST (test) | MSE 25.86 | 25 |
| Spatiotemporal Prediction | Human3.6M 256x256 | MSE 108.4 | 23 |
Showing 10 of 33 rows
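The results above are reported as MSE, MAE, PSNR, and SSIM. As a reminder of what the first three measure, here is a minimal sketch using their textbook definitions over flat sequences of pixel values. The function names and the `max_value` parameter are choices made for this example. Individual benchmarks often follow their own conventions (for instance summing rather than averaging over pixels, or using a 0-255 pixel range), so these functions are not expected to reproduce the listed numbers. SSIM is omitted because it needs windowed local statistics.

```python
import math


def _check(pred, target):
    """Both inputs must be non-empty and the same length."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and equal in length")


def mse(pred, target):
    """Mean squared error: average of squared differences."""
    _check(pred, target)
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def mae(pred, target):
    """Mean absolute error: average of absolute differences."""
    _check(pred, target)
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def psnr(pred, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_value^2 / MSE).

    `max_value` is the largest possible pixel value (1.0 for normalized
    images, 255 for 8-bit images). Returns infinity for identical inputs.
    """
    err = mse(pred, target)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)
```

Lower is better for MSE and MAE, while higher is better for PSNR, which is why the table mixes small and large values across rows.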
