SAMformer: Unlocking the Potential of Transformers in Time Series Forecasting with Sharpness-Aware Minimization and Channel-Wise Attention

About

Transformer-based architectures achieved breakthrough performance in natural language processing and computer vision, yet they remain inferior to simpler linear baselines in multivariate long-term forecasting. To better understand this phenomenon, we start by studying a toy linear forecasting problem for which we show that transformers are incapable of converging to their true solution despite their high expressive power. We further identify the attention of transformers as being responsible for this low generalization capacity. Building upon this insight, we propose a shallow lightweight transformer model that successfully escapes bad local minima when optimized with sharpness-aware optimization. We empirically demonstrate that this result extends to all commonly used real-world multivariate time series datasets. In particular, SAMformer surpasses current state-of-the-art methods and is on par with the biggest foundation model MOIRAI while having significantly fewer parameters. The code is available at https://github.com/romilbert/samformer.
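To make the "sharpness-aware optimization" mentioned above concrete, here is a minimal sketch of one generic sharpness-aware minimization (SAM) update on a toy linear regression problem. It is an illustration under stated assumptions, not the authors' implementation (which lives in the repository linked above): the helper names `mse_loss_and_grad` and `sam_step`, the learning rate `lr`, the neighborhood radius `rho`, and the synthetic data are all hypothetical choices made for this example.

```python
import numpy as np

def mse_loss_and_grad(w, X, y):
    """Mean squared error of the linear model X @ w and its gradient w.r.t. w."""
    residual = X @ w - y
    loss = float(np.mean(residual ** 2))
    grad = 2.0 * X.T @ residual / len(y)
    return loss, grad

def sam_step(w, X, y, lr=0.1, rho=0.05):
    """One generic SAM update (illustrative; lr and rho are assumed values)."""
    _, grad = mse_loss_and_grad(w, X, y)
    # Ascent step: move toward the approximate worst-case point
    # inside an L2 ball of radius rho around the current weights.
    eps = rho * grad / (np.linalg.norm(grad) + 1e-12)
    # Descent step: evaluate the gradient at the perturbed weights,
    # then apply it to the original (unperturbed) weights.
    _, grad_perturbed = mse_loss_and_grad(w + eps, X, y)
    return w - lr * grad_perturbed

# Synthetic, noise-free linear data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 4))
w_true = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ w_true

w = np.zeros(4)
initial_loss, _ = mse_loss_and_grad(w, X, y)
for _ in range(200):
    w = sam_step(w, X, y)
final_loss, _ = mse_loss_and_grad(w, X, y)
```

Each step costs two gradient evaluations: one to find the perturbation direction and one at the perturbed point. Because `rho` is held fixed, the iterates settle in a small neighborhood of the minimizer rather than on it exactly, which is expected behavior for this simplified variant.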

Romain Ilbert, Ambroise Odonnat, Vasilii Feofanov, Aladin Virmaux, Giuseppe Paolo, Themis Palpanas, Ievgen Redko • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Time Series Forecasting | ETTh1 | MSE 0.432 | 601 |
| Time Series Forecasting | ETTh2 | MSE 0.344 | 438 |
| Time Series Forecasting | ETTm2 | MSE 0.269 | 382 |
| Long-term time-series forecasting | ETTh1 | MAE 0.402 | 351 |
| Multivariate long-term forecasting | ETTh1 | MSE 0.41 | 344 |
| Long-term time-series forecasting | ETTh2 | MSE 0.295 | 327 |
| Multivariate long-term series forecasting | ETTh2 | MSE 0.344 | 319 |
| Long-term time-series forecasting | ETTm2 | MSE 0.181 | 305 |
| Long-term time-series forecasting | ETTm1 | MSE 0.329 | 295 |
| Multivariate long-term series forecasting | Weather | MSE 0.26 | 288 |
Showing 10 of 34 rows
