Are Self-Attentions Effective for Time Series Forecasting?

About

Time series forecasting is crucial for applications across multiple domains and various scenarios. Although Transformer models have dramatically advanced the landscape of forecasting, their effectiveness remains debated. Recent findings have indicated that simpler linear models might outperform complex Transformer-based approaches, highlighting the potential for more streamlined architectures. In this paper, we shift the focus from evaluating the overall Transformer architecture to specifically examining the effectiveness of self-attention for time series forecasting. To this end, we introduce a new architecture, Cross-Attention-only Time Series transformer (CATS), that rethinks the traditional Transformer framework by eliminating self-attention and leveraging cross-attention mechanisms instead. By establishing future horizon-dependent parameters as queries and enhancing parameter sharing, our model not only improves long-term forecasting accuracy but also reduces the number of parameters and memory usage. Extensive experiments across various datasets demonstrate that our model achieves superior performance with the lowest mean squared error and uses fewer parameters compared to existing models. The implementation of our model is available at: https://github.com/dongbeank/CATS.
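The core idea in the abstract, learnable horizon-dependent parameters acting as queries that attend over the embedded past, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the repository linked above for that): it assumes a single attention head, one query per forecast step, no patching, and random weights in place of trained ones. All names (`cross_attention_forecast`, `w_k`, `w_v`, `w_out`) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_forecast(past, queries, w_k, w_v, w_out):
    """Single-head cross-attention with no self-attention step.

    past:    (L, d) embedded input tokens from the look-back window
    queries: (H, d) learnable parameters, one per forecast step (assumption)
    returns: forecast of shape (H,) and attention weights of shape (H, L)
    """
    keys = past @ w_k                                   # (L, d)
    values = past @ w_v                                 # (L, d)
    scores = queries @ keys.T / np.sqrt(keys.shape[-1]) # (H, L)
    weights = softmax(scores, axis=-1)                  # each row sums to 1
    context = weights @ values                          # (H, d)
    forecast = (context @ w_out).squeeze(-1)            # (H,)
    return forecast, weights

# Toy dimensions and random weights, for illustration only.
rng = np.random.default_rng(0)
L, H, d = 24, 6, 8
past = rng.normal(size=(L, d))
queries = rng.normal(size=(H, d))   # would be trained in a real model
w_k = rng.normal(size=(d, d)) / np.sqrt(d)
w_v = rng.normal(size=(d, d)) / np.sqrt(d)
w_out = rng.normal(size=(d, 1))

forecast, weights = cross_attention_forecast(past, queries, w_k, w_v, w_out)
```

Because the queries are parameters rather than projections of the input, the attention matrix in this sketch has shape (H, L) instead of the (L, L) matrix that self-attention over the input would produce.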

Dongbin Kim, Jinseong Park, Jaewook Lee, Hoki Kim • 2024

Related benchmarks

Task                                       Dataset             Result     Rank
Multivariate long-term forecasting         ETTh1               MSE 0.428  394
Multivariate long-term series forecasting  ETTh2               MSE 0.355  367
Multivariate long-term series forecasting  Weather             MSE 0.15   359
Multivariate long-term series forecasting  ETTm1               MSE 0.354  305
Time Series Forecasting                    ETTm1 (test)        MSE 0.395  278
Multivariate long-term series forecasting  Weather (test)      MSE 0.161  270
Multivariate long-term forecasting         Electricity         MSE 0.166  236
Multivariate long-term series forecasting  ETTm2               MSE 0.288  223
Multivariate long-term series forecasting  Traffic (test)      MSE 0.421  220
Multivariate long-term series forecasting  Electricity (test)  MSE 0.149  170
Showing 10 of 36 rows
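For readers unfamiliar with the metric in the Result column, the following is a small sketch of mean squared error, assuming the standard definition averaged over all forecast steps and variables. The listing does not state how the benchmark inputs were scaled or which horizons were averaged, so the toy arrays below are illustrative values only, and the helper name `mse` is hypothetical.

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean of squared differences over every element (steps and variables).
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Toy example: 2 forecast steps, 2 variables.
y_true = np.array([[1.0, 2.0], [3.0, 4.0]])
y_pred = np.array([[1.5, 2.0], [2.0, 4.0]])
score = mse(y_true, y_pred)  # 0.3125
```

Lower values indicate forecasts closer to the ground truth.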
