A Time Series is Worth 64 Words: Long-term Forecasting with Transformers
About
We propose an efficient design of Transformer-based models for multivariate time series forecasting and self-supervised representation learning. It is based on two key components: (i) segmentation of time series into subseries-level patches that serve as input tokens to the Transformer; (ii) channel-independence, where each channel contains a single univariate time series and all channels share the same embedding and Transformer weights. The patching design has a three-fold benefit: local semantic information is retained in the embedding; computation and memory usage of the attention maps are reduced quadratically for the same look-back window; and the model can attend to a longer history. Our channel-independent patch time series Transformer (PatchTST) significantly improves long-term forecasting accuracy compared with SOTA Transformer-based models. We also apply our model to self-supervised pre-training tasks and attain excellent fine-tuning performance, which outperforms supervised training on large datasets. Transferring masked pre-trained representations from one dataset to others also produces SOTA forecasting accuracy. Code is available at: https://github.com/yuqinie98/PatchTST.
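The patching idea above can be illustrated with a minimal NumPy sketch: a look-back window is cut into overlapping subseries-level patches, each of which becomes one input token, so the token count (and hence the quadratic attention cost) shrinks by roughly the stride. The function name and default values here are illustrative, not the repository's API.

```python
import numpy as np

def patchify(series: np.ndarray, patch_len: int = 16, stride: int = 8) -> np.ndarray:
    """Split a univariate series of length L into overlapping patches.

    Each patch of `patch_len` consecutive points becomes one Transformer
    input token, so the sequence length seen by attention drops from L
    to about L / stride.
    """
    num_patches = (len(series) - patch_len) // stride + 1
    return np.stack(
        [series[i * stride : i * stride + patch_len] for i in range(num_patches)]
    )

# A 512-point look-back window with patch_len=16, stride=8 gives 63 tokens,
# close to the "64 words" of the title (the paper pads to reach 64).
x = np.arange(512, dtype=float)
patches = patchify(x)
print(patches.shape)  # (63, 16)
```

Under channel-independence, the same function would simply be applied to each channel's univariate series separately, with all channels sharing the downstream embedding and Transformer weights.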
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Multivariate Forecasting | ETTh1 | MSE 0.322 | 645 |
| Time Series Forecasting | ETTh1 | MSE 0.37 | 601 |
| Time Series Forecasting | ETTh2 | MSE 0.274 | 438 |
| Multivariate Time-series Forecasting | ETTm1 | MSE 0.248 | 433 |
| Time Series Forecasting | ETTm2 | MSE 0.165 | 382 |
| Long-term time-series forecasting | ETTh1 | MAE 0.248 | 351 |
| Long-term time-series forecasting | Weather | MSE 0.045 | 348 |
| Multivariate long-term forecasting | ETTh1 | MSE 0.366 | 344 |
| Multivariate Forecasting | ETTh2 | MSE 0.177 | 341 |
| Multivariate Time-series Forecasting | ETTm2 | MSE 0.135 | 334 |