Dataset-Driven Channel Masks in Transformers for Multivariate Time Series
About
Recent advancements in foundation models have been successfully extended to the time series (TS) domain, facilitated by the emergence of large-scale TS datasets. Capturing channel dependency (CD) is essential for modeling multivariate TS, and attention-based methods have been widely employed for this purpose. Nonetheless, these methods primarily focus on modifying the architecture, often neglecting the importance of dataset-specific characteristics. In this work, we introduce the concept of partial channel dependence (PCD) to enhance CD modeling in Transformer-based models by leveraging dataset-specific information to refine the CD captured by the model. To achieve PCD, we propose channel masks (CMs), which are integrated into the attention matrices of Transformers via element-wise multiplication. CMs consist of two components: 1) a similarity matrix that captures relationships between the channels, and 2) dataset-specific, learnable domain parameters that refine the similarity matrix. We validate the effectiveness of PCD across diverse tasks and datasets with various backbones. Code is available at this repository: https://github.com/YonseiML/pcd.
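The channel-mask idea described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the choice of absolute Pearson correlation as the similarity matrix, the sigmoid refinement with two scalars `alpha` and `beta` standing in for the learnable domain parameters, and the function names `channel_mask` and `masked_channel_attention` are all assumptions made for the example. Only the element-wise multiplication of the mask with the attention matrix comes from the abstract.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_mask(series, alpha, beta):
    """Build a (C, C) channel mask from a (T, C) multivariate series.

    Assumption: similarity = |Pearson correlation| between channels.
    Assumption: alpha and beta play the role of the dataset-specific
    domain parameters; in a real model they would be trained.
    """
    sim = np.abs(np.corrcoef(series, rowvar=False))  # (C, C), symmetric
    return sigmoid(alpha * sim + beta)               # values in (0, 1)


def masked_channel_attention(q, k, v, mask):
    """Attention over channel tokens, scaled element-wise by the mask."""
    d = q.shape[-1]
    attn = softmax(q @ k.T / np.sqrt(d), axis=-1)    # (C, C)
    attn = attn * mask                               # element-wise multiplication
    return attn @ v, attn


rng = np.random.default_rng(0)
T, C, d = 96, 4, 8                     # time steps, channels, token width
series = rng.normal(size=(T, C))       # toy multivariate time series
tokens = rng.normal(size=(C, d))       # one embedding per channel

mask = channel_mask(series, alpha=2.0, beta=-1.0)
out, attn = masked_channel_attention(tokens, tokens, tokens, mask)
```

In this sketch the masked attention rows are not renormalized after the multiplication, so they no longer sum to one; whether to renormalize is a design choice the abstract does not specify.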
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Time Series Forecasting | ETTh1 | MSE | 0.405 | 836 |
| Time Series Forecasting | ECL | MSE | 0.14 | 294 |
| Time Series Forecasting | PeMS08 | MSE | 0.109 | 229 |
| Time Series Forecasting | PeMS04 | MSE | 0.093 | 169 |
| Time Series Forecasting | Exchange | MSE | 0.088 | 98 |
| Time Series Forecasting | ETTh2 | MSE | 0.328 | 88 |
| Multivariate Time-series Forecasting | solar | MAE | 0.585 | 74 |
| Time Series Forecasting | ETTh1 | MSE | 0.492 | 63 |
| Time Series Forecasting | Weather | MSE | 0.219 | 55 |
| Time Series Forecasting | ECL | MSE | 0.149 | 24 |