Beyond All-to-All: Causal-Aligned Transformer with Dynamic Structure Learning for Multivariate Time Series Forecasting
About
Most existing multivariate time series forecasting methods adopt an all-to-all paradigm that feeds all variable histories into a unified model to predict their future values without distinguishing their individual roles. However, this undifferentiated paradigm makes it difficult to identify variable-specific causal influences and often entangles causally relevant information with spurious correlations. To address this limitation, we propose an all-to-one forecasting paradigm that predicts each target variable separately. Specifically, we first construct a Structural Causal Model from observational data and then, for each target variable, partition the historical sequence into four subsegments according to the inferred causal structure: endogenous, direct causal, collider causal, and spurious correlation. We then propose the Causal Decomposition Transformer (CDT), which integrates a dynamic causal adapter to learn causal structures initialized by the inferred graph, enabling correction of imperfect causal discovery during training. In addition, motivated by causal theory, we apply a projection-based output constraint to mitigate collider-induced bias and improve robustness. Extensive experiments on multiple benchmark datasets demonstrate the effectiveness of CDT.
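The per-target partition described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `partition_variables`, the adjacency-matrix encoding, and the exact membership rules are assumptions made here for illustration. In particular, the sketch assumes that "direct causal" means the graph parents of the target, that "collider causal" means non-parent variables sharing a common child with the target, and that every remaining variable falls into the "spurious correlation" group. The paper may define these groups differently.

```python
def partition_variables(adj, target):
    """Split variable indices into four groups relative to `target`.

    adj[i][j] == 1 is assumed to mean "variable i is a direct cause of j".
    The grouping rules below are illustrative assumptions, not the paper's
    definitions.
    """
    n = len(adj)
    others = [v for v in range(n) if v != target]

    # Assumed "direct causal" group: parents of the target.
    parents = [v for v in others if adj[v][target]]

    # Children of the target are the candidate colliders.
    children = [v for v in others if adj[target][v]]

    # Assumed "collider causal" group: non-parents that share a child
    # with the target.
    collider = [
        v for v in others
        if v not in parents and any(adj[v][c] for c in children)
    ]

    # Everything left over is treated as spuriously correlated.
    spurious = [v for v in others if v not in parents and v not in collider]

    return {
        "endogenous": [target],
        "direct_causal": parents,
        "collider_causal": collider,
        "spurious": spurious,
    }


# Toy graph with edges 1 -> 0, 0 -> 3, 2 -> 3; variable 4 is isolated.
adj = [
    [0, 0, 0, 1, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
groups = partition_variables(adj, target=0)
print(groups)
```

For target 0 in this toy graph, variable 1 is a parent, variable 2 shares the child 3 with the target, and variables 3 and 4 fall into the remaining group under the assumed rules.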
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Multivariate Forecasting | ETTh1 | MSE 0.406 | 645 |
| Multivariate Time-series Forecasting | ETTm1 | MSE 0.365 | 433 |
| Multivariate Forecasting | ETTh2 | MSE 0.358 | 341 |
| Multivariate Time-series Forecasting | ETTm2 | MSE 0.268 | 334 |
| Multivariate Time-series Forecasting | Weather | MSE 0.239 | 276 |
| Multivariate Time-series Forecasting | Traffic | MSE 0.411 | 200 |
| Multivariate Time-series Forecasting | Exchange | MAE 0.395 | 165 |
| Multivariate Time-series Forecasting | ECL | MSE 0.165 | 49 |
| Multivariate long-term forecasting | ETTm1 T=96 (test) | MSE 0.307 | 39 |
| Multivariate Time-series Forecasting | Traffic S=720 (test) | MSE 0.441 | 14 |