DiTS: Multimodal Diffusion Transformers Are Time Series Forecasters
About
While generative modeling on time series facilitates more capable and flexible probabilistic forecasting, existing generative time series models do not address the multi-dimensional properties of time series data well. The prevalent architecture of Diffusion Transformers (DiT), which relies on simplistic conditioning controls and a single-stream Transformer backbone, tends to underutilize cross-variate dependencies in covariate-aware forecasting. Inspired by Multimodal Diffusion Transformers that integrate textual guidance into video generation, we propose Diffusion Transformers for Time Series (DiTS), a general-purpose architecture that frames endogenous and exogenous variates as distinct modalities. To better capture both inter-variate and intra-variate dependencies, we design a dual-stream Transformer block tailored for time-series data, comprising a Time Attention module for autoregressive modeling along the temporal dimension and a Variate Attention module for cross-variate modeling. Unlike the common approach for images, which flattens 2D token grids into 1D sequences, our design leverages the low-rank property inherent in multivariate dependencies, thereby reducing computational costs. Experiments show that DiTS achieves state-of-the-art performance across benchmarks, regardless of the presence of future exogenous variate observations, demonstrating unique generative forecasting strengths over traditional deterministic deep forecasting models.
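The cost argument for attending along the time and variate axes separately, rather than over one flattened sequence, can be made concrete with a rough count of query-key pairs. The sketch below is only an illustration and not the authors' implementation: the function name `attention_pair_counts` and its parameters are hypothetical, and it ignores attention heads, causal masking, and whatever low-rank mechanism DiTS actually uses.

```python
def attention_pair_counts(num_time_tokens: int, num_variates: int) -> dict:
    """Count query-key pairs for flattened vs. axis-wise attention.

    Hypothetical helper for illustration; not taken from the DiTS code.
    """
    t, v = num_time_tokens, num_variates

    # Flattened 1D sequence: each of the t*v tokens attends to all t*v tokens.
    flattened = (t * v) ** 2

    # Time attention: each variate attends over its own t tokens
    # (causal masking is ignored in this count).
    time_pairs = v * t * t

    # Variate attention: at each time position, the v variates attend to each other.
    variate_pairs = t * v * v

    factorized = time_pairs + variate_pairs
    return {
        "flattened": flattened,
        "factorized": factorized,
        "ratio": flattened / factorized,
    }


counts = attention_pair_counts(num_time_tokens=32, num_variates=8)
print(counts)
```

Under these assumptions, 32 time tokens and 8 variates give 65,536 pairs for the flattened sequence against 10,240 for the two axis-wise passes, a ratio of t·v / (t + v) = 6.4 that widens as either dimension grows.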
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Probabilistic time series forecasting | ENTSO-e Load FEV leaderboard subset 1H | SQL | 0.468 | 16 |
| Short-term forecasting with exogenous variables | EPF | NP MSE | 0.225 | 12 |
| Deterministic Time Series Forecasting | EPF Nord Pool (test) | MSE | 0.271 | 8 |
| Deterministic Time Series Forecasting | EPF PJM Interconnection (test) | MSE | 0.082 | 8 |
| Deterministic Time Series Forecasting | EPF Belgian (test) | MSE | 0.376 | 8 |
| Deterministic Time Series Forecasting | EPF French (test) | MSE | 0.36 | 8 |
| Deterministic Time Series Forecasting | EPF German (test) | MSE | 0.279 | 8 |
| Deterministic Time Series Forecasting | EPF Average All Subsets (test) | MSE | 0.274 | 8 |
| Probabilistic time series forecasting | ENTSO-e Load FEV leaderboard 15T | SQL | 0.431 | 8 |
| Probabilistic time series forecasting | ENTSO-e Load FEV leaderboard 30T | SQL | 0.423 | 8 |