DiTS: Multimodal Diffusion Transformers Are Time Series Forecasters

About

While generative modeling on time series facilitates more capable and flexible probabilistic forecasting, existing generative time series models do not address the multi-dimensional properties of time series data well. The prevalent architecture of Diffusion Transformers (DiT), which relies on simplistic conditioning controls and a single-stream Transformer backbone, tends to underutilize cross-variate dependencies in covariate-aware forecasting. Inspired by Multimodal Diffusion Transformers that integrate textual guidance into video generation, we propose Diffusion Transformers for Time Series (DiTS), a general-purpose architecture that frames endogenous and exogenous variates as distinct modalities. To better capture both inter-variate and intra-variate dependencies, we design a dual-stream Transformer block tailored for time-series data, comprising a Time Attention module for autoregressive modeling along the temporal dimension and a Variate Attention module for cross-variate modeling. Unlike the common approach for images, which flattens 2D token grids into 1D sequences, our design leverages the low-rank property inherent in multivariate dependencies, thereby reducing computational costs. Experiments show that DiTS achieves state-of-the-art performance across benchmarks, regardless of the presence of future exogenous variate observations, demonstrating unique generative forecasting strengths over traditional deterministic deep forecasting models.
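The abstract's cost claim can be made concrete with a rough count of attention score pairs. The sketch below is illustrative only. It assumes attention is applied separately along the time axis and the variate axis, which is one common way to avoid flattening a 2D token grid and may not match the exact mechanism in DiTS. The function names and the grid size (96 time tokens, 8 variates) are hypothetical and do not come from the paper. Causal masking in the time attention is ignored for simplicity.

```python
def flattened_attention_pairs(num_time_tokens: int, num_variates: int) -> int:
    """Score pairs when the T x V token grid is flattened into one sequence."""
    sequence_length = num_time_tokens * num_variates
    return sequence_length * sequence_length


def factorized_attention_pairs(num_time_tokens: int, num_variates: int) -> int:
    """Score pairs when attention runs along each axis separately (assumed design)."""
    # Time attention: each variate attends over its own T time tokens.
    time_pairs = num_variates * num_time_tokens ** 2
    # Variate attention: each time step attends over its V variates.
    variate_pairs = num_time_tokens * num_variates ** 2
    return time_pairs + variate_pairs


if __name__ == "__main__":
    T, V = 96, 8  # hypothetical grid size, not taken from the paper
    flat = flattened_attention_pairs(T, V)
    factored = factorized_attention_pairs(T, V)
    print(f"flattened:  {flat} pairs")
    print(f"factorized: {factored} pairs")
    print(f"ratio:      {flat / factored:.2f}x")
```

Under these assumptions the ratio simplifies to T·V / (T + V), so the saving grows as either axis gets longer.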

Haoran Zhang, Haixuan Liu, Yong Liu, Yunzhong Qiu, Yuxuan Wang, Jianmin Wang, Mingsheng Long • 2026

Related benchmarks

Task | Dataset | Result | Rank
Probabilistic time series forecasting | ENTSO-e Load FEV leaderboard subset 1H | SQL 0.468 | 16
Short-term forecasting with exogenous variables | EPF | NP MSE 0.225 | 12
Deterministic Time Series Forecasting | EPF Nord Pool (test) | MSE 0.271 | 8
Deterministic Time Series Forecasting | EPF PJM Interconnection (test) | MSE 0.082 | 8
Deterministic Time Series Forecasting | EPF Belgian (test) | MSE 0.376 | 8
Deterministic Time Series Forecasting | EPF French (test) | MSE 0.36 | 8
Deterministic Time Series Forecasting | EPF German (test) | MSE 0.279 | 8
Deterministic Time Series Forecasting | EPF Average All Subsets (test) | MSE 0.274 | 8
Probabilistic time series forecasting | ENTSO-e Load FEV leaderboard 15T | SQL 0.431 | 8
Probabilistic time series forecasting | ENTSO-e Load FEV leaderboard 30T | SQL 0.423 | 8
Showing 10 of 15 rows
