DistDF: Time-Series Forecasting Needs Joint-Distribution Wasserstein Alignment
About
Training time-series forecasting models requires aligning the conditional distribution of model forecasts with that of the label sequence. The standard direct forecast (DF) approach resorts to minimizing the conditional negative log-likelihood, typically estimated by the mean squared error. However, this estimation proves biased when the label sequence exhibits autocorrelation. In this paper, we propose DistDF, which achieves alignment by minimizing a distributional discrepancy between the conditional distributions of forecast and label sequences. Since such conditional discrepancies are difficult to estimate from finite time-series observations, we introduce a joint-distribution Wasserstein discrepancy for time-series forecasting, which provably upper bounds the conditional discrepancy of interest. The proposed discrepancy is tractable, differentiable, and readily compatible with gradient-based optimization. Extensive experiments show that DistDF improves diverse forecasting models and achieves leading performance. Code is available at https://anonymous.4open.science/r/DistDF-F66B.
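As a rough illustration of the joint-distribution idea, the sketch below pairs each history window with its forecast and with its label, then compares the two resulting sample sets with a Wasserstein-type distance. It is a minimal sketch under stated assumptions, not the authors' implementation: it swaps in a Monte Carlo sliced 2-Wasserstein estimator as a simple stand-in for whatever estimator the paper uses, and the names `sliced_w2_sq` and `joint_discrepancy` are hypothetical. It uses only the standard library, so it is not differentiable as written; a training loss would need the same steps in an autodiff framework.

```python
import math
import random


def sliced_w2_sq(a, b, n_proj=64, seed=0):
    """Monte Carlo estimate of the sliced squared 2-Wasserstein distance.

    `a` and `b` are equally sized lists of equal-length vectors (samples).
    Assumption: a stand-in estimator, not the one defined in the paper.
    """
    assert len(a) == len(b) and len(a[0]) == len(b[0])
    rng = random.Random(seed)
    dim = len(a[0])
    total = 0.0
    for _ in range(n_proj):
        # Draw a random unit direction to project both sample sets onto.
        d = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(v * v for v in d))
        d = [v / norm for v in d]
        # In one dimension, optimal transport matches sorted samples.
        pa = sorted(sum(x * w for x, w in zip(row, d)) for row in a)
        pb = sorted(sum(x * w for x, w in zip(row, d)) for row in b)
        total += sum((u - v) ** 2 for u, v in zip(pa, pb)) / len(pa)
    return total / n_proj


def joint_discrepancy(history, forecast, label, **kwargs):
    """Discrepancy between the joint samples (history, forecast) and (history, label)."""
    # Concatenate each history window with its forecast / label sequence.
    joint_pred = [h + f for h, f in zip(history, forecast)]
    joint_true = [h + y for h, y in zip(history, label)]
    return sliced_w2_sq(joint_pred, joint_true, **kwargs)


# Toy data: three history windows of length 2, horizon of length 1.
history = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]
label = [[2.0], [3.0], [4.0]]
shifted = [[5.0], [6.0], [7.0]]

perfect = joint_discrepancy(history, label, label)
biased = joint_discrepancy(history, shifted, label)
```

In this toy case `perfect` is zero because the two joint sample sets coincide, while `biased` is strictly positive because the forecasts are offset from the labels.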
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Multivariate Forecasting | ETTh1 | MSE 0.43 | 686 |
| Multivariate Time-series Forecasting | ETTm1 | MSE 0.378 | 466 |
| Multivariate Time-series Forecasting | ETTm2 | MSE 0.277 | 389 |
| Multivariate Forecasting | ETTh2 | MSE 0.367 | 350 |
| Multivariate Time-series Forecasting | Weather | MSE 0.248 | 340 |
| Multivariate Time-series Forecasting | Traffic | MSE 0.417 | 264 |
| Long-term time-series forecasting | ETTh1 (test) | MSE 0.43 | 264 |
| Long-term time-series forecasting | Traffic (test) | MSE 0.417 | 149 |
| Long-term time-series forecasting | Weather (test) | MSE 0.248 | 147 |
| Long-term time-series forecasting | ETTm1 (test) | MSE 0.378 | 138 |