
Distilling Time Series Foundation Models for Efficient Forecasting

About

Time series foundation models (TSFMs) deliver strong forecasting performance through large-scale pretraining, but their large parameter counts make deployment costly. While knowledge distillation offers a natural and effective approach to model compression, techniques developed for general machine learning tasks are not directly applicable to time series forecasting due to the unique characteristics of time series data. To address this, we present DistilTS, the first distillation framework specifically designed for TSFMs. DistilTS addresses two key challenges: (1) task difficulty discrepancy, specific to forecasting, where uniform weighting lets easier short-term horizons dominate optimization while long-term horizons receive weaker supervision; and (2) architecture discrepancy, a general challenge in distillation, for which we design an alignment mechanism tailored to time series forecasting. To overcome these issues, DistilTS introduces horizon-weighted objectives that balance learning across horizons, and a temporal alignment strategy that reduces architectural mismatch, enabling compact student models. Experiments on multiple benchmarks demonstrate that DistilTS achieves forecasting performance comparable to full-sized TSFMs while using as few as 1/150 of the parameters and accelerating inference by up to 6000x. Code is available at: https://github.com/itsnotacie/DistilTS-ICASSP2026.
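The horizon-weighted objective described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the linearly increasing weight schedule, the `alpha` mixing coefficient between the ground-truth and teacher terms, and the function name are all assumptions for demonstration.

```python
import numpy as np

def horizon_weighted_mse(student_pred, teacher_pred, target, alpha=0.5):
    """Hypothetical sketch of a horizon-weighted distillation loss.

    student_pred, teacher_pred, target: arrays of shape (batch, horizon).
    Weights grow with the forecast horizon so that harder long-term
    steps are not drowned out by easier short-term ones; alpha mixes
    the ground-truth loss with the teacher-matching (distillation) loss.
    """
    H = student_pred.shape[-1]
    # Linearly increasing per-horizon weights, normalized to sum to 1
    # (the actual weighting scheme in DistilTS may differ).
    w = np.arange(1, H + 1, dtype=float)
    w /= w.sum()
    task_err = (student_pred - target) ** 2        # supervised term
    distill_err = (student_pred - teacher_pred) ** 2  # teacher term
    per_horizon = alpha * task_err + (1 - alpha) * distill_err
    # Average over the batch, then take the weighted sum over horizons.
    return float(np.sum(w * per_horizon.mean(axis=0)))
```

With this weighting, the same prediction error costs more at a late horizon than at an early one, pushing optimization toward the long-term steps that uniform weighting under-supervises.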

Yuqi Li, Kuiye Ding, Chuanguang Yang, Szu-Yu Chen, Yingli Tian • 2026

Related benchmarks

Task                              Dataset  Result     Rank
Long-term time-series forecasting Weather  MSE 0.152  348
Long-term forecasting             ETTm1    MSE 0.297  184
Long-term forecasting             ETTh1    MSE 0.362  179
Long-term forecasting             ETTm2    MSE 0.164  174
Long-term forecasting             ETTh2    MSE 0.272  163
