Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts

About

Time series foundation models have demonstrated impressive performance as zero-shot forecasters. However, achieving effective unified training on time series remains an open challenge. Existing approaches introduce some level of model specialization to account for the highly heterogeneous nature of time series data. For instance, Moirai pursues unified training by employing multiple input/output projection layers, each tailored to handle time series at a specific frequency. Similarly, TimesFM maintains a frequency embedding dictionary for this purpose. We identify two major drawbacks of this human-imposed frequency-level model specialization: (1) Frequency is not a reliable indicator of the underlying patterns in time series. For example, time series with different frequencies can display similar patterns, while those with the same frequency may exhibit varied patterns. (2) Non-stationarity is an inherent property of real-world time series, leading to varied distributions even within a short context window of a single time series. Frequency-level specialization is too coarse-grained to capture this level of diversity. To address these limitations, this paper introduces Moirai-MoE, which uses a single input/output projection layer and delegates the modeling of diverse time series patterns to sparse mixture-of-experts (MoE) layers within the Transformer. With these designs, Moirai-MoE reduces reliance on human-defined heuristics and enables automatic token-level specialization. Extensive experiments on 39 datasets demonstrate the superiority of Moirai-MoE over existing foundation models in both in-distribution and zero-shot scenarios. Furthermore, this study conducts comprehensive model analyses to explore the inner workings of time series MoE foundation models and provides valuable insights for future research.
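
The token-level specialization described above can be illustrated with a small sparse-MoE feed-forward block. The sketch below is not the authors' released implementation; the expert count, layer sizes, and top-k gating scheme are illustrative assumptions about how a token-routed MoE layer inside a Transformer typically works.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseMoELayer(nn.Module):
    """Token-routed MoE feed-forward block (illustrative sketch only)."""

    def __init__(self, d_model: int, d_ff: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Router scores every token against every expert.
        self.gate = nn.Linear(d_model, num_experts, bias=False)
        # Each expert is an ordinary position-wise feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model). Routing happens per token, which is
        # what enables token-level (rather than frequency-level) specialization.
        scores = self.gate(x)                              # (B, T, num_experts)
        top_w, top_idx = scores.topk(self.top_k, dim=-1)   # keep only k experts per token
        top_w = F.softmax(top_w, dim=-1)                   # mixing weights over chosen experts

        out = torch.zeros_like(x)
        for k in range(self.top_k):
            idx = top_idx[..., k]                          # (B, T) chosen expert id
            w = top_w[..., k].unsqueeze(-1)                # (B, T, 1) weight for that expert
            for e, expert in enumerate(self.experts):
                mask = idx == e                            # tokens routed to expert e
                if mask.any():
                    out[mask] += w[mask] * expert(x[mask])
        return out


if __name__ == "__main__":
    layer = SparseMoELayer(d_model=64, d_ff=256)
    tokens = torch.randn(4, 32, 64)   # a batch of time series token embeddings
    print(layer(tokens).shape)        # torch.Size([4, 32, 64])
```

Because each token activates only its top-k experts, the added capacity costs roughly k feed-forward passes per token instead of one pass through every expert, which is what makes this kind of specialization affordable at scale.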

Xu Liu, Juncheng Liu, Gerald Woo, Taha Aksu, Yuxuan Liang, Roger Zimmermann, Chenghao Liu, Silvio Savarese, Caiming Xiong, Doyen Sahoo • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Context-guided time series forecasting | PTF | MAE 0.5179 | 45 |
| Forecasting | GIFT-Eval All | Relative MAPE 0.86 | 13 |
| Forecasting | GIFT-Eval Multivariate | Relative MAPE 0.93 | 13 |
| Forecasting | GIFT-Eval M Horizon | Relative MAPE 0.96 | 13 |
| Forecasting | GIFT-Eval S Horizon | Relative MAPE 0.77 | 13 |
| Forecasting | GIFT-Eval Univariate | Relative MAPE 0.81 | 13 |
| Forecasting | Electricity 480 | Relative MAPE 0.79 | 13 |
| Forecasting | M4 hourly 48 | Relative MAPE 0.68 | 13 |
| Forecasting | Electricity 720 | Relative MAPE 0.86 | 13 |
| Forecasting | ETTh1 48 | Relative MAPE 0.87 | 13 |

Showing 10 of 14 rows.
