Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts

About

Deep learning for time series forecasting has seen significant advancements over the past decades. However, despite the success of large-scale pre-training in language and vision domains, pre-trained time series models remain limited in scale and operate at a high cost, hindering the development of larger capable forecasting models in real-world applications. In response, we introduce Time-MoE, a scalable and unified architecture designed to pre-train larger, more capable forecasting foundation models while reducing inference costs. By leveraging a sparse mixture-of-experts (MoE) design, Time-MoE enhances computational efficiency by activating only a subset of networks for each prediction, reducing computational load while maintaining high model capacity. This allows Time-MoE to scale effectively without a corresponding increase in inference costs. Time-MoE comprises a family of decoder-only transformer models that operate in an auto-regressive manner and support flexible forecasting horizons with varying input context lengths. We pre-trained these models on our newly introduced large-scale data Time-300B, which spans over 9 domains and encompassing over 300 billion time points. For the first time, we scaled a time series foundation model up to 2.4 billion parameters, achieving significantly improved forecasting precision. Our results validate the applicability of scaling laws for training tokens and model size in the context of time series forecasting. Compared to dense models with the same number of activated parameters or equivalent computation budgets, our models consistently outperform them by large margin. These advancements position Time-MoE as a state-of-the-art solution for tackling real-world time series forecasting challenges with superior capability, efficiency, and flexibility.

Xiaoming Shi, Shiyu Wang, Yuqi Nie, Dianqi Li, Zhou Ye, Qingsong Wen, Ming Jin• 2024

Related benchmarks

Task	Dataset	Result
Time Series Forecasting	ETTh1	MSE0.2401	836
Time Series Forecasting	ETTh2	MSE0.252	796
Long-term time-series forecasting	ETTh1	MAE0.426	575
Time Series Forecasting	ETTm2	MSE0.157	536
Long-term time-series forecasting	Weather	MSE0.162	525
Time Series Forecasting	Weather	MSE0.265	497
Long-term forecasting	ETTm1	MSE0.286	422
Long-term forecasting	ETTh1	MSE0.345	409
Time Series Forecasting	ETTh1 (test)	MSE0.362	398
Anomaly Detection	SMD	F1 Score21.62	375

Showing 10 of 141 rows

...

Other info

Follow for update

@wizwand_team Discord