MoHETS: Long-term Time Series Forecasting with Mixture-of-Heterogeneous-Experts

About

Real-world multivariate time series can exhibit intricate multi-scale structures, including global trends, local periodicities, and non-stationary regimes, which makes long-horizon forecasting challenging. Although sparse Mixture-of-Experts (MoE) approaches improve scalability and specialization, they typically rely on homogeneous MLP experts that poorly capture the diverse temporal dynamics of time series data. We address these limitations with MoHETS, an encoder-only Transformer that integrates sparse Mixture-of-Heterogeneous-Experts (MoHE) layers. MoHE routes temporal patches to a small subset of expert networks, combining a shared depthwise-convolution expert for sequence-level continuity with routed Fourier-based experts for patch-level periodic structures. MoHETS further improves robustness to non-stationary dynamics by incorporating exogenous information via cross-attention over covariate patch embeddings. Finally, we replace parameter-heavy linear projection heads with a lightweight convolutional patch decoder, improving parameter efficiency, reducing training instability, and allowing a single model to generalize across arbitrary forecast horizons. We validate across seven multivariate benchmarks and multiple horizons, with MoHETS consistently achieving state-of-the-art performance, reducing the average MSE by $12\%$ compared to strong recent baselines, demonstrating effective heterogeneous specialization for long-term forecasting.
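The routing idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the router, the gating rule, or the internals of the experts, so everything here is an assumption made for illustration. That includes the names `mohe_layer`, `fourier_expert`, and `shared_depthwise_conv`, the top-k softmax gating, and the choice to model a Fourier expert as a spectral filter over each patch embedding. The sketch only shows the general pattern of one always-on shared expert combined with sparsely routed experts.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def shared_depthwise_conv(x, kernel):
    """Shared expert (assumed form): per-channel 1D convolution along the
    patch sequence, with zero padding so the output length matches the input.

    x: (num_patches, d_model); kernel: (kernel_size, d_model), kernel_size odd.
    """
    k = kernel.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.zeros_like(x)
    for i in range(k):
        # Each channel has its own tap weights, hence "depthwise".
        out += xp[i:i + x.shape[0]] * kernel[i]
    return out


def fourier_expert(x, spectral_filter):
    """Routed expert (assumed form): filter each patch embedding in the
    frequency domain.

    x: (m, d_model); spectral_filter: (d_model // 2 + 1,).
    """
    spec = np.fft.rfft(x, axis=-1)
    return np.fft.irfft(spec * spectral_filter, n=x.shape[-1], axis=-1)


def mohe_layer(x, router_w, conv_kernel, filters, top_k=2):
    """Hypothetical MoHE layer: shared conv expert plus top-k routed experts.

    x: (num_patches, d_model); router_w: (d_model, num_experts);
    filters: one spectral filter per routed expert.
    """
    logits = x @ router_w                                # (n, num_experts)
    top_idx = np.argsort(logits, axis=-1)[:, -top_k:]    # (n, top_k)
    top_logits = np.take_along_axis(logits, top_idx, axis=-1)
    gates = softmax(top_logits, axis=-1)                 # renormalised over the selected experts

    routed = np.zeros_like(x)
    for e, filt in enumerate(filters):
        mask = top_idx == e                              # (n, top_k)
        rows = mask.any(axis=-1)                         # patches sent to expert e
        if not rows.any():
            continue
        weight = (gates * mask).sum(axis=-1)[rows]       # gate value per routed patch
        routed[rows] += weight[:, None] * fourier_expert(x[rows], filt)

    # The shared expert sees every patch; routed experts see only their subset.
    return shared_depthwise_conv(x, conv_kernel) + routed
```

Under these assumptions, calling `mohe_layer` with an input of shape `(num_patches, d_model)` returns an array of the same shape. Each patch is processed by only `top_k` of the Fourier experts, so compute per patch stays roughly constant as the number of experts grows.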

Evandro S. Ortigossa, Guy Lutsker, Eran Segal • 2026

Related benchmarks

Task                                    Dataset               Result      Rank
Multivariate Forecasting                ETTh1                 MSE 0.383   686
Multivariate Time-series Forecasting    ETTm1                 MSE 0.333   466
Multivariate Time-series Forecasting    ETTm2                 MSE 0.256   389
Multivariate Forecasting                ETTh2                 MSE 0.348   350
Multivariate Time-series Forecasting    Weather               MSE 0.216   340
Multivariate Time-series Forecasting    Traffic               MSE 0.388   264
Multivariate long-term forecasting      ETTm1 (test)          MSE 0.333   138
Multivariate long-term forecasting      ETTh2 (test)          MSE 0.278   124
Multivariate Time-series Forecasting    ECL                   MSE 0.158   66
Multivariate long-term forecasting      ETTh1 T=720 (test)    MSE 0.35    51
Showing 10 of 22 rows
