MoHETS: Long-term Time Series Forecasting with Mixture-of-Heterogeneous-Experts

About

Real-world multivariate time series can exhibit intricate multi-scale structures, including global trends, local periodicities, and non-stationary regimes, which makes long-horizon forecasting challenging. Although sparse Mixture-of-Experts (MoE) approaches improve scalability and specialization, they typically rely on homogeneous MLP experts that poorly capture the diverse temporal dynamics of time series data. We address these limitations with MoHETS, an encoder-only Transformer that integrates sparse Mixture-of-Heterogeneous-Experts (MoHE) layers. MoHE routes temporal patches to a small subset of expert networks, combining a shared depthwise-convolution expert for sequence-level continuity with routed Fourier-based experts for patch-level periodic structures. MoHETS further improves robustness to non-stationary dynamics by incorporating exogenous information via cross-attention over covariate patch embeddings. Finally, we replace parameter-heavy linear projection heads with a lightweight convolutional patch decoder, improving parameter efficiency, reducing training instability, and allowing a single model to generalize across arbitrary forecast horizons. We validate across seven multivariate benchmarks and multiple horizons, with MoHETS consistently achieving state-of-the-art performance, reducing the average MSE by $12\%$ compared to strong recent baselines, demonstrating effective heterogeneous specialization for long-term forecasting.
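
The routing idea in the abstract can be illustrated with a small sketch: every patch passes through a shared convolution over the whole series, and a router sends it to a few Fourier-based experts whose outputs are mixed by softmax gates. This is not the authors' implementation. The band-pass form of the Fourier experts, the linear router, the additive combination, and all names (`depthwise_conv_shared`, `fourier_band_expert`, `top_k_gates`, `mohe_layer`, `bands`) are assumptions made for illustration, and the series is univariate, so the depthwise convolution reduces to a single-channel filter.

```python
import cmath
import math
import random


def depthwise_conv_shared(sequence, kernel):
    """Shared expert: zero-padded 'same' 1D filter over the full series.

    Assumes an odd-length kernel and a single channel.
    """
    half = len(kernel) // 2
    padded = [0.0] * half + list(sequence) + [0.0] * half
    return [
        sum(kernel[j] * padded[i + j] for j in range(len(kernel)))
        for i in range(len(sequence))
    ]


def fourier_band_expert(patch, low, high):
    """Routed expert (assumed form): keep DFT bins in [low, high], zero the rest."""
    n = len(patch)
    spectrum = [
        sum(patch[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
        for f in range(n)
    ]
    # min(f, n - f) keeps each bin together with its mirror, so the output is real.
    kept = [spectrum[f] if low <= min(f, n - f) <= high else 0.0 for f in range(n)]
    return [
        (sum(kept[f] * cmath.exp(2j * math.pi * f * t / n) for f in range(n)) / n).real
        for t in range(n)
    ]


def top_k_gates(scores, k):
    """Softmax over the k highest router scores; other experts are skipped."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    peak = max(scores[i] for i in top)
    exps = {i: math.exp(scores[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def mohe_layer(patches, router_weights, bands, kernel, k=2):
    """Shared expert output plus the gate-weighted sum of the top-k routed experts."""
    patch_len = len(patches[0])
    flat = [x for patch in patches for x in patch]
    shared = depthwise_conv_shared(flat, kernel)  # sees across patch boundaries
    outputs, routes = [], []
    for idx, patch in enumerate(patches):
        # Hypothetical linear router: one score per expert from the raw patch.
        scores = [sum(w * x for w, x in zip(ws, patch)) for ws in router_weights]
        gates = top_k_gates(scores, k)
        mixed = shared[idx * patch_len:(idx + 1) * patch_len]
        for expert, gate in gates.items():
            low, high = bands[expert]
            routed = fourier_band_expert(patch, low, high)
            mixed = [m + gate * r for m, r in zip(mixed, routed)]
        outputs.append(mixed)
        routes.append(gates)
    return outputs, routes


# Toy data: a sine wave with a linear trend, cut into 4 patches of length 8.
random.seed(0)
patch_len, n_patches = 8, 4
series = [math.sin(2 * math.pi * t / 8) + 0.05 * t for t in range(patch_len * n_patches)]
patches = [series[i:i + patch_len] for i in range(0, len(series), patch_len)]
bands = [(0, 0), (1, 1), (2, 3), (4, 4)]  # one frequency band per routed expert
router_weights = [[random.gauss(0, 1) for _ in range(patch_len)] for _ in bands]
outputs, routes = mohe_layer(patches, router_weights, bands, kernel=[0.25, 0.5, 0.25], k=2)
```

Here `routes[i]` maps the indices of the experts chosen for patch `i` to their gate weights, and `outputs[i]` is the mixed patch. The shared filter runs on the concatenated series so that it can smooth across patch boundaries, while each routed expert sees only its own patch; that split mirrors the sequence-level versus patch-level roles described above, under the stated assumptions.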

Evandro S. Ortigossa, Guy Lutsker, Eran Segal • 2026

Related benchmarks

Task | Dataset | Result | Rank
Multivariate Forecasting | ETTh1 | MSE 0.383 | 645
Multivariate Time-series Forecasting | ETTm1 | MSE 0.333 | 433
Multivariate Forecasting | ETTh2 | MSE 0.348 | 341
Multivariate Time-series Forecasting | ETTm2 | MSE 0.256 | 334
Multivariate Time-series Forecasting | Weather | MSE 0.216 | 276
Multivariate Time-series Forecasting | Traffic | MSE 0.388 | 200
Multivariate long-term forecasting | ETTm1 (test) | MSE 0.333 | 134
Multivariate long-term forecasting | ETTh2 (test) | MSE 0.278 | 76
Multivariate long-term forecasting | ETTh1 T=720 (test) | MSE 0.35 | 51
Multivariate Time-series Forecasting | ECL | MSE 0.158 | 49
Showing 10 of 22 rows
