
VarDrop: Enhancing Training Efficiency by Reducing Variate Redundancy in Periodic Time Series Forecasting

About

Variate tokenization, which independently embeds each variate as separate tokens, has achieved remarkable improvements in multivariate time series forecasting. However, employing self-attention with variate tokens incurs a quadratic computational cost with respect to the number of variates, thus limiting its training efficiency for large-scale applications. To address this issue, we propose VarDrop, a simple yet efficient strategy that reduces token usage by omitting redundant variate tokens during training. VarDrop adaptively excludes redundant tokens within a given batch, thereby reducing the number of tokens used for dot-product attention while preserving essential information. Specifically, we introduce k-dominant frequency hashing (k-DFH), which utilizes the ranked dominant frequencies in the frequency domain as a hash value to efficiently group variate tokens exhibiting similar periodic behaviors. Then, only representative tokens in each group are sampled through stratified sampling. By performing sparse attention with these selected tokens, the computational cost of scaled dot-product attention is significantly alleviated. Experiments conducted on public benchmark datasets demonstrate that VarDrop outperforms existing efficient baselines.
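The abstract describes k-DFH grouping and stratified sampling only at a high level, so the following is a minimal NumPy sketch under stated assumptions, not the authors' implementation. The names `k_dominant_frequency_hash`, `stratified_select`, and `attention` are hypothetical. It assumes amplitudes are averaged over the batch, the DC bin is ignored, and one token is sampled per hash group; the paper's actual sampling ratio and its handling of dropped tokens are not stated here.

```python
import numpy as np


def k_dominant_frequency_hash(x: np.ndarray, k: int = 2) -> list[tuple[int, ...]]:
    """Hash each variate by the ranked indices of its k strongest frequencies.

    x has shape (batch, length, num_variates). Assumptions (not from the
    paper): amplitudes are averaged over the batch, and the DC bin is skipped.
    """
    amp = np.abs(np.fft.rfft(x, axis=1)).mean(axis=0)  # (length//2 + 1, N)
    amp = amp[1:]                                       # drop DC (bin 0)
    # Sort bins by descending amplitude; keep the top k in rank order.
    top = np.argsort(-amp, axis=0, kind="stable")[:k] + 1  # +1 undoes the DC shift
    return [tuple(int(f) for f in top[:, v]) for v in range(x.shape[2])]


def stratified_select(hashes, rng, per_group=1):
    """Group variates by hash, then sample up to per_group indices from each group."""
    groups: dict[tuple[int, ...], list[int]] = {}
    for idx, h in enumerate(hashes):
        groups.setdefault(h, []).append(idx)
    keep = []
    for members in groups.values():
        n = min(per_group, len(members))
        keep.extend(int(i) for i in rng.choice(members, size=n, replace=False))
    return sorted(keep)


def attention(tokens: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product self-attention over (num_tokens, d) tokens."""
    scores = tokens @ tokens.T / np.sqrt(tokens.shape[-1])  # (M, M): quadratic in M
    scores -= scores.max(axis=-1, keepdims=True)             # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ tokens


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    batch, length, n_vars, d = 4, 96, 6, 16
    t = np.arange(length)
    x = np.empty((batch, length, n_vars))
    for b in range(batch):
        for v in range(n_vars):
            # Variates 0-2 peak at 4 then 12 cycles; variates 3-5 at 8 then 2.
            f1, f2 = (4, 12) if v < 3 else (8, 2)
            phase = rng.uniform(0, 2 * np.pi)
            x[b, :, v] = (np.sin(2 * np.pi * f1 * t / length + phase)
                          + 0.5 * np.sin(2 * np.pi * f2 * t / length + phase))

    hashes = k_dominant_frequency_hash(x, k=2)
    print(hashes)  # [(4, 12), (4, 12), (4, 12), (8, 2), (8, 2), (8, 2)]
    keep = stratified_select(hashes, rng)
    tokens = rng.standard_normal((n_vars, d))  # stand-in variate embeddings
    out = attention(tokens[keep])
    print(len(keep), out.shape)  # 2 (2, 16)
```

Because attention runs only over the kept tokens, the score matrix shrinks from N x N to M x M, where M is the number of sampled tokens. In this toy example that is 2 x 2 instead of 6 x 6. Phase shifts do not change the hash, since the FFT amplitude at a bin is phase-invariant.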

Junhyeok Kang, Yooju Shin, Jae-Gil Lee • 2025

Related benchmarks

Task | Dataset | Metric | Result | Rank
---------------------------------------------------------------
Multivariate Time-series Forecasting | Weather | MSE | 0.261 | 340
Multivariate Time-series Forecasting | Traffic | MSE | 0.396 | 264
Multivariate long-term time series forecasting | Solar Energy | MSE | 0.236 | 79
Multivariate Time-series Forecasting | Electricity | MAE | 0.245 | 73
Time Series Forecasting | Electricity (test) | Memory Footprint (GB) | 2.22 | 6
Multivariate Time-series Forecasting | Electricity, Traffic, Weather, Solar-Energy Aggregate | Overall MSE | 0.277 | 6
Time Series Forecasting | Electricity (test) | Training Time (ms/iteration) | 30.8 | 5
Time Series Forecasting | Traffic (test) | Training Time (ms) | 72.7 | 5
Time Series Forecasting | Weather (test) | Training Time (ms/iteration) | 12.6 | 5
Time Series Forecasting | Solar-Energy (test) | Training Time (ms/iteration) | 15.9 | 5
