Recency Biased Causal Attention for Time-series Forecasting

About

Recency bias is a useful inductive prior for sequential modeling: it emphasizes nearby observations and can still allow longer-range dependencies. Standard Transformer attention lacks this property, relying on all-to-all interactions that overlook the causal and often local structure of temporal data. We propose a simple mechanism to introduce recency bias by reweighting attention scores with a smooth heavy-tailed decay. This adjustment strengthens local temporal dependencies without sacrificing the flexibility to capture broader and data-specific correlations. We show that recency-biased attention consistently improves sequential modeling, aligning the Transformer more closely with the read, ignore, and write operations of RNNs. Finally, we demonstrate that our approach achieves competitive and often superior performance on challenging time-series forecasting benchmarks.

Kareem Hegazy, Michael W. Mahoney, N. Benjamin Erichson • 2025
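
The reweighting described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact decay, so this sketch assumes a power-law form, (1 + Δ/τ)^(-α), where Δ is the distance between query and key positions. The function name and the parameters `alpha` and `tau` are hypothetical and chosen only for illustration. This is not the authors' implementation.

```python
import math


def recency_biased_causal_attention(scores, alpha=1.0, tau=1.0):
    """Row-wise causal softmax over `scores` with a power-law recency bias.

    scores: square list of lists; scores[i][j] is the raw score of query i on key j.
    alpha, tau: hypothetical decay exponent and time scale (assumptions, not
    taken from the paper).
    """
    n = len(scores)
    weights = []
    for i in range(n):
        # Causal mask: query i only attends to keys j <= i.
        # Adding log(decay) to a score multiplies the unnormalized attention
        # weight by decay = (1 + (i - j) / tau) ** (-alpha), a heavy-tailed
        # function of the distance i - j.
        biased = [
            scores[i][j] - alpha * math.log1p((i - j) / tau)
            for j in range(i + 1)
        ]
        m = max(biased)  # subtract the max for numerical stability
        exps = [math.exp(b - m) for b in biased]
        total = sum(exps)
        # Pad future positions with zeros so every row has length n.
        weights.append([e / total for e in exps] + [0.0] * (n - i - 1))
    return weights


# With all-equal raw scores, plain causal attention would be uniform over the
# past; the assumed bias shifts weight toward the most recent positions.
uniform_scores = [[0.0] * 4 for _ in range(4)]
attn = recency_biased_causal_attention(uniform_scores, alpha=1.0, tau=1.0)
# Last row is approximately [0.12, 0.16, 0.24, 0.48].
```

Because the decay is polynomial, not exponential, distant positions keep a small but non-negligible weight, so a large raw score can still pull attention toward an older observation.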

Related benchmarks

Task                      Dataset       Result      Rank
Time Series Forecasting   Weather       MSE 0.147   497
Time Series Forecasting   ETTm2         MSE 0.163   300
Time Series Forecasting   Electricity   MSE 0.129   40
