
Rethinking Random Transformers as Adaptive Sequence Smoothers for Sleep Staging

About

Automatic sleep staging commonly adopts Transformers under the assumption that they learn complex long-range dependencies. We challenge this view by revealing a neglected property of sleep sequences: strong local temporal continuity. We show that a randomly initialized Transformer, without any training, substantially improves sleep staging performance and consistently outperforms heuristic smoothing. We formalize this effect via a Random Attention Prior Kernel (RAPK), showing that random self-attention acts as an adaptive smoother by balancing global averaging and content-based similarity while preserving stage transitions. Using two metrics, the Local Smoothness Influence Index (LSII) and the Weighted Transition Entropy (WTE), we provide evidence that most performance gains in Transformer-based sleep staging arise from architectural inductive bias rather than parameter learning. Our results suggest that sleep staging can be effectively addressed with structure-driven smoothing mechanisms rather than complex dependency modeling, enabling more efficient and edge-deployable healthcare systems for large-scale physiological monitoring.

Guisong Liu, Xin Gao, Martin Dresler, Jiansong Zhang, Pengfei Wei • 2026
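
To make the "adaptive smoother" reading of the abstract concrete, here is a minimal NumPy sketch of a single untrained self-attention head applied to a sequence of per-epoch features. It is an illustration under stated assumptions, not the authors' RAPK formulation, and it does not implement LSII or WTE. The function name `random_attention_smooth`, the Gaussian initialization, the key dimension `d_k`, and the omission of value projections, positional encodings, and multiple heads are all simplifications introduced here. The point it shows is only that each softmax attention row is a set of non-negative weights summing to one, so every output epoch is a content-weighted average of the input epochs.

```python
import numpy as np


def random_attention_smooth(x, d_k=16, seed=0):
    """Apply one randomly initialized attention head to x of shape (T, d).

    Returns (smoothed, attn), where attn is the (T, T) attention matrix.
    Hypothetical helper: no training, no value projection, single head.
    """
    rng = np.random.default_rng(seed)
    _, d = x.shape

    # Untrained query/key projections (Gaussian init is an assumption).
    w_q = rng.normal(0.0, 1.0 / np.sqrt(d), size=(d, d_k))
    w_k = rng.normal(0.0, 1.0 / np.sqrt(d), size=(d, d_k))

    # Scaled dot-product scores between every pair of epochs.
    scores = (x @ w_q) @ (x @ w_k).T / np.sqrt(d_k)

    # Numerically stable row-wise softmax.
    scores = scores - scores.max(axis=1, keepdims=True)
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=1, keepdims=True)

    # Each output row is a convex combination of the input rows.
    return attn @ x, attn


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Toy sequence: 30 epochs, 5 per-stage scores with added noise.
    stages = np.repeat([0, 1, 2], 10)
    x = np.eye(5)[stages] + 0.3 * rng.normal(size=(30, 5))
    smoothed, attn = random_attention_smooth(x)
    print(smoothed.shape, attn.sum(axis=1)[:3])
```

Because the weights in each row are convex, the smoothed values can never leave the per-dimension range of the inputs. How much weight goes to similar epochs versus a near-uniform average depends on the scale of the random projections, which is a free choice in this sketch.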

Related benchmarks

Task                       | Dataset    | Metric         | Result | Rank
Sleep Stage Classification | Sleep-EDFX | Accuracy (ACC) | 78.82  | 36
Sleep Stage Classification | SHHS       | F1 Weighted    | 82.89  | 36
Sleep Stage Classification | EDF-20     | Accuracy       | 79.96  | 8
Sleep Stage Classification | EDF ST     | Accuracy       | 75.64  | 8