Rethinking Random Transformers as Adaptive Sequence Smoothers for Sleep Staging

About

Automatic sleep staging commonly adopts Transformers under the assumption that they learn complex long-range dependencies. We challenge this view by revealing a neglected property of sleep sequences: strong local temporal continuity. We show that a randomly initialized Transformer, without any training, substantially improves sleep staging performance and consistently outperforms heuristic smoothing. We formalize this effect via a Random Attention Prior Kernel (RAPK), showing that random self-attention acts as an adaptive smoother by balancing global averaging and content-based similarity while preserving stage transitions. Using two metrics, the Local Smoothness Influence Index (LSII) and the Weighted Transition Entropy (WTE), we provide evidence that most performance gains in Transformer-based sleep staging arise from architectural inductive bias rather than parameter learning. Our results suggest that sleep staging can be effectively addressed with structure-driven smoothing mechanisms rather than complex dependency modeling, enabling more efficient and edge-deployable healthcare systems for large-scale physiological monitoring.
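The core claim is that an untrained self-attention layer already behaves like a smoother over the epoch sequence. The following minimal NumPy sketch illustrates that idea under stated assumptions; it is not the authors' RAPK derivation or code. `random_attention_smooth`, its `scale` knob, and the `moving_average` baseline (standing in for "heuristic smoothing") are all hypothetical names. The sketch uses a single head with random query/key projections, takes values to be the inputs themselves, and omits residual connections and normalization. Because softmax rows sum to 1, each output epoch is a convex combination of input epochs. With `scale=0` this reduces to global averaging, and larger values make the weights increasingly content-dependent.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def random_attention_smooth(
    x: np.ndarray, scale: float = 1.0, seed: int = 0
) -> tuple[np.ndarray, np.ndarray]:
    """Apply one untrained single-head self-attention layer (hypothetical sketch).

    x     : (T, d) array, one row per epoch (e.g. features or stage probabilities).
    scale : assumed knob on the std of the random query/key projections.
            scale=0 -> uniform weights (global averaging);
            larger  -> weights depend more on the content of each epoch.
    Returns (smoothed, weights) with smoothed = weights @ x.
    """
    t, d = x.shape
    rng = np.random.default_rng(seed)
    # Random, never-trained query/key projections (no gradient updates anywhere).
    w_q = rng.normal(0.0, scale / np.sqrt(d), size=(d, d))
    w_k = rng.normal(0.0, scale / np.sqrt(d), size=(d, d))
    logits = (x @ w_q) @ (x @ w_k).T / np.sqrt(d)
    weights = softmax(logits, axis=-1)  # (T, T), each row sums to 1
    # Values are the inputs themselves, so every output row is a weighted
    # average of input rows: the layer acts as a (content-dependent) smoother.
    return weights @ x, weights


def moving_average(x: np.ndarray, window: int = 5) -> np.ndarray:
    """Fixed-window moving average over time, used here as an assumed heuristic baseline."""
    if window % 2 == 0:
        raise ValueError("window must be odd so the output keeps length T")
    half = window // 2
    # Repeat edge epochs so the start and end of the night are not pulled toward zero.
    padded = np.pad(x, ((half, half), (0, 0)), mode="edge")
    kernel = np.ones(window) / window
    cols = [np.convolve(padded[:, c], kernel, mode="valid") for c in range(x.shape[1])]
    return np.stack(cols, axis=1)
```

The design contrast is that `moving_average` applies the same fixed kernel at every epoch. In the attention sketch, the weights are recomputed from the content of each epoch, which is one way a smoother could avoid blurring across stage transitions. Whether this particular sketch does so on real data is not claimed here.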

Guisong Liu, Xin Gao, Martin Dresler, Jiansong Zhang, Pengfei Wei • 2026

Related benchmarks

Task	Dataset	Result
Sleep Stage Classification	Sleep-EDFX	Accuracy (ACC) 78.82	36
Sleep Stage Classification	SHHS	F1 Weighted 82.89	36
Sleep Stage Classification	EDF-20	Accuracy 79.96	8
Sleep Stage Classification	EDF ST	Accuracy 75.64	8
