SleepVLM: Explainable and Rule-Grounded Sleep Staging via a Vision-Language Model

About

While automated sleep staging has achieved expert-level accuracy, its clinical adoption is hindered by a lack of auditable reasoning. We introduce SleepVLM, a rule-grounded vision-language model (VLM) that stages sleep from multi-channel polysomnography (PSG) waveform images and generates clinician-readable rationales based on American Academy of Sleep Medicine (AASM) scoring criteria. Utilizing waveform-perceptual pre-training and rule-grounded supervised fine-tuning, SleepVLM achieved Cohen's kappa of 0.767 on a held-out test set (MASS-SS1) and 0.743 on an external cohort (ZUAMHCS), matching state-of-the-art performance. Independent expert evaluation by two trained sleep technologists further validated the model's reasoning quality, with mean scores of 3.75-3.96 out of 5 across factual accuracy, evidence comprehensiveness, and logical coherence on both datasets. By coupling competitive performance with transparent, rule-based explanations, SleepVLM may improve the trustworthiness and auditability of automated sleep staging in clinical workflows. To facilitate further research in interpretable sleep medicine, we release MASS-EX, a novel expert-annotated dataset.

Guifeng Deng, Pan Wang, Mengfan Niu, Jiquan Wang, Shuying Rao, Junyi Xie, Xi'ang Chen, Sha Zhao, Gang Pan, Wanjun Guo, Tao Li, Haiteng Jiang• 2026

Related benchmarks

Task	Dataset	Result	Rank
Sleep Staging	MASS-SS1 (test)	Accuracy0.835		15
Sleep Staging	ZUAMHCS external (test)	Accuracy81.2		15

Showing 2 of 2 rows

Other info

Follow for update

@wizwand_team Discord