Standing on the Shoulders of Giants: Rethinking EEG Foundation Model Pretraining via Multi-Teacher Distillation
About
Pretraining for electroencephalogram (EEG) foundation models has predominantly relied on self-supervised masked reconstruction, a paradigm adapted from the success of vision and language foundation models. However, unlike images and text, EEG datasets are notoriously expensive to collect and are characterized by a low signal-to-noise ratio. These challenges make it difficult to scale EEG foundation models and to capture the underlying neural semantics through reconstruction. In this work, we ask the question: can we stand on the shoulders of well-established foundation models from well-represented modalities to bootstrap the pretraining of EEG foundation models? We first demonstrate that mainstream foundation models, such as those from vision and time series, transfer surprisingly well to the EEG domain. Building on this finding, we propose the Multi-Teacher Distillation Pretraining (MTDP) framework, which pretrains EEG foundation models via a two-stage multi-teacher distillation. In the first stage, we introduce a learnable gating network to fuse representations from diverse teachers (e.g., DINOv3 and Chronos) via a masked latent denoising objective. In the second stage, we distill the fused representation into an EEG foundation model. Extensive evaluations across 9 downstream tasks and 12 datasets demonstrate that our MTDP-based EEG foundation model outperforms its self-supervised counterparts while requiring only 25% of the pretraining data.
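The two stages described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the form of the gating network or the distillation loss, so the per-token softmax gate, the mean-squared-error loss, the assumption that teacher features are already projected to a shared dimension, and all names (`fuse_teachers`, `distill_loss`, `gate_w`, `gate_b`) are assumptions made for illustration. The masked latent denoising objective used to train the gate is omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def fuse_teachers(teacher_feats, gate_w, gate_b):
    """Stage 1 (sketch): fuse teacher features with a gate.

    teacher_feats: (T, N, D) array, T teachers, N tokens, D features.
                   Assumes every teacher is already projected to D.
    gate_w:        (T * D, T) weights of a hypothetical linear gate.
    gate_b:        (T,) bias of the gate.
    """
    T, N, D = teacher_feats.shape
    # Concatenate all teachers' features per token: (N, T * D).
    concat = teacher_feats.transpose(1, 0, 2).reshape(N, T * D)
    # One mixing weight per teacher and token; rows sum to 1.
    weights = softmax(concat @ gate_w + gate_b, axis=-1)  # (N, T)
    # Weighted sum over teachers: (N, D).
    fused = np.einsum("nt,tnd->nd", weights, teacher_feats)
    return fused, weights

def distill_loss(student_feats, fused):
    """Stage 2 (sketch): match student features to the fused target.
    MSE is an assumed choice, not taken from the paper."""
    return float(np.mean((student_feats - fused) ** 2))

rng = np.random.default_rng(0)
T, N, D = 2, 4, 8  # e.g. two teachers, four tokens, eight features
teacher_feats = rng.normal(size=(T, N, D))
gate_w = rng.normal(scale=0.1, size=(T * D, T))
gate_b = np.zeros(T)

fused, weights = fuse_teachers(teacher_feats, gate_w, gate_b)
student_feats = rng.normal(size=(N, D))  # stand-in for the EEG student
loss = distill_loss(student_feats, fused)
```

In a real training loop the gate parameters would be learned in the first stage and frozen while the student is trained against `fused` in the second; here both are random placeholders.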
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Binary classification of normal versus abnormal EEG signals | TUAB | Balanced Accuracy | 81.02 | 49 |
| EEG Classification | CHB-MIT | B-ACC | 80.13 | 30 |
| Motor Imagery Classification | PhysioNet-MI | Balanced Accuracy | 64.57 | 27 |
| Motor Imagery Classification | SHU-MI | Balanced Accuracy | 63.78 | 22 |
| EEG Classification | BCIC 3 2020 | Balanced Accuracy | 62.53 | 20 |
| Sleep Staging | ISRUC (test) | Accuracy | 79.41 | 14 |
| EEG Classification | FACED | Binary Accuracy | 56.95 | 13 |
| EEG Classification | Mumtaz 2016 | Balanced Accuracy | 95.85 | 13 |
| EEG Classification | MentalArithmetic | Balanced Accuracy | 77.43 | 13 |
| Motor Imagery Classification | BCIC 2a IV | Balanced Accuracy | 59.81 | 13 |
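
Most rows above report balanced accuracy. As a reminder of how that metric differs from plain accuracy, the sketch below uses the standard definition (the mean of per-class recall). The page does not describe the exact evaluation protocol, so the function name and the toy labels are illustrative only.

```python
from collections import defaultdict

def balanced_accuracy(y_true, y_pred):
    # Mean of per-class recall, so each class counts equally
    # regardless of how many samples it has.
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)

# Toy imbalanced example: 8 samples of class 0, 2 of class 1.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 8 + [0, 1]  # one minority-class sample is missed

plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
balanced = balanced_accuracy(y_true, y_pred)
print(plain, balanced)  # 0.9 0.75
```

On imbalanced EEG datasets such as abnormal-versus-normal detection, this is why balanced accuracy is usually the more informative number.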