Transfer Learning from ImageNet for MEG-Based Decoding of Imagined Speech
About
Non-invasive decoding of imagined speech remains challenging due to weak, distributed signals and limited labeled data. Our paper introduces an image-based approach that transforms magnetoencephalography (MEG) signals into time-frequency representations compatible with pretrained vision models. MEG data from 21 participants performing imagined speech tasks were projected into three spatial scalogram mixtures via a learnable sensor-space convolution, producing compact image-like inputs for ImageNet-pretrained vision architectures. These models outperformed classical and non-pretrained models, achieving up to 90.4% balanced accuracy for imagery vs. silence, 81.0% for imagery vs. silent reading, and 60.6% for three-class vowel decoding. Cross-subject evaluation confirmed that pretrained models capture shared neural representations, and temporal analyses localized discriminative information to imagery-locked intervals. These findings show that pretrained vision models applied to image-based MEG representations can effectively capture the structure of imagined speech in non-invasive neural signals.
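The projection of multi-sensor MEG into three image-like channels can be illustrated with a small NumPy sketch. It is not the paper's implementation: the Morlet wavelet, the `n_cycles` value, the log compression, the toy sensor count, and the random `weights` (standing in for the learned sensor-space convolution, written here as a 1x1 mix across sensors) are all assumptions made for illustration, as are the function names `morlet_scalogram` and `mix_sensors`.

```python
import numpy as np

def morlet_scalogram(signal, sfreq, freqs, n_cycles=5.0):
    """Wavelet power of one sensor's time series, shape (n_freqs, n_times)."""
    n_times = signal.shape[0]
    power = np.empty((len(freqs), n_times))
    for i, f in enumerate(freqs):
        sigma_t = n_cycles / (2.0 * np.pi * f)  # temporal width of the wavelet
        t = np.arange(-4.0 * sigma_t, 4.0 * sigma_t, 1.0 / sfreq)
        wavelet = np.exp(2j * np.pi * f * t) * np.exp(-t**2 / (2.0 * sigma_t**2))
        wavelet /= np.sqrt(np.sum(np.abs(wavelet) ** 2))  # unit energy
        full = np.convolve(signal, wavelet)  # length n_times + len(wavelet) - 1
        start = (len(wavelet) - 1) // 2      # crop the centre back to n_times
        power[i] = np.abs(full[start:start + n_times]) ** 2
    return power

def mix_sensors(scalograms, weights):
    """1x1 'convolution' across sensors.

    scalograms: (n_sensors, n_freqs, n_times)
    weights:    (n_out, n_sensors), learned in a real model (assumption: random here)
    returns:    (n_out, n_freqs, n_times)
    """
    return np.einsum("os,sft->oft", weights, scalograms)

# Toy trial: 8 hypothetical sensors, 2 s at 250 Hz of white noise.
rng = np.random.default_rng(0)
n_sensors, sfreq, n_times = 8, 250.0, 500
trial = rng.standard_normal((n_sensors, n_times))
freqs = np.linspace(4.0, 40.0, 16)

scalograms = np.stack([morlet_scalogram(ch, sfreq, freqs) for ch in trial])
weights = 0.1 * rng.standard_normal((3, n_sensors))  # placeholder for learned weights
image = mix_sensors(np.log1p(scalograms), weights)   # log compression is an assumption
print(image.shape)  # (3, 16, 500)
```

The three output channels play the role of an RGB image, so the array could then be resized to the input resolution a pretrained vision backbone expects; that step is omitted here.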
Related benchmarks
| Task | Dataset | Metric | Value (%) | Rank |
|---|---|---|---|---|
| 3-class Vowel Decoding | MEG Imagined Speech Pre-cue window | Balanced Accuracy | 36.7 | 12 |
| 3-class Vowel Decoding | MEG Imagined Speech Post-cue window | Balanced Accuracy | 60.5 | 12 |
| 3-class Vowel Decoding | MEG Imagined Speech Full window | Balanced Accuracy | 60.6 | 12 |
| ISP vs Silence decoding | ISP vs Silence (Pre-cue window) | Balanced Accuracy | 72.9 | 12 |
| ISP vs Silence decoding | ISP vs Silence (Post-cue window) | Balanced Accuracy | 90.2 | 12 |
| ISP vs Silence decoding | ISP vs Silence Full window | Balanced Accuracy | 90.4 | 12 |
| ISP vs SR decoding | MEG ISP vs SR Pre-cue window | Balanced Accuracy | 61.9 | 12 |
| ISP vs SR decoding | MEG ISP vs SR Post-cue window | Balanced Accuracy | 79.1 | 12 |
| ISP vs SR decoding | MEG ISP vs SR Full window | Balanced Accuracy | 81.0 | 12 |
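For reading the table: balanced accuracy is conventionally the mean of per-class recall, so chance level is 1/K for K classes, i.e. 50% for the two binary tasks and about 33.3% for the 3-class vowel task. The sketch below assumes the benchmarks use this standard definition, which the page does not state explicitly; the labels are made up for illustration.

```python
import numpy as np

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall (the standard definition, assumed here)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    classes = np.unique(y_true)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in classes]
    return float(np.mean(recalls))

# Imbalanced toy labels: 8 trials of class 0, 2 trials of class 1.
y_true = np.array([0] * 8 + [1] * 2)
y_pred = np.zeros(10, dtype=int)  # a classifier that always predicts class 0

plain = float(np.mean(y_true == y_pred))
balanced = balanced_accuracy(y_true, y_pred)
print(plain)     # 0.8
print(balanced)  # 0.5
```

A majority-class predictor scores 80% plain accuracy here but only 50% balanced accuracy, which is why the balanced metric is the safer choice when class counts differ.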