Transfer Learning from ImageNet for MEG-Based Decoding of Imagined Speech

About

Non-invasive decoding of imagined speech remains challenging due to weak, distributed signals and limited labeled data. Our paper introduces an image-based approach that transforms magnetoencephalography (MEG) signals into time-frequency representations compatible with pretrained vision models. MEG data from 21 participants performing imagined speech tasks were projected into three spatial scalogram mixtures via a learnable sensor-space convolution, producing compact image-like inputs for ImageNet-pretrained vision architectures. These models outperformed classical and non-pretrained models, achieving up to 90.4% balanced accuracy for imagery vs. silence, 81.0% vs. silent reading, and 60.6% for vowel decoding. Cross-subject evaluation confirmed that pretrained models capture shared neural representations, and temporal analyses localized discriminative information to imagery-locked intervals. These findings show that pretrained vision models applied to image-based MEG representations can effectively capture the structure of imagined speech in non-invasive neural signals.
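The sensor-mixing step described above can be sketched as a weighted sum over the sensor axis, which is the forward pass a 1x1 convolution across sensors would compute. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the array sizes, the function name `mix_sensors`, and the random weights are all hypothetical, and in the paper the weights are learned during training rather than sampled.

```python
import numpy as np

def mix_sensors(scalograms, weights):
    """Collapse the sensor axis into a few image-like channels.

    scalograms: array of shape (n_sensors, n_freqs, n_times)
    weights:    array of shape (n_channels, n_sensors)
    returns:    array of shape (n_channels, n_freqs, n_times)
    """
    # Each output channel is a weighted sum of all sensor scalograms.
    return np.einsum("cs,sft->cft", weights, scalograms)

rng = np.random.default_rng(0)

# Illustrative sizes only; the real sensor count and scalogram
# resolution are not given in the text above.
n_sensors, n_freqs, n_times = 8, 16, 32
scalograms = rng.random((n_sensors, n_freqs, n_times))

# Three mixtures, so the result has the three channels an
# ImageNet-pretrained model expects. Random stand-in for learned weights.
weights = rng.standard_normal((3, n_sensors))

image = mix_sensors(scalograms, weights)
print(image.shape)  # (3, 16, 32)
```

The resulting three-channel array has the same layout as an RGB image, which is what makes it usable as input to a pretrained vision architecture.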

Soufiane Jhilal, Stéphanie Martin, Anne-Lise Giraud • 2026

Related benchmarks

Task	Dataset	Metric	Result
3-class Vowel Decoding	MEG Imagined Speech Pre-cue window	Balanced Accuracy	36.7	12
3-class Vowel Decoding	MEG Imagined Speech Post-cue window	Balanced Accuracy	60.5	12
3-class Vowel Decoding	MEG Imagined Speech Full window	Balanced Accuracy	60.6	12
ISP vs Silence decoding	ISP vs Silence (Pre-cue window)	Balanced Accuracy	72.9	12
ISP vs Silence decoding	ISP vs Silence (Post-cue window)	Balanced Accuracy	90.2	12
ISP vs Silence decoding	ISP vs Silence Full window	Balanced Accuracy	90.4	12
ISP vs SR decoding	MEG ISP vs SR Pre-cue window	Balanced Accuracy	61.9	12
ISP vs SR decoding	MEG ISP vs SR Post-cue window	Balanced Accuracy	79.1	12
ISP vs SR decoding	MEG ISP vs SR Full window	Balanced Accuracy	81.0	12
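
Balanced accuracy, the metric reported in the table, is the mean of the per-class recalls, so chance level is 50% for the two binary tasks and about 33.3% for 3-class vowel decoding. The following is a small plain-Python sketch of that definition; the vowel labels and predictions are made up for illustration and are not data from the paper.

```python
from collections import defaultdict

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    hits = defaultdict(int)    # correct predictions per true class
    totals = defaultdict(int)  # number of samples per true class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)

# Hypothetical labels: class "a" is over-represented on purpose.
y_true = ["a", "a", "a", "a", "i", "i", "u", "u"]
y_pred = ["a", "a", "a", "i", "i", "u", "u", "u"]

# Per-class recall: a = 3/4, i = 1/2, u = 2/2, so the mean is 0.75.
score = balanced_accuracy(y_true, y_pred)
print(round(score, 2))  # 0.75
```

Because every class contributes equally regardless of its size, the score is not inflated when one condition has more trials than another.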
