Deep Models, Shallow Alignment: Uncovering the Granularity Mismatch in Neural Decoding
About
Neural visual decoding is a central problem in brain-computer interface research, aiming to reconstruct human visual perception and to elucidate the structure of neural representations. However, existing approaches overlook a fundamental granularity mismatch between human and machine vision: deep vision models emphasize semantic invariance by suppressing local texture information, whereas neural signals preserve an intricate mixture of low-level visual attributes and high-level semantic content. To address this mismatch, we propose Shallow Alignment, a novel contrastive learning strategy that aligns neural signals with intermediate representations of visual encoders rather than their final outputs, thereby striking a better balance between low-level texture details and high-level semantic features. Extensive experiments across multiple benchmarks demonstrate that Shallow Alignment significantly outperforms standard final-layer alignment, with performance gains ranging from 22% to 58% across diverse vision backbones. Notably, our approach effectively unlocks the scaling law in neural visual decoding, enabling decoding performance to scale predictably with the capacity of pre-trained vision backbones. We further conduct systematic empirical analyses to shed light on the mechanisms underlying the observed performance gains.
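The core idea, contrastive alignment against an intermediate layer instead of the final one, can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' implementation: the abstract does not name the loss, so a CLIP-style symmetric InfoNCE objective is used; the temperature of 0.07, the layer names `"intermediate"` and `"final"`, and the random toy features are all hypothetical. The neural embeddings are built to resemble the intermediate features, so the printed losses only show how the target is swapped and are not evidence for the paper's results.

```python
import math
import random


def l2_normalize(vector):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


def cosine_logits(neural, visual, temperature):
    """Pairwise cosine similarities divided by the temperature."""
    neural = [l2_normalize(v) for v in neural]
    visual = [l2_normalize(v) for v in visual]
    return [
        [sum(a * b for a, b in zip(n, v)) / temperature for v in visual]
        for n in neural
    ]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct column for row i is i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_z = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def symmetric_info_nce(neural, visual, temperature=0.07):
    """Average of the neural-to-image and image-to-neural losses."""
    logits = cosine_logits(neural, visual, temperature)
    transposed = [list(column) for column in zip(*logits)]
    return 0.5 * (
        cross_entropy_diagonal(logits) + cross_entropy_diagonal(transposed)
    )


# Toy data (hypothetical): one feature vector per image for each layer.
rng = random.Random(0)
num_images, dim = 8, 16
layer_features = {
    name: [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_images)]
    for name in ("intermediate", "final")
}

# Stand-in for the output of a trainable neural-signal encoder. It is
# constructed as a noisy copy of the intermediate features on purpose.
neural_embeddings = [
    [x + 0.1 * rng.gauss(0, 1) for x in vector]
    for vector in layer_features["intermediate"]
]

# Shallow alignment changes only which layer supplies the target.
loss_shallow = symmetric_info_nce(neural_embeddings, layer_features["intermediate"])
loss_final = symmetric_info_nce(neural_embeddings, layer_features["final"])
print(f"intermediate-layer target: {loss_shallow:.3f}")
print(f"final-layer target: {loss_final:.3f}")
```

In a real pipeline the per-layer features would come from a frozen pre-trained vision backbone, and the neural encoder would be trained by minimizing this loss. How intermediate features are pooled or projected to a common dimension is not stated in the abstract and is left out of the sketch.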
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Retrieval | THINGS-EEG 200-way zero-shot retrieval (Intra-Subject) | Top-5 Accuracy | 97.7 | 125 |
| Retrieval | THINGS-EEG 200-way zero-shot retrieval (Inter-Subject) | -- | -- | 88 |
| 200-way retrieval | THINGS-MEG Intra-subject | Top-1 Accuracy | 48 | 33 |
| Brain-to-image retrieval | THINGS-EEG Inter-subject | Subject 1 T-1 Retrieval Rate | 54.7 | 26 |
| Brain-to-image retrieval | THINGS-MEG Inter-subject | Average Top-1 Score | 3.8 | 26 |
| Brain-to-image retrieval | THINGS-MEG Intra-subject | Retrieval Accuracy Subject 1 | 46.3 | 8 |
| 200-way retrieval | THINGS-MEG Inter-subject LOSO | Subject 1 Accuracy | 0.069 | 6 |