Towards Interpretable Visual Decoding with Attention to Brain Representations
About
Recent work has demonstrated that complex visual stimuli can be decoded from human brain activity using deep generative models, offering new ways to probe how the brain represents real-world scenes. However, many existing approaches first map brain signals into intermediate image or text feature spaces before guiding the generative process, which obscures the contributions of different brain areas to the final reconstruction. In this work, we propose NeuroAdapter, a visual decoding framework that directly conditions a latent diffusion model on brain representations, bypassing the need for intermediate feature spaces. Our method achieves reconstruction quality competitive with prior work on public fMRI datasets, while providing greater transparency into how brain signals drive visual reconstruction. To support this transparency, we introduce an Image-Brain BI-directional interpretability framework (IBBI) that analyzes cross-attention patterns across diffusion denoising steps to reveal how different cortical areas influence the unfolding generative trajectory. Our work highlights the potential of end-to-end brain-to-image reconstruction and establishes a path for interpretable neural decoding.
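The core mechanism described above — a latent diffusion model whose denoising steps attend directly to brain representations — can be illustrated with a minimal cross-attention sketch. This is a hedged, NumPy-only illustration, not the NeuroAdapter implementation: the projection matrices are random stand-ins for learned weights, and the token shapes (16 latent patches, 8 brain-derived tokens) are arbitrary. The attention map it returns is the kind of object the IBBI analysis would inspect per denoising step.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(latent_tokens, brain_tokens, d_k):
    """Latent image patches (queries) attend to brain-derived tokens (keys/values).

    The weight matrices below are random placeholders; in a trained model
    they would be learned parameters of the conditioning adapter.
    """
    rng = np.random.default_rng(0)
    W_q = rng.standard_normal((latent_tokens.shape[-1], d_k))
    W_k = rng.standard_normal((brain_tokens.shape[-1], d_k))
    W_v = rng.standard_normal((brain_tokens.shape[-1], d_k))
    Q = latent_tokens @ W_q                    # (n_latent, d_k)
    K = brain_tokens @ W_k                     # (n_brain, d_k)
    V = brain_tokens @ W_v                     # (n_brain, d_k)
    attn = softmax(Q @ K.T / np.sqrt(d_k))     # (n_latent, n_brain)
    return attn @ V, attn

latents = np.random.default_rng(1).standard_normal((16, 32))  # 16 latent patches
brain = np.random.default_rng(2).standard_normal((8, 64))     # 8 brain-region tokens
out, attn = cross_attention(latents, brain, d_k=32)
# Each row of `attn` sums to 1: it shows how one latent patch distributes
# attention over brain tokens, which is what an interpretability pass can read out.
```

Logging `attn` at every denoising step, rather than only at the end, is what makes it possible to trace which cortical inputs shape the generative trajectory early versus late.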
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| fMRI-to-image reconstruction | NSD 2 (test) | Inception Feature Similarity | 68.18 | 15 |
| fMRI-to-image reconstruction | NSD | PixCorr | 12.4 | 9 |
| Brain-to-Image Reconstruction | NSD-Imagery Mental Imagery Trials (test) | Pixel Correlation (PixCorr) | 0.037 | 6 |
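Two of the benchmark rows above report PixCorr, which is commonly defined as the mean per-image Pearson correlation between the flattened pixels of each reconstruction and its ground-truth stimulus. A minimal sketch under that assumed definition (the evaluation details of the benchmarks themselves may differ, e.g. in image resizing):

```python
import numpy as np

def pixcorr(recon, target):
    """Mean Pearson correlation between flattened image pairs.

    recon, target: arrays of shape (n_images, H, W, C) with matching shapes.
    """
    r = recon.reshape(recon.shape[0], -1)
    t = target.reshape(target.shape[0], -1)
    r = r - r.mean(axis=1, keepdims=True)
    t = t - t.mean(axis=1, keepdims=True)
    num = (r * t).sum(axis=1)
    den = np.sqrt((r ** 2).sum(axis=1) * (t ** 2).sum(axis=1))
    return (num / den).mean()

rng = np.random.default_rng(0)
gt = rng.random((4, 8, 8, 3))                    # toy ground-truth images
noisy = gt + 0.1 * rng.standard_normal(gt.shape)  # a "good" reconstruction
score_perfect = pixcorr(gt, gt)      # identical images give 1.0
score_noisy = pixcorr(noisy, gt)     # degraded but still high
```

Because PixCorr is bounded in [-1, 1] per image, the very low value on the mental-imagery split (0.037) reflects how much harder imagined stimuli are to reconstruct than perceived ones.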