V1T: large-scale mouse V1 response prediction using a Vision Transformer
About
Accurately predicting the neural responses of the visual cortex to natural visual stimuli remains a challenge in computational neuroscience. In this work, we introduce V1T, a novel Vision Transformer-based architecture that learns a shared visual and behavioral representation across animals. We evaluate our model on two large datasets recorded from mouse primary visual cortex, where it outperforms previous convolution-based models by more than 12.7% in prediction performance. Moreover, we show that the self-attention weights learned by the Transformer correlate with the population receptive fields. Our model thus sets a new benchmark for neural response prediction and can be used jointly with behavioral and neural recordings to reveal meaningful characteristic features of the visual cortex.
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Neural response modeling | Dataset F | Trial correlation (ρ) | 0.3761 | 7 |
| Neural response prediction | Dataset F Core K | Trial correlation (ρ) | 0.3713 | 4 |
| Neural response prediction | Dataset F Core H | Trial correlation (ρ) | 0.3628 | 4 |
| Neural response modeling | Dataset S Core A Sensorium (test) | Trial correlation (ρ) | 0.3787 | 4 |
| Neural response modeling | Dataset S Core B Sensorium (test) | Trial correlation (ρ) | 0.4522 | 4 |
| Neural response modeling | Dataset S Core C Sensorium (test) | Trial correlation (ρ) | 0.4124 | 4 |
| Neural response modeling | Dataset S Core D Sensorium (test) | Trial correlation (ρ) | 0.4145 | 4 |
| Neural response modeling | Dataset S Core E Sensorium (test) | Trial correlation (ρ) | 0.3833 | 4 |
| Neural response prediction | Dataset F Core F | Trial correlation (ρ) | 0.3189 | 4 |
| Neural response prediction | Dataset F Core G | Trial correlation (ρ) | 0.3815 | 4 |
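
The table reports a trial correlation (ρ) for each dataset. As a rough illustration of how such a score might be computed, the sketch below assumes the metric is the Pearson correlation between recorded and predicted responses across trials for each neuron, averaged over neurons. The function names `pearson` and `trial_correlation`, the `[trial][neuron]` input layout, and the handling of constant signals are assumptions made for this example; the benchmarks' exact evaluation protocol may differ.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined when either signal is constant.
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def trial_correlation(recorded, predicted):
    """Mean per-neuron correlation over trials.

    Both inputs are nested lists indexed as [trial][neuron] (an assumed layout).
    Neurons with an undefined correlation are skipped (also an assumption).
    """
    n_neurons = len(recorded[0])
    scores = []
    for j in range(n_neurons):
        rec_j = [trial[j] for trial in recorded]
        pred_j = [trial[j] for trial in predicted]
        scores.append(pearson(rec_j, pred_j))
    valid = [s for s in scores if not math.isnan(s)]
    if not valid:
        return float("nan")
    return sum(valid) / len(valid)


# Toy data: 4 trials, 2 neurons. Neuron 0 is predicted perfectly up to scale,
# neuron 1 is predicted with the opposite trend.
recorded = [[1, 4], [2, 3], [3, 2], [4, 1]]
predicted = [[2, 1], [4, 2], [6, 3], [8, 4]]
score = trial_correlation(recorded, predicted)
```

In this toy case the two neurons have correlations of 1 and -1, so the averaged score is 0. Because the score is averaged per neuron, a model is rewarded for tracking the trial-to-trial variation of each neuron rather than only matching overall response magnitudes.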