Bridging Brain and Semantics: A Hierarchical Framework for Semantically Enhanced fMRI-to-Video Reconstruction
About
Reconstructing dynamic visual experiences as videos from functional magnetic resonance imaging (fMRI) is pivotal for advancing the understanding of neural processes. However, current fMRI-to-video reconstruction methods are hindered by a semantic gap between noisy fMRI signals and the rich content of videos. This gap stems from a reliance on incomplete semantic embeddings that neither capture video-specific cues (e.g., actions) nor integrate prior knowledge. To address this, we draw inspiration from the dual-pathway processing mechanism in the human brain and introduce CineNeuron, a novel hierarchical framework for semantically enhanced video reconstruction from fMRI signals, built from two synergistic stages. First, a bottom-up semantic enrichment stage maps fMRI signals to a rich embedding space that captures textual semantics, image contents, action concepts, and object categories. Second, a top-down memory integration stage uses the proposed Mixture-of-Memories method to dynamically select relevant "memories" from previously seen data and fuse them with the fMRI embedding to refine the video reconstruction. Extensive experimental results on two fMRI-to-video benchmarks demonstrate that CineNeuron surpasses state-of-the-art methods across various metrics.
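
The top-down stage is described only at a high level here, so the following is a minimal sketch of one way "select relevant memories and fuse them with the fMRI embedding" could look, not the Mixture-of-Memories method itself. It assumes cosine-similarity retrieval, a softmax over the top-k matches, and a fixed linear blend; the function name `retrieve_and_fuse` and the parameters `top_k`, `temperature`, and `alpha` are all hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def retrieve_and_fuse(query, memory_bank, top_k=3, temperature=0.1, alpha=0.5):
    """Illustrative memory retrieval and fusion (assumed design, not the paper's).

    query:       (d,) embedding predicted from fMRI
    memory_bank: (n, d) embeddings of previously seen data
    Returns the fused embedding, the selected memory indices, and their weights.
    """
    # Cosine similarity between the query and every stored memory.
    sims = l2_normalize(memory_bank) @ l2_normalize(query)

    # Keep the top_k most similar memories, best match first.
    top_idx = np.argsort(sims)[-top_k:][::-1]

    # Softmax over the selected similarities (shifted for numerical stability).
    logits = sims[top_idx] / temperature
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()

    # Weighted average of the selected memories, blended with the query.
    retrieved = weights @ memory_bank[top_idx]
    fused = (1.0 - alpha) * query + alpha * retrieved
    return fused, top_idx, weights
```

In this sketch `alpha` controls how strongly the retrieved memories override the fMRI embedding; a learned gate or attention layer would be a natural replacement for the fixed blend.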
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| fMRI-to-Video Reconstruction | cc 2017 | 2-way Accuracy | 85 | 5 |
| fMRI-to-Video Reconstruction | CineBrain | 2-way Accuracy | 93.7 | 4 |
| Video Reconstruction from fMRI | cc and CineBrain 2017 | Semantic Alignment | 63.77 | 4 |
| Brain-to-video reconstruction and retrieval | cc OOD 2017 | Acc2 | 82.1 | 3 |
| fMRI-to-Video Reconstruction | cc 2017 (test) | EPE | 1.628 | 3 |
| fMRI-to-image Retrieval | cc 2017 (test) | Top-1 Retrieval Accuracy | 28.3 | 2 |
| fMRI-to-Video Reconstruction | CineBrain (test) | EPE | 2.126 | 2 |
| image-to-fMRI Retrieval | cc 2017 (test) | Top-1 Retrieval Accuracy | 26.2 | 2 |
| Video Reconstruction from fMRI | BOLDMoments | Accuracy (k=2) | 79.1 | 2 |
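
Several rows above report a 2-way accuracy, but the evaluation protocol is not given on this page. As a rough illustration, the sketch below implements one common formulation, pairwise identification: a reconstruction counts as correct against a distractor when it is more similar to its own ground truth than to the distractor. The use of cosine similarity over pre-extracted feature vectors and the function name `two_way_identification` are assumptions; the benchmarks may use a different feature extractor or a classifier-based variant.

```python
import numpy as np


def two_way_identification(pred_feats, true_feats, eps=1e-8):
    """Pairwise (2-way) identification accuracy under an assumed protocol.

    pred_feats: (n, d) features of reconstructed samples
    true_feats: (n, d) features of the matching ground-truth samples
    Returns the fraction of (sample, distractor) pairs where the reconstruction
    is closer to its own ground truth than to the distractor.
    """
    pred = pred_feats / (np.linalg.norm(pred_feats, axis=1, keepdims=True) + eps)
    true = true_feats / (np.linalg.norm(true_feats, axis=1, keepdims=True) + eps)

    # sims[i, j] = cosine similarity of reconstruction i and ground truth j.
    sims = pred @ true.T
    n = sims.shape[0]

    # Compare each matched similarity against every distractor in its row.
    # The diagonal compares a value with itself, so it never counts as a win.
    matched = np.diag(sims)[:, None]
    wins = matched > sims

    return wins.sum() / (n * (n - 1))
```

Chance level under this formulation is 0.5, which is worth keeping in mind when reading the values in the table.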