CognitionCapturer: Decoding Visual Stimuli From Human EEG Signal With Multimodal Information
About
Electroencephalogram (EEG) signals have attracted significant attention from researchers because they are non-invasive and offer high temporal sensitivity for decoding visual stimuli. However, most recent studies have focused solely on the relationship between EEG and image data pairs, neglecting the valuable "beyond-image-modality" information embedded in EEG signals, so critical multimodal information is lost. To address this limitation, we propose CognitionCapturer, a unified framework that fully leverages multimodal data to represent EEG signals. Specifically, CognitionCapturer trains a Modality Expert Encoder for each modality to extract cross-modal information from the EEG modality. It then introduces a diffusion prior that maps the EEG embedding space to the CLIP embedding space. Combined with a pretrained generative model, the framework can reconstruct visual stimuli with high semantic and structural fidelity. Notably, the framework does not require any fine-tuning of the generative models and can be extended to incorporate more modalities. Through extensive experiments, we demonstrate that CognitionCapturer outperforms state-of-the-art methods both qualitatively and quantitatively. Code: https://github.com/XiaoZhangYES/CognitionCapturer.
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Retrieval | THINGS-EEG 200-way zero-shot retrieval (Intra-Subject) | Top-5 Accuracy | 73.5 | 125 |
| Retrieval | THINGS-EEG (test) | Top-1 Acc | 35.6 | 18 |
| Image Retrieval | THINGS-EEG (test) | Top-1 Accuracy (Subject 1) | 31.4 | 15 |
| Brain-to-image retrieval | THINGS-EEG (Intra-subject split) | Subject 1 Performance (T-1) | 31.4 | 14 |
| EEG Visual Decoding | THINGS-EEG (test) | SSIM | 0.321 | 13 |
| Classification | THINGS-EEG Subject-dependent 200-way zero-shot (test) | Top-1 Accuracy (Sub01) | 31.4 | 10 |
| Zero-shot brain-to-image retrieval | THINGS-EEG Subject-dependent split | Top-1 Accuracy | 33.3 | 9 |
| Zero-shot brain-to-image retrieval | THINGS-EEG Subject-independent | Top-1 Accuracy | 13 | 8 |
| Visual Reconstruction | THINGS-EEG (all subjects) | Pixel Correlation (PixCorr) | 0.178 | 8 |
| EEG-to-Image Retrieval | THINGS-EEG2 (in-subject) | Top-1 Accuracy (2-way) | 93.15 | 6 |
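
Several rows above report top-k accuracy for zero-shot EEG-to-image retrieval. As a rough illustration of how such a metric is commonly computed, the sketch below ranks candidate image embeddings by cosine similarity to each EEG embedding and counts a hit when the paired image lands in the top k. It is a minimal sketch under stated assumptions, not the authors' evaluation code: the function names, the index-based pairing of EEG and image embeddings, and the toy three-dimensional vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_accuracy(eeg_embeddings, image_embeddings, k=1):
    """Fraction of EEG queries whose paired image ranks in the top k.

    Assumption: eeg_embeddings[i] is paired with image_embeddings[i],
    and every image embedding is a retrieval candidate for every query.
    """
    hits = 0
    for i, query in enumerate(eeg_embeddings):
        sims = [cosine(query, img) for img in image_embeddings]
        # Candidate indices sorted from most to least similar.
        ranked = sorted(range(len(sims)), key=lambda j: sims[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(eeg_embeddings)


# Toy data (hypothetical): three EEG queries and three candidate images.
toy_eeg = [[1.0, 0.1, 0.0], [0.2, 1.0, 0.0], [0.9, 0.0, 0.5]]
toy_img = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

print(top_k_accuracy(toy_eeg, toy_img, k=1))
print(top_k_accuracy(toy_eeg, toy_img, k=2))
```

In a 200-way setting the candidate list would hold 200 image embeddings, so chance level is 0.5% for top-1 and 2.5% for top-5.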