CognitionCapturer: Decoding Visual Stimuli From Human EEG Signal With Multimodal Information
About
Electroencephalogram (EEG) signals have attracted significant attention from researchers due to their non-invasive nature and high temporal sensitivity in decoding visual stimuli. However, most recent studies have focused solely on the relationship between EEG and image data pairs, neglecting the valuable "beyond-image-modality" information embedded in EEG signals. This results in the loss of critical multimodal information in EEG. To address this limitation, we propose CognitionCapturer, a unified framework that fully leverages multimodal data to represent EEG signals. Specifically, CognitionCapturer trains a Modality Expert Encoder for each modality to extract cross-modal information from the EEG modality. It then introduces a diffusion prior to map the EEG embedding space to the CLIP embedding space. Using a pretrained generative model, the proposed framework can then reconstruct visual stimuli with high semantic and structural fidelity. Notably, the framework does not require any fine-tuning of the generative models and can be extended to incorporate more modalities. Through extensive experiments, we demonstrate that CognitionCapturer outperforms state-of-the-art methods both qualitatively and quantitatively. Code: https://github.com/XiaoZhangYES/CognitionCapturer.
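Once EEG embeddings sit in the same space as CLIP image embeddings, zero-shot retrieval can be treated as a nearest-neighbor search. The following is a minimal sketch of that idea, assuming cosine similarity and that row `i` of the EEG matrix corresponds to row `i` of the image matrix. The function names and the synthetic data are hypothetical illustrations and are not taken from the CognitionCapturer repository.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def top_k_retrieval_accuracy(eeg_emb, image_emb, k=1):
    """Fraction of EEG trials whose true image is among the k most similar images.

    Assumption: eeg_emb[i] and image_emb[i] describe the same stimulus.
    """
    sim = l2_normalize(eeg_emb) @ l2_normalize(image_emb).T  # (N, N) cosine similarities
    top_k = np.argsort(-sim, axis=1)[:, :k]                  # best k candidates per trial
    targets = np.arange(sim.shape[0])[:, None]
    return float(np.mean(np.any(top_k == targets, axis=1)))


# Synthetic stand-ins: 200 candidate images, 64-dimensional embeddings.
rng = np.random.default_rng(0)
image_emb = rng.normal(size=(200, 64))
eeg_emb = image_emb + 0.5 * rng.normal(size=(200, 64))  # noisy copies of the image embeddings

print("top-1:", top_k_retrieval_accuracy(eeg_emb, image_emb, k=1))
print("top-5:", top_k_retrieval_accuracy(eeg_emb, image_emb, k=5))
```

With 200 candidates this mirrors the shape of a 200-way retrieval task, although the numbers it prints describe only the synthetic data.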
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Retrieval | THINGS-EEG 200-way zero-shot retrieval (Intra-Subject) | Top-1 Accuracy (Sub1) | 27.2 | 31 |
| Retrieval | THINGS-EEG (test) | Top-1 Acc | 35.6 | 18 |
| Image Retrieval | THINGS-EEG (test) | Top-1 Accuracy (Subject 1) | 31.4 | 15 |
| Visual Reconstruction | THINGS-EEG (all subjects) | Pixel Correlation (PixCorr) | 0.178 | 8 |
| Brain-to-image retrieval | THINGS-EEG (Intra-subject split) | Subject 1 Performance (T-1) | 31.4 | 6 |
| EEG-to-Image Retrieval | THINGS-EEG2 (in-subject) | Top-1 Accuracy (2-way) | 93.15 | 6 |
| EEG-Driven Image Generation | THINGS-EEG2 (Subject-01) | PixCorr | 0.15 | 4 |
| EEG-based visual reconstruction | THINGS-EEG (Subject-08) | Pixel Correlation | 0.175 | 4 |
| Image Reconstruction | Things-EEG | Pixel Correlation (PixCorr) | 15 | 3 |
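
Several rows above report pixel correlation (PixCorr). The sketch below assumes the common definition, the Pearson correlation between the flattened pixel values of a reconstruction and its ground-truth image. The listing does not state the exact preprocessing (resizing, color space, averaging over images), so the function name and the toy images are hypothetical.

```python
import numpy as np


def pixel_correlation(reconstruction, target):
    """Pearson correlation between two images after flattening them to 1-D.

    Assumption: both images already share the same shape and value range.
    """
    x = np.asarray(reconstruction, dtype=np.float64).ravel()
    y = np.asarray(target, dtype=np.float64).ravel()
    if x.shape != y.shape:
        raise ValueError("images must have the same number of pixels")
    x = x - x.mean()
    y = y - y.mean()
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    if denom == 0.0:
        return 0.0  # a constant image has no defined correlation; report 0
    return float(x @ y / denom)


# Toy example: a random "ground truth" and a noisy "reconstruction" of it.
rng = np.random.default_rng(0)
target = rng.random((64, 64, 3))
reconstruction = np.clip(target + 0.5 * rng.normal(size=target.shape), 0.0, 1.0)

print("PixCorr:", pixel_correlation(reconstruction, target))
```

Under this definition the score lies in [-1, 1], which is worth keeping in mind when comparing entries reported on different scales.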