An Attention Mechanism for Robust Multimodal Integration in a Global Workspace Architecture

About

Robust multimodal systems must remain effective when some modalities are noisy, degraded, or unreliable. Existing multimodal fusion methods often learn modality selection jointly with representation learning, making it difficult to determine whether robustness comes from the selector itself or from full end-to-end co-adaptation. Motivated by Global Workspace Theory (GWT), we study this question using a lightweight top-down modality selector operating on top of a frozen multimodal global workspace. We evaluate our method under structured modality corruptions on two multimodal datasets of increasing complexity, Simple Shapes and MM-IMDb 1.0. The selector improves robustness while using far fewer trainable parameters than end-to-end attention baselines, and the learned selection strategy transfers better across downstream tasks, corruption regimes, and even to a previously unseen modality. Beyond explicit corruption settings, we show on the MM-IMDb 1.0 benchmark that the same mechanism improves the global workspace over its no-attention counterpart and yields decent benchmark performance.
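The general idea of a top-down selector over frozen modality representations can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`softmax`, `select_and_fuse`), the single trainable query vector, the scaled dot-product scoring, and the toy two-dimensional latents are all hypothetical choices made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def select_and_fuse(latents, query):
    """Weight frozen per-modality latents with a top-down query and fuse them.

    latents: dict mapping modality name -> latent vector (assumed to come
             from frozen encoders; nothing here is trained).
    query:   vector of the same dimension; in this sketch it is the only
             trainable parameter of the (hypothetical) selector.
    Returns (weights, fused): per-modality attention weights and their
    weighted sum.
    """
    names = list(latents)
    dim = len(query)
    # Scaled dot-product score for each modality.
    scores = [dot(query, latents[n]) / math.sqrt(dim) for n in names]
    probs = softmax(scores)
    weights = dict(zip(names, probs))
    # Fused representation: convex combination of the modality latents.
    fused = [sum(weights[n] * latents[n][i] for n in names) for i in range(dim)]
    return weights, fused


# Toy usage: the query aligns with the "image" latent, so the selector
# places more weight on it than on "text".
latents = {"image": [1.0, 0.0], "text": [0.0, 1.0]}
query = [2.0, 0.0]
weights, fused = select_and_fuse(latents, query)
```

Because the encoders are left untouched in this sketch, only the query would be updated during training, which is one simple way a selector can stay small relative to an end-to-end attention model.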

Roland Bertin-Johannet, Lara Scipio, Leopold Maytié, Rufin VanRullen • 2026

Related benchmarks

Task	Dataset	Result	Rank
Multimodal genre classification	MM-IMDb 1.0 (test)	Macro F1: 65.34	13

