CAMPA: Efficient and Aligned Multimodal Graph Learning via Decoupled Propagation and Aggregation

About

Multimodal Graph Neural Networks (MGNNs) have shown strong potential for learning from multimodal attributed graphs, yet most existing approaches rely on tightly coupled architectures that suffer from prohibitive computational overhead. In this paper, we present a systematic empirical analysis showing that decoupled MGNNs are substantially more efficient and scalable for large-scale graph learning. However, we identify a critical bottleneck in existing decoupled pipelines, namely modal conflict, which arises in both the propagation and aggregation stages. Specifically, independent multi-hop diffusion causes cross-modal semantic divergence during propagation, while naive fusion fails to align multi-hop feature trajectories during aggregation, jointly limiting effective representation learning. To address this challenge, we propose CAMPA, a Cross-modal Aligned Multimodal Propagation & Aggregation framework for decoupled multimodal graph learning. Concretely, CAMPA introduces a two-stage alignment mechanism: (1) cross-modal aligned propagation, which injects cross-modal similarity priors into message passing to preserve semantic consistency without additional parameter overhead; (2) trajectory aligned aggregation, which leverages trajectory-level self-attention and cross-attention to capture and align long-range dependencies across modalities and hops. Extensive experiments on diverse benchmark datasets and tasks demonstrate that CAMPA consistently outperforms strong coupled and decoupled baselines while preserving the efficiency advantages of the decoupled paradigm.
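The first stage described above, cross-modal aligned propagation, can be illustrated with a minimal NumPy sketch. The abstract does not give the actual formulation, so everything here is an assumption made for illustration: the function names, the two-modality setup, and the specific choice of prior (each edge used for one modality is reweighted by the cosine similarity of its endpoints in the other modality). The only properties taken from the text are that the propagation is decoupled (precomputed, no training), adds no learnable parameters, and yields multi-hop feature trajectories per modality for a later aggregation step.

```python
import numpy as np


def row_normalize(m):
    """Scale each row to sum to 1; all-zero rows are left as zeros."""
    s = m.sum(axis=1, keepdims=True)
    return m / np.where(s == 0, 1.0, s)


def cosine_matrix(x):
    """Pairwise cosine similarity between the rows of x."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    unit = x / np.where(norms == 0, 1.0, norms)
    return unit @ unit.T


def aligned_propagate(adj, x_a, x_b, hops=2):
    """Parameter-free multi-hop propagation for two modalities.

    Hypothetical prior (not taken from the paper): the edge weights used
    to diffuse modality A are scaled by endpoint similarity in modality B,
    and vice versa, mapped from [-1, 1] to [0, 1].
    """
    prior_for_a = adj * (1.0 + cosine_matrix(x_b)) / 2.0
    prior_for_b = adj * (1.0 + cosine_matrix(x_a)) / 2.0
    p_a = row_normalize(prior_for_a)
    p_b = row_normalize(prior_for_b)

    # Hop 0 is the raw feature matrix; each later hop is one diffusion step.
    traj_a, traj_b = [x_a], [x_b]
    for _ in range(hops):
        traj_a.append(p_a @ traj_a[-1])
        traj_b.append(p_b @ traj_b[-1])

    # Each result has shape (hops + 1, num_nodes, feature_dim).
    return np.stack(traj_a), np.stack(traj_b)
```

Under these assumptions, the stacked per-hop outputs play the role of the feature trajectories that the second stage would align with self-attention and cross-attention; that learned aggregation step is not sketched here.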

Daohan Su, Hao Liu, Xunkai Li, Yinlin Zhu, Xiong Yongfu, Yi Liu, Hongchao Qin, Rong-Hua Li, Guoren Wang • 2026

Related benchmarks

Task	Dataset	Metric	Result
Node Classification	Movies	Accuracy	61.04	139
Node Classification	Grocery	Accuracy	84.98	139
Node Clustering	RedditS	NMI	81.03	50
Node Classification	RedditS	Accuracy	96.76	49
Graph-to-Image	SemArt	CLIP-S Score	66.41	36
Link Prediction	Cloth	MRR	60.33	26
Node Classification	Goodreads	Accuracy	76.93	26
Modality Matching	Ele-fashion	Score	99.92	18
Modality Matching	RedditS	Score	99.37	18
Modality Matching	Bili_music	Score	99.23	18

Showing 10 of 19 rows
