Every Little Helps: Building Knowledge Graph Foundation Model with Fine-grained Transferable Multi-modal Tokens
About
Multi-modal knowledge graph reasoning (MMKGR) aims to predict missing links by exploiting both graph-structure information and multi-modal entity content. Most existing works are designed for the transductive setting: they learn dataset-specific embeddings and struggle to generalize to new KGs. Recent knowledge graph foundation models (KGFMs) improve cross-KG transfer, but they mainly exploit structural patterns and ignore rich multi-modal signals. We address these gaps by proposing TOFU, a token-based foundation model for MMKGR that generalizes strongly across different MMKGs. TOFU discretizes structural, visual, and textual information into modality-specific tokens, then processes these tokens with a hierarchical fusion architecture built on mixture-of-message mechanisms to obtain transferable features for MMKGR. Experiments on 17 MMKGs spanning transductive, inductive, and fully-inductive settings show that TOFU consistently outperforms strong KGFM and MMKGR baselines, delivering strong performance even on unseen MMKGs.
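To make the "discretizes ... into modality-specific tokens" step concrete, here is a minimal sketch of one common way to turn continuous modality embeddings into discrete tokens: nearest-codebook lookup (vector quantization). The function name, codebook sizes, and the quantization scheme itself are illustrative assumptions, not TOFU's actual implementation.

```python
import numpy as np

def tokenize_modality(features, codebook):
    """Map continuous modality features (n, d) to discrete token ids
    via nearest-neighbour lookup in a codebook (k, d).
    Hypothetical sketch; TOFU's real tokenizer may differ."""
    # squared Euclidean distance from every feature to every code
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)  # one token id per entity feature

rng = np.random.default_rng(0)
visual = rng.normal(size=(5, 16))      # e.g. image embeddings of 5 entities
codebook = rng.normal(size=(32, 16))   # a vocabulary of 32 visual tokens
tokens = tokenize_modality(visual, codebook)
print(tokens.shape)  # (5,) — one discrete token per entity
```

Because every MMKG's entities are mapped into the same shared token vocabulary per modality, downstream layers can operate on token ids rather than dataset-specific embeddings, which is what enables transfer to unseen graphs.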
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Knowledge Graph Completion | MKG-W | MRR | 0.404 | 22 |
| Knowledge Graph Completion | MKG-Y | MRR | 43.72 | 22 |
| Knowledge Graph Completion | Overall (DB15K, MKG-W, MKG-Y) | MRR | 41.04 | 22 |
| Knowledge Graph Completion | DB15K | MRR | 39.01 | 22 |
| Multi-modal Knowledge Graph Reasoning | 17 MMKGs, Transductive | MRR | 46.87 | 9 |
| Multi-modal Knowledge Graph Reasoning | 17 MMKGs, Inductive | MRR | 54.77 | 9 |
| Multi-modal Knowledge Graph Reasoning | 17 MMKGs, Overall | MRR | 47.41 | 9 |
| Multi-modal Knowledge Graph Reasoning | 17 MMKGs, Fully-Inductive | MRR | 43.44 | 9 |
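All results above are mean reciprocal rank (MRR), which averages the inverse rank of the gold entity across test queries (some entries report it in [0, 1], others as a percentage). A minimal sketch of the metric, with made-up ranks for illustration:

```python
def mean_reciprocal_rank(ranks):
    """MRR over the 1-based ranks assigned to the gold entities.
    An MRR near 0.4 (as on MKG-W) means the correct entity
    typically sits around rank 2-3."""
    return sum(1.0 / r for r in ranks) / len(ranks)

# three toy queries whose gold entities were ranked 1st, 2nd, and 5th
print(mean_reciprocal_rank([1, 2, 5]))  # (1 + 0.5 + 0.2) / 3 ≈ 0.567
```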