Every Little Helps: Building Knowledge Graph Foundation Model with Fine-grained Transferable Multi-modal Tokens
About
Multi-modal knowledge graph reasoning (MMKGR) aims to predict missing links by exploiting both graph-structure information and multi-modal entity content. Most existing works are designed for the transductive setting: they learn dataset-specific embeddings and struggle to generalize to new KGs. Recent knowledge graph foundation models (KGFMs) improve cross-KG transfer, but they mainly exploit structural patterns and ignore rich multi-modal signals. We address these gaps by proposing TOFU, a token-based foundation model for MMKGR that generalizes strongly across different MMKGs. TOFU discretizes structural, visual, and textual information into modality-specific tokens, then processes these tokens with a hierarchical fusion architecture built on mixture-of-message mechanisms to obtain transferable features for MMKGR. Experiments on 17 MMKGs spanning transductive, inductive, and fully-inductive settings show that TOFU consistently outperforms strong KGFM and MMKGR baselines, delivering strong performance even on unseen MMKGs.
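To make the "discretizes ... into modality-specific tokens" step concrete, here is a minimal sketch of one common way to turn continuous modality embeddings into discrete tokens: nearest-codebook lookup (vector quantization). The function name, codebook sizes, and the quantization scheme itself are illustrative assumptions, not TOFU's actual implementation.

```python
import numpy as np

def tokenize_modality(features, codebook):
    """Map continuous modality features (n, d) to discrete token ids
    via nearest-neighbour lookup in a codebook (k, d).
    Hypothetical sketch; TOFU's real tokenizer may differ."""
    # squared Euclidean distance from every feature to every code
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)  # one token id per entity feature

rng = np.random.default_rng(0)
visual = rng.normal(size=(5, 16))      # e.g. image embeddings of 5 entities
codebook = rng.normal(size=(32, 16))   # a vocabulary of 32 visual tokens
tokens = tokenize_modality(visual, codebook)
print(tokens.shape)  # (5,) — one discrete token per entity
```

Because every MMKG's entities are mapped into the same shared token vocabulary per modality, downstream layers can operate on token ids rather than dataset-specific embeddings, which is what enables transfer to unseen graphs.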
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Knowledge Graph Completion | MKG-W | MRR | 0.404 | 22 |
| Knowledge Graph Completion | MKG-Y | MRR | 43.72 | 22 |
| Knowledge Graph Completion | Overall (DB15K, MKG-W, MKG-Y) | MRR | 41.04 | 22 |
| Knowledge Graph Completion | DB15K | MRR | 39.01 | 22 |
| Multi-modal Knowledge Graph Reasoning | 17 MMKGs, Transductive | MRR | 46.87 | 9 |
| Multi-modal Knowledge Graph Reasoning | 17 MMKGs, Inductive | MRR | 54.77 | 9 |
| Multi-modal Knowledge Graph Reasoning | 17 MMKGs, Overall | MRR | 47.41 | 9 |
| Multi-modal Knowledge Graph Reasoning | 17 MMKGs, Fully-Inductive | MRR | 43.44 | 9 |
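All results above are mean reciprocal rank (MRR), which averages the inverse rank of the gold entity across test queries (some entries report it in [0, 1], others as a percentage). A minimal sketch of the metric, with made-up ranks for illustration:

```python
def mean_reciprocal_rank(ranks):
    """MRR over the 1-based ranks assigned to the gold entities.
    An MRR near 0.4 (as on MKG-W) means the correct entity
    typically sits around rank 2-3."""
    return sum(1.0 / r for r in ranks) / len(ranks)

# three toy queries whose gold entities were ranked 1st, 2nd, and 5th
print(mean_reciprocal_rank([1, 2, 5]))  # (1 + 0.5 + 0.2) / 3 ≈ 0.567
```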