Text-Free Multi-domain Graph Pre-training: Toward Graph Foundation Models
About
Given the ubiquity of graph data, it is intriguing to ask: Is it possible to train a graph foundation model on a broad range of graph data across diverse domains? A major hurdle toward this goal lies in the fact that graphs from different domains often exhibit profoundly divergent characteristics. Although there have been some initial efforts in integrating multi-domain graphs for pre-training, they primarily rely on textual descriptions to align the graphs, limiting their application to text-attributed graphs. Moreover, different source domains may conflict or interfere with each other, and their relevance to the target domain can vary significantly. To address these issues, we propose MDGPT, a text-free Multi-Domain Graph Pre-Training and adaptation framework designed to exploit multi-domain knowledge for graph learning. First, we propose a set of domain tokens to align features across source domains for synergistic pre-training. Second, we propose dual prompts, consisting of a unifying prompt and a mixing prompt, to further adapt to the target domain with unified multi-domain knowledge and a tailored mixture of domain-specific knowledge. Finally, we conduct extensive experiments involving six public datasets to evaluate and analyze MDGPT, which outperforms prior art by up to 37.9%.
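The core ideas of domain tokens and dual prompts can be sketched in a few lines. The sketch below is an illustrative assumption, not the paper's exact formulation: the shared feature dimension, the element-wise modulation used for alignment, and the fixed mixture weights are all hypothetical stand-ins for learned components.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8  # shared feature dimension (illustrative choice)

# Hypothetical domain tokens: one vector per source domain, learned during
# pre-training to align each domain's features into a shared space.
domains = ["citation", "web", "product"]
domain_tokens = {d: rng.standard_normal(DIM) for d in domains}

def align(features, domain):
    """Align a source domain's node features via its domain token
    (element-wise modulation is one simple alignment choice)."""
    return features * domain_tokens[domain]

# Dual prompts for adapting to a target domain:
# 1) unifying prompt: aggregates knowledge shared across all source domains
#    (a plain mean over domain tokens, as a simple stand-in).
token_stack = np.stack(list(domain_tokens.values()))
unifying_prompt = token_stack.mean(axis=0)

# 2) mixing prompt: a weighted mixture of domain tokens, tailoring
#    domain-specific knowledge to the target (weights would be learned).
mix_weights = np.array([0.6, 0.3, 0.1])
mixing_prompt = mix_weights @ token_stack

def adapt(target_features):
    """Apply both prompts to target-domain node features."""
    return target_features * unifying_prompt + target_features * mixing_prompt

x = rng.standard_normal((4, DIM))  # features for 4 target-domain nodes
out = adapt(x)
print(out.shape)  # (4, 8)
```

In practice the domain tokens and mixture weights are trainable parameters optimized jointly with the graph encoder; the fixed arrays here only show how the two prompts combine shared and domain-specific knowledge.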
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Node Classification | Cora | Accuracy | 39.54 | 885 |
| Node Classification | Pubmed | Accuracy | 58.7 | 307 |
| Node Classification | Citeseer | Accuracy | 55.9 | 275 |
| Node Classification | Wiki CS | Accuracy | 54.1 | 198 |
| Node Classification | OGBN-Products | Accuracy | 56.6 | 62 |
| Graph Classification | Cora | Accuracy | 62.7 | 38 |
| Graph Classification | Pubmed | Accuracy | 67.6 | 31 |
| Graph Classification | Citeseer | Accuracy | 59.3 | 29 |
| Graph Classification | OGBN-Products | Accuracy | 60.5 | 26 |
| Graph Classification | Wiki CS | Accuracy | 48.9 | 26 |