Text-Free Multi-domain Graph Pre-training: Toward Graph Foundation Models

About

Given the ubiquity of graph data, it is intriguing to ask: Is it possible to train a graph foundation model on a broad range of graph data across diverse domains? A major hurdle toward this goal lies in the fact that graphs from different domains often exhibit profoundly divergent characteristics. Although there have been some initial efforts in integrating multi-domain graphs for pre-training, they primarily rely on textual descriptions to align the graphs, limiting their application to text-attributed graphs. Moreover, different source domains may conflict or interfere with each other, and their relevance to the target domain can vary significantly. To address these issues, we propose MDGPT, a text-free Multi-Domain Graph Pre-Training and adaptation framework designed to exploit multi-domain knowledge for graph learning. First, we propose a set of domain tokens to align features across source domains for synergistic pre-training. Second, we propose dual prompts, consisting of a unifying prompt and a mixing prompt, to further adapt the target domain with unified multi-domain knowledge and a tailored mixture of domain-specific knowledge. Finally, we conduct extensive experiments involving six public datasets to evaluate and analyze MDGPT, which outperforms prior art by up to 37.9%.
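
The two mechanisms named in the abstract, domain tokens for pre-training and dual prompts for adaptation, can be illustrated with a minimal sketch. The abstract does not specify how tokens and prompts act on features, so everything below is an assumption made for illustration: the element-wise scaling, the weighted-sum mixing prompt, the order in which the two prompts are applied, and all names (`apply_token`, `mix_tokens`, the toy values). It is not the authors' implementation.

```python
# Illustrative sketch only. The element-wise modulation and the weighted
# mixture are ASSUMPTIONS; the abstract does not state these formulas.

def apply_token(features, token):
    """Scale every node feature vector element-wise by a token or prompt vector."""
    return [[x * t for x, t in zip(row, token)] for row in features]

def mix_tokens(domain_tokens, weights):
    """Hypothetical mixing prompt: a weighted sum of the per-domain tokens."""
    dim = len(domain_tokens[0])
    return [sum(w * tok[d] for w, tok in zip(weights, domain_tokens))
            for d in range(dim)]

# Pre-training: one (learnable) token per source domain modulates that
# domain's node features before they reach a shared encoder (not shown).
domain_tokens = [[1.0, 0.5], [2.0, 1.0]]          # toy values, 2 source domains
source_features = [
    [[1.0, 2.0], [3.0, 4.0]],                     # domain 0: two nodes
    [[0.5, 0.5]],                                 # domain 1: one node
]
aligned = [apply_token(f, t) for f, t in zip(source_features, domain_tokens)]

# Adaptation: dual prompts on the target domain.
unifying_prompt = [1.0, 1.0]                      # toy stand-in for a learned vector
mixing_weights = [0.25, 0.75]                     # toy stand-in for learned weights
mixing_prompt = mix_tokens(domain_tokens, mixing_weights)

target_features = [[2.0, 2.0]]
adapted = apply_token(apply_token(target_features, unifying_prompt), mixing_prompt)
```

In this sketch the mixing prompt reuses the pre-trained domain tokens, so a target domain that resembles one source domain could weight that domain's token more heavily, while the unifying prompt is shared across all of them.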

Xingtong Yu, Chang Zhou, Yuan Fang, Xinming Zhang • 2024

Related benchmarks

Task	Dataset	Result
Graph Classification	PROTEINS	Accuracy 55.06	1252
Node Classification	Cora	Accuracy 60.6	1215
Graph Classification	MUTAG	Accuracy 57.36	1103
Node Classification	Cora (test)	Mean Accuracy 60	951
Node Classification	Chameleon	Accuracy 28.04	867
Node Classification	Pubmed	Accuracy 61.07	865
Node Classification	Wisconsin	Accuracy 50.4	864
Node Classification	Cornell	Accuracy 54.19	851
Node Classification	Texas	--	801
Node Classification	Squirrel	Accuracy 24.41	786

Showing 10 of 65 rows
