Pre-training Graph Neural Networks on Molecules by Using Subgraph-Conditioned Graph Information Bottleneck
About
This study aims to build a pre-trained Graph Neural Network (GNN) model on molecules without human annotations or prior knowledge. Although various approaches have been proposed to overcome the difficulty of acquiring labeled molecules, previous pre-training methods still rely on semantic subgraphs, i.e., functional groups. Focusing only on functional groups can overlook graph-level distinctions. The key challenge in building a pre-trained GNN on molecules is how to (1) generate well-distinguished graph-level representations and (2) automatically discover functional groups without prior knowledge. To address this challenge, we propose a novel Subgraph-conditioned Graph Information Bottleneck, named S-CGIB, for pre-training GNNs to recognize core subgraphs (graph cores) and significant subgraphs. The main idea is that, under the S-CGIB principle, the graph cores contain compressed and sufficient information that can generate well-distinguished graph-level representations and reconstruct the input graph conditioned on significant subgraphs across molecules. To discover significant subgraphs without prior knowledge of functional groups, we generate a set of functional group candidates, i.e., ego networks, and apply an attention-based interaction between the graph core and the candidates. Although they are identified through self-supervised learning, the learned subgraphs match real-world functional groups. Extensive experiments on molecule datasets across various domains demonstrate the superiority of S-CGIB.
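The candidate-generation and attention steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `ego_network` and `attention_weights`, the 1-hop radius, the toy molecule, and the dot-product scoring are all assumptions made for illustration, since the abstract does not specify the ego-network radius, the embeddings, or the attention form used in S-CGIB.

```python
from collections import deque
import math


def ego_network(adj, center, radius=1):
    """Return the set of nodes within `radius` hops of `center` (plain BFS).

    Hypothetical helper: the radius actually used by S-CGIB is not given here.
    """
    depth = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if depth[node] == radius:
            continue  # do not expand beyond the requested radius
        for nbr in adj[node]:
            if nbr not in depth:
                depth[nbr] = depth[node] + 1
                queue.append(nbr)
    return set(depth)


def attention_weights(core, candidates):
    """Softmax over dot products between a core vector and candidate vectors.

    Assumed scoring rule, chosen only to show the idea of weighting candidates.
    """
    scores = [sum(c * k for c, k in zip(core, cand)) for cand in candidates]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy molecule (heavy atoms of ethanol): C0 - C1 - O2
adj = {0: [1], 1: [0, 2], 2: [1]}

# One ego network per atom serves as a functional group candidate.
candidates = {atom: ego_network(adj, atom, radius=1) for atom in adj}

# Made-up 2-d embeddings for a graph core and two candidates.
weights = attention_weights([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

In this sketch every atom yields one candidate subgraph, and the candidate whose embedding aligns best with the core embedding receives the largest weight. A learned model would replace the hand-written vectors with GNN outputs.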
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Graph Classification | NCI1 | Accuracy | 79.75 | 460 |
| Graph Classification | NCI109 | Accuracy | 77.54 | 223 |
| Graph Classification | HIV | ROC-AUC | 0.7833 | 104 |
| Graph property prediction | Tox21 | ROC-AUC | 0.8094 | 101 |
| Graph property prediction | ClinTox | ROC-AUC | 78.58 | 94 |
| Graph property prediction | BACE | ROC-AUC | 86.51 | 93 |
| Graph property prediction | BBBP | ROC-AUC | 88.75 | 87 |
| Graph property prediction | ToxCast | ROC-AUC | 0.7095 | 87 |
| Graph property prediction | MUV | ROC-AUC | 0.7771 | 87 |
| Graph property prediction | SIDER | ROC-AUC | 64.03 | 87 |