
Fragment-based Pretraining and Finetuning on Molecular Graphs

About

Property prediction on molecular graphs is an important application of Graph Neural Networks (GNNs). Unlabeled molecular data has recently become abundant, facilitating the rapid development of self-supervised learning for GNNs in the chemical domain. In this work, we propose pretraining GNNs at the fragment level, a promising middle ground that overcomes the limitations of node-level and graph-level pretraining. Borrowing techniques from recent work on principal subgraph mining, we obtain a compact vocabulary of prevalent fragments from a large pretraining dataset. Based on the extracted vocabulary, we introduce several fragment-based contrastive and predictive pretraining tasks. The contrastive learning task jointly pretrains two different GNNs: one on molecular graphs and the other on fragment graphs, which represent higher-order connectivity within molecules. By enforcing consistency between each fragment's embedding and the aggregated embeddings of its atoms in the molecular graph, we ensure that the embeddings capture structural information at multiple resolutions. The structural information of fragment graphs is further exploited to extract auxiliary labels for graph-level predictive pretraining. We use both the pretrained molecule-based and fragment-based GNNs for downstream prediction, so fragment information is also exploited during finetuning. Our graph fragment-based pretraining method (GraphFP) improves performance on 5 of 8 common molecular benchmarks and improves performance on long-range biological benchmarks by at least 11.5%. Code is available at https://github.com/lvkd84/GraphFP.
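To make the fragment-atom consistency idea concrete, here is a minimal pure-Python sketch. It assumes the atom-to-fragment assignment is already given (the vocabulary mining step is not shown), uses mean pooling as the atom aggregation, and uses an InfoNCE-style loss with cosine similarity and a temperature `tau` as the contrastive objective. The function names, the mean aggregation, and the InfoNCE form are illustrative assumptions, not GraphFP's actual code; the linked repository holds the authors' implementation.

```python
import math

# Illustrative sketch only: function names, mean pooling and the InfoNCE loss
# are assumptions, not GraphFP's API.

def fragment_graph(bonds, atom_to_frag):
    """Connect two fragments whenever a bond joins atoms assigned to different fragments."""
    edges = set()
    for u, v in bonds:
        fu, fv = atom_to_frag[u], atom_to_frag[v]
        if fu != fv:
            edges.add((min(fu, fv), max(fu, fv)))
    return sorted(edges)

def pool_atoms(atom_emb, atom_to_frag, num_frags):
    """Mean-pool atom embeddings into one vector per fragment.

    Assumes every fragment id in range(num_frags) owns at least one atom.
    """
    dim = len(atom_emb[0])
    sums = [[0.0] * dim for _ in range(num_frags)]
    counts = [0] * num_frags
    for atom, vec in enumerate(atom_emb):
        f = atom_to_frag[atom]
        counts[f] += 1
        for d in range(dim):
            sums[f][d] += vec[d]
    return [[s / counts[f] for s in sums[f]] for f in range(num_frags)]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def info_nce(frag_emb, pooled_emb, tau=0.1):
    """InfoNCE loss: fragment i's positive is the pooled atoms of fragment i;
    pooled atoms of every other fragment act as negatives."""
    n = len(frag_emb)
    loss = 0.0
    for i in range(n):
        logits = [cosine(frag_emb[i], pooled_emb[j]) / tau for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_denominator - logits[i]
    return loss / n

if __name__ == "__main__":
    # Hypothetical toy molecule: atoms 0-2 form fragment 0, atoms 3-4 form fragment 1.
    bonds = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4)]
    atom_to_frag = [0, 0, 0, 1, 1]
    atom_emb = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    pooled = pool_atoms(atom_emb, atom_to_frag, num_frags=2)
    print(fragment_graph(bonds, atom_to_frag))
    print(info_nce([[1.0, 0.0], [0.0, 1.0]], pooled))  # aligned: small loss
    print(info_nce([[0.0, 1.0], [1.0, 0.0]], pooled))  # swapped: large loss
```

In the toy run, fragment embeddings that agree with their own pooled atoms give a loss near zero, while swapping them gives a large loss. This is the pressure that pushes the two GNNs toward consistent multi-resolution embeddings. In practice the embeddings would come from the two GNNs and the loss would be minimized by gradient descent, which this sketch omits.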

Kha-Dinh Luong, Ambuj Singh • 2023

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Graph Classification | NCI1 | Accuracy | 53.77 | 658 |
| Graph Classification | NCI109 | Accuracy | 58.14 | 267 |
| Graph Regression | Peptides-struct LRGB (test) | MAE | 0.3137 | 238 |
| Graph Classification | Peptides-func LRGB (test) | AP | 0.6267 | 196 |
| Graph Classification | HIV | ROC-AUC | 0.7571 | 155 |
| Graph Regression | Peptides-struct | MAE | 0.3137 | 134 |
| Graph Property Prediction | BACE | ROC-AUC | 80.28 | 111 |
| Graph Classification | Peptides-func | AP | 62.67 | 110 |
| Graph Property Prediction | Tox21 | ROC-AUC | 0.7735 | 109 |
| Graph Property Prediction | ClinTox | ROC-AUC | 76.8 | 102 |

Showing 10 of 35 rows.
