
Rethinking Tokenizer and Decoder in Masked Graph Modeling for Molecules

About

Masked graph modeling (MGM) excels in the self-supervised representation learning of molecular graphs. Scrutinizing previous studies, we can reveal a common scheme consisting of three key components: (1) graph tokenizer, which breaks a molecular graph into smaller fragments (i.e., subgraphs) and converts them into tokens; (2) graph masking, which corrupts the graph with masks; (3) graph autoencoder, which first applies an encoder on the masked graph to generate the representations, and then employs a decoder on the representations to recover the tokens of the original graph. However, previous MGM studies focus extensively on graph masking and the encoder, while there is limited understanding of the tokenizer and decoder. To bridge the gap, we first summarize popular molecule tokenizers at the granularity of node, edge, motif, and Graph Neural Networks (GNNs), and then examine their roles as the MGM's reconstruction targets. Further, we explore the potential of adopting an expressive decoder in MGM. Our results show that a subgraph-level tokenizer and a sufficiently expressive decoder with remask decoding have a large impact on the encoder's representation learning. Finally, we propose a novel MGM method SimSGT, featuring a Simple GNN-based Tokenizer (SGT) and an effective decoding strategy. We empirically validate that our method outperforms the existing molecule self-supervised learning methods. Our code and checkpoints are available at https://github.com/syr-cn/SimSGT.
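The three-component scheme in the abstract can be illustrated with a toy sketch. It is not the SimSGT implementation: the function names (`one_hop_tokens`, `mask_nodes`, `remask`), the `[MASK]` and `[DMASK]` placeholders, and the string stand-in for encoder outputs are all hypothetical. The 1-hop tokenizer is a simple substitute for a GNN-based tokenizer, and remask decoding is assumed here to mean overwriting the encoder outputs at masked positions before they reach the decoder.

```python
import random

MASK = "[MASK]"  # hypothetical encoder-side mask placeholder


def one_hop_tokens(atoms, edges):
    """Toy subgraph-level tokenizer: a node's token is its atom type plus
    the sorted atom types of its neighbors (its 1-hop subgraph)."""
    neighbors = {i: [] for i in range(len(atoms))}
    for u, v in edges:
        neighbors[u].append(atoms[v])
        neighbors[v].append(atoms[u])
    return [(atoms[i], tuple(sorted(neighbors[i]))) for i in range(len(atoms))]


def mask_nodes(atoms, ratio, rng):
    """Graph masking: replace a random subset of atom types with MASK."""
    n_mask = max(1, round(ratio * len(atoms)))
    masked_idx = sorted(rng.sample(range(len(atoms)), n_mask))
    corrupted = [MASK if i in masked_idx else a for i, a in enumerate(atoms)]
    return corrupted, masked_idx


def remask(hidden, masked_idx, dec_mask):
    """Remask decoding (assumed meaning): overwrite encoder outputs at the
    masked positions with a decoder-side placeholder before decoding."""
    return [dec_mask if i in masked_idx else h for i, h in enumerate(hidden)]


# Heavy atoms of ethanol: C-C-O
atoms = ["C", "C", "O"]
edges = [(0, 1), (1, 2)]

targets = one_hop_tokens(atoms, edges)       # reconstruction targets
corrupted, masked_idx = mask_nodes(atoms, ratio=0.35, rng=random.Random(0))
hidden = [f"h({a})" for a in corrupted]      # stand-in for encoder output
decoder_input = remask(hidden, masked_idx, dec_mask="[DMASK]")
```

In a real model the decoder would be trained to predict `targets[i]` for each `i` in `masked_idx` from `decoder_input` and the graph structure; the sketch only shows how the data flows between the three components.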

Zhiyuan Liu, Yaorui Shi, An Zhang, Enzhi Zhang, Kenji Kawaguchi, Xiang Wang, Tat-Seng Chua • 2023

Related benchmarks

Task | Dataset | Metric | Result | Rank
Graph Classification | NCI1 | Accuracy | 56.93 | 501
Graph Classification | NCI109 | Accuracy | 60.48 | 223
Molecular property prediction | MoleculeNet BBBP (scaffold) | ROC-AUC | 72.8 | 140
Molecular property prediction | MoleculeNet SIDER (scaffold) | ROC-AUC | 0.606 | 120
Molecular property prediction | MoleculeNet BACE (scaffold) | ROC-AUC | 81.5 | 110
Graph property prediction | Tox21 | ROC-AUC | 0.7623 | 109
Graph Classification | HIV | ROC-AUC | 0.7813 | 104
Graph property prediction | ClinTox | ROC-AUC | 74.11 | 102
Graph property prediction | BACE | ROC-AUC | 79.75 | 101
Graph property prediction | ToxCast | ROC-AUC | 0.6583 | 95
Showing 10 of 31 rows
