
Learning Invariant Molecular Representation in Latent Discrete Space

About

Molecular representation learning lays the foundation for drug discovery. However, existing methods suffer from poor out-of-distribution (OOD) generalization, particularly when training and test data originate from different environments. To address this issue, we propose a new framework for learning molecular representations that are invariant and robust to distribution shifts. Specifically, we propose a "first-encoding-then-separation" strategy that identifies invariant molecule features in the latent space, departing from conventional practice. Before the separation step, we introduce a residual vector quantization module that mitigates over-fitting to the training data distribution while preserving the expressivity of the encoder. Furthermore, we design a task-agnostic self-supervised learning objective that encourages precise identification of invariant features, which makes our method applicable to a wide variety of tasks, such as regression and multi-label classification. Extensive experiments on 18 real-world molecular datasets demonstrate that our model generalizes better than state-of-the-art baselines under various distribution shifts. Our code is available at https://github.com/HICAI-ZJU/iMoLD.
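The abstract names two latent-space steps, quantizing the encoder output and then separating invariant from variant features, without giving their equations. The Python sketch below illustrates both under stated assumptions. `residual_vq` implements the textbook multi-stage residual vector quantization, where each stage quantizes the residual left by the previous stages; this is the generic technique and may differ from how iMoLD actually combines quantization with residuals. `separate` is a hypothetical sigmoid-gated split. The codebooks are fixed toy values, whereas in a trained model they would be learned; the repository linked above holds the authors' actual implementation.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]

# ASSUMPTION: generic multi-stage residual VQ plus a hypothetical soft gate.
# This is an illustration of the named techniques, not the iMoLD module itself.


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_code(vector: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Index of the codebook entry closest to `vector`."""
    return min(range(len(codebook)), key=lambda i: squared_distance(vector, codebook[i]))


def residual_vq(
    vector: Sequence[float], codebooks: Sequence[Sequence[Sequence[float]]]
) -> Tuple[List[int], Vector]:
    """Multi-stage residual VQ: each stage quantizes what earlier stages missed.

    Returns the chosen code index per stage and the summed reconstruction.
    """
    residual = list(vector)
    reconstruction = [0.0] * len(vector)
    indices: List[int] = []
    for codebook in codebooks:
        idx = nearest_code(residual, codebook)
        code = codebook[idx]
        indices.append(idx)
        # Accumulate the chosen code and subtract it from what remains.
        reconstruction = [r + c for r, c in zip(reconstruction, code)]
        residual = [r - c for r, c in zip(residual, code)]
    return indices, reconstruction


def separate(z: Sequence[float], scores: Sequence[float]) -> Tuple[Vector, Vector]:
    """Hypothetical soft split of a latent vector into invariant and variant parts.

    A sigmoid gate g in (0, 1) per dimension gives z_inv = g * z and
    z_var = (1 - g) * z, so the two parts always sum back to z.
    """
    gates = [1.0 / (1.0 + math.exp(-s)) for s in scores]
    z_inv = [g * v for g, v in zip(gates, z)]
    z_var = [(1.0 - g) * v for g, v in zip(gates, z)]
    return z_inv, z_var


if __name__ == "__main__":
    # Toy 2-D "encoder output" and two hand-picked codebooks (coarse, then fine).
    h = [0.9, -1.2]
    codebooks = [
        [[1.0, -1.0], [-1.0, 1.0], [0.0, 0.0]],
        [[0.0, -0.25], [0.25, 0.0], [-0.1, 0.0]],
    ]
    indices, z = residual_vq(h, codebooks)
    print("codes:", indices, "quantized:", z)
    # Hypothetical per-dimension scores; in a real model these would be predicted.
    z_inv, z_var = separate(z, scores=[2.0, -2.0])
    print("invariant:", z_inv, "variant:", z_var)
```

Because the gate is soft, `z_inv` and `z_var` always sum back to the quantized vector, which makes the split easy to sanity-check; a hard 0/1 mask would be the discrete alternative.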

Xiang Zhuang, Qiang Zhang, Keyan Ding, Yatao Bian, Xiao Wang, Jingsong Lv, Hongyang Chen, Huajun Chen• 2023

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Graph Classification | DrugOOD EC50 (OOD test) | ROC-AUC | 77.48 | 52 |
| Graph Classification | MolHiv GOOD (size) | ROC-AUC | 62.2 | 28 |
| Graph Classification | MolHiv GOOD (scaffold) | ROC-AUC | 69.06 | 28 |
| Graph Classification | GOOD-SST2 length | Accuracy | 79.41 | 28 |
| Graph Classification | HIV GraphOOD (test) | ROC-AUC | 72.93 | 26 |
| Regression | GOOD-ZINC size split, covariate shift | Mean Absolute Error | 0.1029 | 24 |
| Graph Classification | DrugOOD IC50 (test) | ROC-AUC | 72.11 | 24 |
| Binary Classification | GOOD-HIV scaffold split, covariate shift | ROC-AUC | 72.93 | 15 |
| Binary Classification | GOOD-HIV scaffold split, concept shift | ROC-AUC | 0.7432 | 15 |
| Binary Classification | GOOD-HIV size split, covariate shift | ROC-AUC | 0.6286 | 15 |

(Showing 10 of 33 rows.)
