
Multi-modal Molecule Structure-text Model for Text-based Retrieval and Editing

About

Artificial intelligence is increasingly adopted in drug discovery. However, existing machine learning studies mainly use the chemical structures of molecules and ignore the vast textual knowledge available in chemistry. Incorporating textual knowledge makes it possible to realize new drug design objectives, adapt to text-based instructions, and predict complex biological activities. Here we present a multi-modal molecule structure-text model, MoleculeSTM, which jointly learns molecules' chemical structures and textual descriptions via a contrastive learning strategy. To train MoleculeSTM, we construct a large multi-modal dataset, PubChemSTM, with over 280,000 chemical structure-text pairs. To demonstrate the effectiveness and utility of MoleculeSTM, we design two challenging zero-shot tasks based on text instructions: structure-text retrieval and molecule editing. MoleculeSTM has two main properties: open vocabulary and compositionality via natural language. In experiments, MoleculeSTM achieves state-of-the-art generalization to novel biochemical concepts across various benchmarks.
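The contrastive strategy and the retrieval task mentioned above can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not state the exact loss, so a generic symmetric InfoNCE-style objective is assumed here, and the function names, the temperature value, and the toy embeddings are all hypothetical. In practice the embeddings would come from a molecule encoder and a text encoder.

```python
import math

def normalize(vec):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def similarity_matrix(mol_embs, text_embs):
    """Cosine similarity between every molecule and every text embedding."""
    mols = [normalize(v) for v in mol_embs]
    texts = [normalize(v) for v in text_embs]
    return [[sum(a * b for a, b in zip(m, t)) for t in texts] for m in mols]

def diagonal_cross_entropy(logits):
    """Mean of -log softmax(row)[i]: row i's true partner sits at column i."""
    total = 0.0
    for i, row in enumerate(logits):
        peak = max(row)  # subtract the max for numerical stability
        log_norm = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_norm - row[i]
    return total / len(logits)

def symmetric_contrastive_loss(mol_embs, text_embs, temperature=0.1):
    """Assumed InfoNCE-style loss, averaged over both retrieval directions."""
    sims = similarity_matrix(mol_embs, text_embs)
    logits = [[s / temperature for s in row] for row in sims]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (diagonal_cross_entropy(logits)
                  + diagonal_cross_entropy(transposed))

def top1_retrieval_accuracy(query_embs, candidate_embs):
    """Fraction of queries whose most similar candidate is the paired one."""
    sims = similarity_matrix(query_embs, candidate_embs)
    hits = sum(
        1 for i, row in enumerate(sims)
        if max(range(len(row)), key=row.__getitem__) == i
    )
    return hits / len(sims)

# Toy embeddings (made up for illustration): text i describes molecule i.
mols = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
texts_aligned = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
texts_shuffled = texts_aligned[1:] + texts_aligned[:1]  # pairs broken

loss_aligned = symmetric_contrastive_loss(mols, texts_aligned)
loss_shuffled = symmetric_contrastive_loss(mols, texts_shuffled)
acc_aligned = top1_retrieval_accuracy(mols, texts_aligned)
```

Correctly paired embeddings give a lower loss than mismatched ones, which is the signal a contrastive objective trains on. Once the two embedding spaces are aligned, zero-shot retrieval reduces to the nearest-neighbor lookup in `top1_retrieval_accuracy`.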

Shengchao Liu, Weili Nie, Chengpeng Wang, Jiarui Lu, Zhuoran Qiao, Ling Liu, Jian Tang, Chaowei Xiao, Anima Anandkumar • 2022

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| Molecule property prediction | MoleculeNet (scaffold split) | BBBP | 70.75 | 85 |
| Regression | FreeSolv | RMSE | 1.288 | 33 |
| Regression | MoleculeNet LIPO | RMSE | 0.6944 | 19 |
| Molecule-to-Text Retrieval | PubChem324k (test) | Accuracy | 45.8 | 18 |
| Text-to-Molecule Retrieval | PubChem324k (test) | Accuracy | 44.3 | 18 |
| Molecular property prediction | Foundation Models for Property Prediction | Pre-train Data Size | 281 | 13 |
| Molecular property prediction | AMES 7,255 drugs | AUC | 83.6 | 12 |
| Molecule-Text Retrieval | PCdes | R@1 | 39.5 | 12 |
| Molecular property prediction | DILI 475 drugs | AUC | 91.2 | 12 |
| Molecular property prediction | Carcinogens 278 drugs | AUC | 83.87 | 12 |
Showing 10 of 31 rows
