
GIT-Mol: A Multi-modal Large Language Model for Molecular Science with Graph, Image, and Text

About

Large language models have made significant strides in natural language processing, enabling innovative applications in molecular science by processing textual representations of molecules. However, most existing language models cannot capture the rich information in complex molecular structures or images. In this paper, we introduce GIT-Mol, a multi-modal large language model that integrates graph, image, and text information. To facilitate the integration of multi-modal molecular data, we propose GIT-Former, a novel architecture capable of aligning all modalities into a unified latent space. We achieve a 5%-10% accuracy increase in property prediction and a 20.2% boost in molecule generation validity compared to the baselines. With the any-to-language molecular translation strategy, our model has the potential to perform more downstream tasks, such as compound name recognition and chemical reaction prediction.
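The core idea behind GIT-Former, aligning graph, image, and text embeddings into one shared latent space, can be illustrated with a toy sketch. The snippet below is purely illustrative: it stands in for the learned cross-attention alignment with random linear projections, and all dimensionalities (`D_GRAPH`, `D_IMAGE`, `D_TEXT`, `D_LATENT`) are assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensionalities (assumptions, not from the paper).
D_GRAPH, D_IMAGE, D_TEXT, D_LATENT = 64, 128, 96, 32

# One projection per modality, standing in for GIT-Former's
# learned alignment into a unified latent space.
W_graph = rng.normal(size=(D_GRAPH, D_LATENT))
W_image = rng.normal(size=(D_IMAGE, D_LATENT))
W_text = rng.normal(size=(D_TEXT, D_LATENT))

def to_latent(x, W):
    """Project a modality embedding into the shared space and L2-normalize."""
    z = x @ W
    return z / np.linalg.norm(z)

# Stand-in encoder outputs for one molecule's graph, image, and description.
graph_emb = rng.normal(size=D_GRAPH)
image_emb = rng.normal(size=D_IMAGE)
text_emb = rng.normal(size=D_TEXT)

z_g = to_latent(graph_emb, W_graph)
z_i = to_latent(image_emb, W_image)
z_t = to_latent(text_emb, W_text)

# All three vectors now live in the same D_LATENT-dimensional space,
# so cross-modal similarity reduces to a plain dot product.
sim_graph_text = float(z_g @ z_t)
print(z_g.shape, z_i.shape, z_t.shape)
```

Once every modality maps into the same space, any-to-language translation becomes conditioning a language decoder on whichever latent vector is available.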

Pengfei Liu, Yiming Ren, Jun Tao, Zhixiang Ren • 2023

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Molecule Captioning | ChEBI-20 (test) | BLEU-4 | 0.263 | 107 |
| Molecular property prediction | BACE (test) | ROC-AUC | 81.08 | 65 |
| Molecular property prediction | BBBP (test) | ROC-AUC | 0.739 | 64 |
| Molecular property prediction | SIDER (test) | ROC-AUC | 0.634 | 53 |
| Molecular property prediction | Tox21 (test) | ROC-AUC | 0.759 | 53 |
| Text-guided molecule generation | ChEBI-20 (test) | MACCS FTS Similarity | 73.8 | 48 |
| Molecular property prediction | ToxCast (test) | ROC-AUC | 66.8 | 34 |
| Molecule Description Generation | ChEBI-20 (test) | BLEU-2 | 35.2 | 34 |
| Molecular property prediction | ClinTox (test) | ROC-AUC | 88.3 | 33 |
| Description-guided molecule design | ChEBI-20 2022 (test) | Exact Match Accuracy | 5.1 | 26 |

Showing 10 of 14 rows.

Other info

Code
