Automated Molecular Concept Generation and Labeling with Large Language Models

About

Artificial intelligence (AI) is transforming scientific research, with explainable AI methods like concept-based models (CMs) showing promise for new discoveries. However, in molecular science, CMs are less common than black-box models like Graph Neural Networks (GNNs), due to their need for predefined concepts and manual labeling. This paper introduces the Automated Molecular Concept (AutoMolCo) framework, which leverages Large Language Models (LLMs) to automatically generate and label predictive molecular concepts. Through iterative concept refinement, AutoMolCo enables simple linear models to outperform GNNs and LLM in-context learning on several benchmarks. The framework operates without human knowledge input, overcoming limitations of existing CMs while maintaining explainability and allowing easy intervention. Experiments on MoleculeNet and High-Throughput Experimentation (HTE) datasets demonstrate that AutoMolCo-induced explainable CMs are beneficial for molecular science research.
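The pipeline summarized above (propose concepts, label them per molecule, refine the concept set, fit a simple linear model) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the concept names, the toy values, and the function names `select_concepts` and `fit_linear` are hypothetical, the LLM generation and labeling steps are replaced by a hard-coded table, and the correlation-based filter is only a stand-in for the paper's refinement procedure, which the abstract does not specify.

```python
from math import sqrt

# Toy concept table. In the framework described above, an LLM would propose
# the concepts and label each molecule; here the values are synthetic.
CONCEPTS = {
    "concept_a": [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0],
    "concept_b": [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0],
    "concept_c": [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0],  # uninformative
}
# Synthetic target built as 1 + 2 * concept_a + 3 * concept_b.
TARGET = [1.0, 4.0, 3.0, 6.0, 1.0, 4.0, 3.0, 6.0]


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def select_concepts(concepts, target, keep):
    """One refinement round: keep the `keep` concepts most correlated with the target."""
    ranked = sorted(
        concepts,
        key=lambda name: abs(pearson(concepts[name], target)),
        reverse=True,
    )
    return ranked[:keep]


def fit_linear(columns, target):
    """Least squares with an intercept, solved via the normal equations.

    Returns [intercept, w_1, ..., w_k], one weight per concept column, so each
    weight can be read directly as that concept's contribution.
    """
    n = len(target)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
        + [sum(X[r][i] * target[r] for r in range(n))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        d = A[c][c]
        A[c] = [v / d for v in A[c]]
        for r in range(p):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [row[-1] for row in A]


kept = select_concepts(CONCEPTS, TARGET, keep=2)
weights = fit_linear([CONCEPTS[name] for name in kept], TARGET)
print(kept, weights)
```

Because the final model is linear in named concepts, each fitted weight is directly inspectable, and a concept can be dropped or relabeled by editing the table and refitting. That is the kind of explainability and intervention the abstract refers to.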

Zimin Zhang, Qianli Wu, Botao Xia, Fang Sun, Ziniu Hu, Yizhou Sun, Shichang Zhang • 2024

Related benchmarks

Task                               Dataset                    Metric   Result  Rank
Molecular Property Classification  MoleculeNet BBBP           ROC-AUC  66.37   56
Molecular property prediction      BACE                       ROC-AUC  73.9    55
Regression                         MoleculeNet Lipophilicity  RMSE     1.0591  21
Molecular property prediction      MoleculeNet ESOL           RMSE     0.8538  15
Molecular property prediction      HIV MoleculeNet            ROC-AUC  67.5    14