GeLLMO: Generalizing Large Language Models for Multi-property Molecule Optimization

About

Despite recent advancements, most computational methods for molecule optimization are constrained to single- or double-property optimization tasks and suffer from poor scalability and generalizability to novel optimization tasks. Meanwhile, Large Language Models (LLMs) demonstrate remarkable out-of-domain generalizability to novel tasks. To demonstrate LLMs' potential for molecule optimization, we introduce MuMOInstruct, the first high-quality instruction-tuning dataset specifically focused on complex multi-property molecule optimization tasks. Leveraging MuMOInstruct, we develop GeLLMOs, a series of instruction-tuned LLMs for molecule optimization. Extensive evaluations across 5 in-domain and 5 out-of-domain tasks demonstrate that GeLLMOs consistently outperform state-of-the-art baselines. GeLLMOs also exhibit outstanding zero-shot generalization to unseen tasks, significantly outperforming powerful closed-source LLMs. Such strong generalizability demonstrates the tremendous potential of GeLLMOs as foundational models for molecule optimization, thereby tackling novel optimization tasks without resource-intensive retraining. MuMOInstruct, models, and code are accessible through https://github.com/ninglab/GeLLMO.

Vishal Dey, Xiao Hu, Xia Ning • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Single-Property Molecular Optimization (DRD2) | ZINC 250k 200 lead molecules | Success Rate (SR): 49 | 14 |
| Single-Property Molecular Optimization (plogP) | ZINC 250k 200 lead molecules | Success Rate (SR): 57 | 14 |
| Single-Property Molecular Optimization (QED) | ZINC 250k 200 lead molecules | Success Rate (SR): 61.5 | 14 |
| Single-Property Molecular Optimization (SA) | ZINC 250k 200 lead molecules | Success Rate (SR): 14.5 | 14 |
| Single-Property Molecular Optimization (JNK3) | ZINC 250k 200 lead molecules | Success Rate (SR): 8.5 | 14 |
| Multi-Property Molecular Optimization (QED+plogP) | ZINC 250K | Success Rate (SR): 19.5 | 13 |
| Multi-Property Molecular Optimization (plogP+DRD2) | ZINC 250K | SR (%): 16 | 13 |
| Multi-Property Molecular Optimization (QED+SA) | ZINC 250K | Success Rate (SR): 22.5 | 13 |
| Multi-Property Molecular Optimization (DRD2+SA) | ZINC 250K | SR: 9 | 13 |
| Multi-Property Molecular Optimization (DRD2+QED+plogP) | ZINC 250K | Success Rate (SR): 0 | 13 |
Showing 10 of 11 rows