
BioMedGPT-Mol: Multi-task Learning for Molecular Understanding and Generation

About

Molecules play a crucial role in biomedical research and discovery, particularly in the field of small molecule drug development. Given the rapid advancements in large language models, especially the recent emergence of reasoning models, it is natural to explore how a general-purpose language model can be efficiently adapted for molecular science applications. In this work, we introduce BioMedGPT-Mol, a molecular language model designed to support molecular understanding and generation tasks. By curating and unifying existing public instruction datasets, we have assembled a large-scale, comprehensive, and high-quality training dataset. The model is then fine-tuned through a meticulously designed multi-task learning framework. On a consolidated benchmark derived from LlaSMol, TOMG-Bench, and MuMOInstruct, BioMedGPT-Mol achieves remarkable performance. Our experimental results demonstrate that a general-purpose reasoning model can be effectively and efficiently post-trained into a professional molecular language model through a well-structured multi-task curriculum. Leveraging these capabilities, we further apply the model to multi-step retrosynthetic planning, achieving state-of-the-art performance on RetroBench and demonstrating its superior efficacy as an end-to-end retrosynthetic planner. We anticipate that our approach can be extended to other biomedical scientific domains.

Chenyang Zuo, Siqi Fan, Zaiqing Nie • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Multi property optimization | Multi-property Optimization | Average Score | 95.2 | 11 |
| Molecular Component Editing | Molecular Component Editing | Average Success Rate | 74.2 | 9 |
| Single Property Optimization | Single Property Optimization (test) | Average Score | 77.2 | 9 |
| Property Prediction (Classification) | MoleculeNet BBBP ClinTox | Avg. Accuracy | 90.4 | 8 |
| Chemical Reaction Prediction | Chemical Reaction | Avg. EM | 49.8 | 8 |
| Description-guided Molecular Generation | Description-guided Generation | EM | 29.6 | 8 |
| Forward synthesis | Chemical Reaction | Exact Match (EM) | 67.2 | 8 |
| Molecular Name Conversion | Name Conversion | Avg EM | 77.1 | 8 |
| Molecule Captioning | Molecule Captioning | METEOR | 0.515 | 8 |
| Property Prediction (Regression) | MoleculeNet ESOL, LIPO | Average Regression RMSE | 0.945 | 8 |
Showing 10 of 11 rows
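
Several rows above report exact match (EM) or RMSE. As a rough illustration of what those two metrics measure, here is a minimal Python sketch. The function names `exact_match` and `rmse` are hypothetical and are not taken from the paper or its benchmarks. It also assumes that EM compares raw strings after trimming whitespace, whereas the benchmarks may canonicalize molecular strings before comparing, so it will not reproduce the reported numbers.

```python
import math


def exact_match(predictions, references):
    """Percentage of predictions identical to their reference string.

    Assumption: plain string equality after stripping whitespace. A real
    benchmark may normalize molecule representations before comparing.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        raise ValueError("at least one example is required")
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return 100.0 * hits / len(references)


def rmse(predictions, targets):
    """Root mean squared error between predicted and true numeric values."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have equal length")
    if not targets:
        raise ValueError("at least one example is required")
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared_errors) / len(targets))
```

Higher is better for EM (a percentage), while lower is better for RMSE, which is worth keeping in mind when comparing ranks across rows.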
