
FORGE: Fragment-Oriented Ranking and Generation for Context-Aware Molecular Optimization

About

Molecular optimization seeks to improve a molecule's properties through small structural edits while preserving similarity to the starting compound. Recent language-model approaches typically treat this task as prompt-conditioned sequence generation. However, relying on natural language introduces an inherent data-scaling bottleneck, often leads to chemical hallucinations, and ignores the strong context dependence of fragment effects. We present FORGE, a two-stage framework that reformulates molecular optimization as context-aware local editing. Instead of expensive human text annotations, FORGE learns from automatically mined, verified low-to-high edit pairs. Stage 1 ranks candidate fragments by their property contribution under the full molecular context, injecting a chemical prior, and Stage 2 generates explicit fragment replacements. Built on a compact 0.6B-parameter language model, FORGE further adapts to unseen black-box objectives through in-context demonstrations. Across Prompt-MolOpt, PMO-1K, and ChemCoTBench, FORGE consistently outperforms prior methods, including substantially larger language models and graph-based methods. These results highlight explicit fragment-level supervision as an easily obtainable, scalable, and less hallucination-prone alternative to natural-language training.
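The abstract describes learning from automatically mined low-to-high edit pairs and ranking fragments under the full molecular context. The sketch below is not FORGE's pipeline. It is a minimal matched-molecular-pair-style illustration, assuming each molecule has already been split into a shared core and one substituent fragment. The `Record` and `EditPair` types, the `core_A`/`F1` identifiers, the `min_gain` threshold, and the property values are all hypothetical. A lookup of observed gains stands in for the learned Stage 1 ranker; it only shows why the same fragment swap must be scored in context.

```python
from collections import defaultdict
from itertools import permutations
from typing import List, NamedTuple, Tuple


class Record(NamedTuple):
    core: str        # shared molecular context (hypothetical placeholder ID)
    fragment: str    # substituent attached to the core at a fixed site
    prop: float      # property value of core + fragment (toy number)


class EditPair(NamedTuple):
    core: str
    src_fragment: str
    dst_fragment: str
    gain: float      # property increase of the low -> high edit


def mine_edit_pairs(records: List[Record], min_gain: float = 0.0) -> List[EditPair]:
    """Collect low-to-high fragment swaps between molecules that share a core."""
    by_core = defaultdict(list)
    for rec in records:
        by_core[rec.core].append(rec)
    pairs = []
    for core, group in by_core.items():
        # Ordered pairs, so both directions of every swap are examined.
        for low, high in permutations(group, 2):
            gain = high.prop - low.prop
            # Keep only swaps that change the fragment and improve the property.
            if low.fragment != high.fragment and gain > min_gain:
                pairs.append(EditPair(core, low.fragment, high.fragment, gain))
    return pairs


def rank_fragments(pairs: List[EditPair], core: str,
                   src_fragment: str) -> List[Tuple[str, float]]:
    """Rank replacement fragments for one (core, fragment) context by observed gain."""
    candidates = [(p.dst_fragment, p.gain) for p in pairs
                  if p.core == core and p.src_fragment == src_fragment]
    return sorted(candidates, key=lambda c: c[1], reverse=True)


# Toy records with made-up values: swapping F1 -> F3 helps on core_A
# but hurts on core_B, so the ranking has to be conditioned on the core.
records = [
    Record("core_A", "F1", 1.0),
    Record("core_A", "F2", 0.5),
    Record("core_A", "F3", 1.75),
    Record("core_B", "F1", 1.0),
    Record("core_B", "F3", 0.25),
]
pairs = mine_edit_pairs(records)
print(rank_fragments(pairs, "core_A", "F2"))  # [('F3', 1.25), ('F1', 0.5)]
print(rank_fragments(pairs, "core_B", "F1"))  # []
```

Because the F1 to F3 swap raises the toy property on `core_A` but lowers it on `core_B`, a context-free fragment score would mislead one of the two cases.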

Qingchuan Zhang, He Cao, Hao Li, Yanjun Shao, Zhiyuan Liu, Shihang Wang, Shufang Xie, Shenghua Gao, Xinwu Ye • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Molecular Docking Score Optimization | Target proteins (PARP1, FA7, 5HT1B, BRAF, JAK2), novel top 5% molecules | Docking Score (kcal/mol) | -12.07 | 38 |
| Goal-directed Lead Optimization | Lead Optimization Docking Targets (PARP1, FA7, 5HT1B, BRAF, JAK2), delta=0.6 | Docking Score (kcal/mol) | -13.37 | 33 |
| ADMET property optimization | Prompt-MolOpt | ESOL Score | 0.934 | 12 |
| Molecular Optimization | PMO-1K | Aggregate Score (22 tasks) | 12.42 | 8 |
| Molecular property optimization ranking and generation | ChemCoTBench | LogP Delta | 1.02 | 8 |
| Molecular Optimization | QED-DRD2, delta=0.4 | Success Rate (%) | 40 | 7 |
| Molecular Optimization | QED-DRD2, delta=0.5 | Success Rate (%) | 36.76 | 7 |
| Molecular Optimization | QED-DRD2, delta=0.6 | Success Rate (%) | 30.41 | 7 |
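The QED-DRD2 rows report success rates at several delta thresholds, but this page does not define the metric. The sketch below shows one common way such a thresholded success rate is computed. It assumes that delta is a minimum Tanimoto similarity to the source molecule and that success means QED and DRD2 both increase. FORGE's actual evaluation may differ. The `Mol` type, the fingerprint bit sets, and all scores are hypothetical toy values.

```python
from typing import Callable, FrozenSet, List, NamedTuple


class Mol(NamedTuple):
    fp: FrozenSet[int]   # fingerprint as a set of "on" bit indices (hypothetical)
    qed: float
    drd2: float


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto (Jaccard) similarity between two fingerprint bit sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def both_improved(src: Mol, cand: Mol) -> bool:
    # Assumed success criterion: QED and DRD2 must both increase.
    return cand.qed > src.qed and cand.drd2 > src.drd2


def success_rate(sources: List[Mol], candidates: List[List[Mol]], delta: float,
                 improved: Callable[[Mol, Mol], bool] = both_improved) -> float:
    """Percent of sources with at least one candidate that is similar enough and improved."""
    hits = 0
    for src, cands in zip(sources, candidates):
        if any(tanimoto(src.fp, c.fp) >= delta and improved(src, c) for c in cands):
            hits += 1
    return 100.0 * hits / len(sources)


# Toy example with made-up fingerprints and scores.
sources = [
    Mol(frozenset({1, 2, 3, 4}), qed=0.5, drd2=0.125),
    Mol(frozenset({1, 2}), qed=0.5, drd2=0.5),
]
candidates = [
    [Mol(frozenset({1, 2, 3, 5}), qed=0.625, drd2=0.25)],  # similarity 3/5 = 0.6
    [Mol(frozenset({3, 4}), qed=0.75, drd2=0.75)],         # similarity 0.0
]
for delta in (0.4, 0.5, 0.6, 0.7):
    print(delta, success_rate(sources, candidates, delta))
```

Raising delta only removes candidates from consideration, so under this definition the success rate can never increase as the similarity constraint tightens.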