FORGE: Fragment-Oriented Ranking and Generation for Context-Aware Molecular Optimization

About

Molecular optimization seeks to improve a molecule through small structural edits while preserving similarity to the starting compound. Recent language-model approaches typically treat this task as prompt-conditioned sequence generation. However, relying on natural language introduces an inherent data-scaling bottleneck, often leads to chemical hallucinations, and ignores the strong context dependence of fragment effects. We present FORGE, a two-stage framework that reformulates molecular optimization as context-aware local editing. Instead of expensive human text annotations, FORGE uses automatically mined, verified low-to-high edit pairs: Stage 1 ranks candidate fragments by their property contribution under the full molecular context, which injects a chemical prior, and Stage 2 generates explicit fragment replacements. Built on a compact 0.6B language model, FORGE further adapts to unseen black-box objectives through in-context demonstrations. Across Prompt-MolOpt, PMO-1K, and ChemCoTBench, FORGE consistently outperforms prior methods, including substantially larger language models and graph-based methods. These results highlight the value of explicit fragment-level supervision as a more easily obtained, more scalable, and less hallucination-prone alternative to natural-language training.
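The rank-then-replace idea described above can be illustrated with a small toy sketch. It is not the FORGE implementation: the molecule is reduced to a tuple of fragment labels, `toy_property` is a made-up scoring function standing in for a black-box objective, and `rank_fragments`, `replace_fragment`, and `optimize_once` are hypothetical names. Fragment contribution is approximated here by ablation (score with the fragment minus score without it), which is an assumption, since the abstract does not say how contributions are computed.

```python
from typing import Callable, List, Sequence, Tuple

# Toy stand-in: a molecule is an ordered tuple of fragment labels (assumption).
Molecule = Tuple[str, ...]


def toy_property(mol: Molecule) -> float:
    """Hypothetical property oracle; not a real chemistry model."""
    weights = {"benzene": 1.0, "nitro": -2.0, "hydroxyl": 0.5,
               "amine": 0.8, "methyl": 0.1}
    score = sum(weights.get(frag, 0.0) for frag in mol)
    # Context dependence: hydroxyl is worth more when benzene is present.
    if "benzene" in mol and "hydroxyl" in mol:
        score += 0.7
    return score


def rank_fragments(mol: Molecule,
                   prop: Callable[[Molecule], float]) -> List[Tuple[int, float]]:
    """Stage 1 stand-in: rank fragments by contribution in the full context.

    Contribution of fragment i = prop(mol) - prop(mol without fragment i).
    The list is sorted ascending, so the most harmful fragment comes first.
    """
    base = prop(mol)
    contributions = []
    for i in range(len(mol)):
        ablated = mol[:i] + mol[i + 1:]
        contributions.append((i, base - prop(ablated)))
    return sorted(contributions, key=lambda item: item[1])


def replace_fragment(mol: Molecule, index: int, library: Sequence[str],
                     prop: Callable[[Molecule], float]) -> Molecule:
    """Stage 2 stand-in: try each library fragment at `index`, keep the best.

    The original molecule is included as a candidate, so the result is
    never worse than the input under `prop`.
    """
    candidates = [mol[:index] + (frag,) + mol[index + 1:] for frag in library]
    return max(candidates + [mol], key=prop)


def optimize_once(mol: Molecule, library: Sequence[str],
                  prop: Callable[[Molecule], float] = toy_property) -> Molecule:
    """One local edit: replace the lowest-contribution fragment."""
    worst_index, _ = rank_fragments(mol, prop)[0]
    return replace_fragment(mol, worst_index, library, prop)


if __name__ == "__main__":
    start = ("benzene", "nitro", "methyl")
    edited = optimize_once(start, ["hydroxyl", "amine", "methyl"])
    print(start, "->", edited)
```

In this toy setup, the interaction term makes the best replacement depend on the rest of the molecule rather than on per-fragment weights alone, which is the kind of context dependence the abstract refers to. A real system would replace the ablation score and the exhaustive library search with learned ranking and generation.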

Qingchuan Zhang, He Cao, Hao Li, Yanjun Shao, Zhiyuan Liu, Shihang Wang, Shufang Xie, Shenghua Gao, Xinwu Ye • 2026

Related benchmarks

Task	Dataset	Result
Molecular Docking Score Optimization	Target proteins (PARP1, FA7, 5HT1B, BRAF, JAK2) (novel top 5% molecules)	Docking Score (kcal/mol): -12.07	38
Goal-directed Lead Optimization	Lead Optimization Docking Targets parp1 fa7 5ht1b braf jak2 delta=0.6	Docking Score (kcal/mol): -13.37	33
ADMET property optimization	Prompt-MolOpt	ESOL Score: 0.934	12
Molecular Optimization	PMO-1K	Aggregate Score (22 Tasks): 12.42	8
Molecular property optimization ranking and generation	ChemCoTBench	LogP Delta: 1.02	8
Molecular Optimization	QED-DRD2 delta=0.4	Success Rate (%): 40	7
Molecular Optimization	QED-DRD2 delta=0.5	Success Rate (%): 36.76	7
Molecular Optimization	QED-DRD2 delta=0.6	Success Rate (%): 30.41	7

