FORGE: Fragment-Oriented Ranking and Generation for Context-Aware Molecular Optimization
About
Molecular optimization seeks to improve a molecule through small structural edits while preserving similarity to the starting compound. Recent language-model approaches typically treat this task as prompt-conditioned sequence generation. However, relying on natural language introduces an inherent data-scaling bottleneck, often leads to chemical hallucinations, and ignores the strong context dependence of fragment effects. We present FORGE, a two-stage framework that reformulates molecular optimization as context-aware local editing. Instead of expensive human text annotations, FORGE trains on automatically mined, verified low-to-high edit pairs. Stage 1 ranks candidate fragments by their property contribution under the full molecular context, which injects a chemical prior, and Stage 2 generates explicit fragment replacements. Built on a compact 0.6B language model, FORGE further adapts to unseen black-box objectives through in-context demonstrations. Across Prompt-MolOpt, PMO-1K, and ChemCoTBench, FORGE consistently outperforms prior methods, including substantially larger language models and graph-based methods. These results highlight the value of explicit fragment-level supervision as a more readily obtainable, more scalable, and less hallucination-prone alternative to natural-language training.
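The two-stage idea in the abstract (rank fragments in context, then replace one) can be illustrated with a toy sketch. This is not FORGE's implementation: the list-of-labels molecule representation, the `toy_property` oracle and its invented scores, the leave-one-out ranking, and the function names `rank_fragments` and `replace_fragment` are all assumptions made for illustration, and the fragment labels are not meant to form valid chemistry.

```python
from typing import Callable, List, Tuple

# Toy stand-in (assumption): a molecule is an ordered list of fragment labels.
Molecule = List[str]


def toy_property(mol: Molecule) -> float:
    """Hypothetical property oracle with invented per-fragment scores."""
    base = {"OH": 1.0, "CH3": -0.5, "Ph": -1.0, "NH2": 0.8, "Cl": -0.7}
    score = sum(base.get(frag, 0.0) for frag in mol)
    # Invented context term: adjacent "OH" and "Ph" fragments incur a penalty,
    # so a fragment's effect depends on its neighbours.
    for left, right in zip(mol, mol[1:]):
        if {left, right} == {"OH", "Ph"}:
            score -= 0.6
    return score


def rank_fragments(
    mol: Molecule, prop: Callable[[Molecule], float]
) -> List[Tuple[int, float]]:
    """Stage 1 sketch: leave-one-out contribution of each fragment in context.

    Returns (index, contribution) pairs sorted from lowest to highest.
    """
    full = prop(mol)
    contributions = [
        (i, full - prop(mol[:i] + mol[i + 1:])) for i in range(len(mol))
    ]
    return sorted(contributions, key=lambda pair: pair[1])


def replace_fragment(
    mol: Molecule,
    index: int,
    library: List[str],
    prop: Callable[[Molecule], float],
) -> Molecule:
    """Stage 2 sketch: try each library fragment at `index`, keep the best.

    The unedited molecule stays in the candidate pool, so the result is
    never worse than the input under `prop`.
    """
    candidates = [mol[:index] + [frag] + mol[index + 1:] for frag in library]
    return max(candidates + [mol], key=prop)


start = ["Ph", "CH3", "OH"]
ranking = rank_fragments(start, toy_property)
worst_index = ranking[0][0]  # fragment with the lowest in-context contribution
optimized = replace_fragment(start, worst_index, ["OH", "NH2", "Cl"], toy_property)
print(ranking, optimized)
```

In this toy setup, the middle fragment has a negative standalone score but a slightly positive leave-one-out contribution, because removing it would place the two penalized fragments next to each other. That is the kind of context dependence a context-free fragment score would miss. Editing a single position also leaves the rest of the molecule unchanged, which loosely mirrors the similarity constraint.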
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Molecular Docking Score Optimization | Target proteins (PARP1, FA7, 5HT1B, BRAF, JAK2) (novel top 5% molecules) | Docking Score (kcal/mol) | -12.07 | 38 |
| Goal-directed Lead Optimization | Lead Optimization Docking Targets parp1 fa7 5ht1b braf jak2 delta=0.6 | Docking Score (kcal/mol) | -13.37 | 33 |
| ADMET property optimization | Prompt-MolOpt | ESOL Score | 0.934 | 12 |
| Molecular Optimization | PMO-1K | Aggregate Score (22 Tasks) | 12.42 | 8 |
| Molecular property optimization ranking and generation | ChemCoTBench | LogP Delta | 1.02 | 8 |
| Molecular Optimization | QED-DRD2 delta=0.4 | Success Rate | 40 | 7 |
| Molecular Optimization | QED-DRD2 delta=0.5 | Success Rate | 36.76 | 7 |
| Molecular Optimization | QED-DRD2 delta=0.6 | Success Rate (%) | 30.41 | 7 |