GenMol: A Drug Discovery Generalist with Discrete Diffusion
About
Drug discovery is a complex process that involves multiple stages and tasks, yet existing molecular generative models can tackle only some of them. We present the Generalist Molecular generative model (GenMol), a versatile framework that uses a single discrete diffusion model to handle diverse drug discovery scenarios. GenMol generates Sequential Attachment-based Fragment Embedding (SAFE) sequences through non-autoregressive, bidirectional parallel decoding, which lets it use molecular context that does not depend on a specific token ordering and improves sampling efficiency. GenMol uses fragments as the basic building blocks of molecules and introduces fragment remasking, a strategy that optimizes molecules by regenerating masked fragments, enabling effective exploration of chemical space. We further propose molecular context guidance (MCG), a guidance method tailored to the masked discrete diffusion of GenMol. GenMol significantly outperforms the previous GPT-based model in de novo generation and fragment-constrained generation, and achieves state-of-the-art performance in goal-directed hit generation and lead optimization. These results demonstrate that GenMol can tackle a wide range of drug discovery tasks, providing a unified and versatile approach to molecular design. Our code is available at https://github.com/NVIDIA-Digital-Bio/genmol.
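The fragment remasking idea in the abstract can be illustrated with a toy sketch. This is not the GenMol implementation: it only assumes that a SAFE-style string is a dot-separated list of fragments, and the names `MASK`, `split_fragments`, `remask_fragment`, and `fill_mask` are hypothetical helpers introduced here for illustration. In the real model, the masked span would be regenerated by the discrete diffusion model; here a caller-supplied proposal stands in for that step, and the example string is not checked for chemical validity.

```python
import random

# Hypothetical placeholder token; the real model's mask vocabulary may differ.
MASK = "[MASK]"


def split_fragments(safe_string):
    """Split a SAFE-style string into its dot-separated fragments."""
    return safe_string.split(".")


def remask_fragment(safe_string, rng):
    """Replace one randomly chosen fragment with the mask placeholder.

    Returns the masked string and the index of the masked fragment.
    """
    fragments = split_fragments(safe_string)
    index = rng.randrange(len(fragments))
    masked = list(fragments)
    masked[index] = MASK
    return ".".join(masked), index


def fill_mask(masked_string, proposal):
    """Fill the masked slot with a proposed fragment.

    A stand-in for the model's regeneration step (assumption).
    """
    return masked_string.replace(MASK, proposal, 1)


# Illustrative dot-separated string (assumed, not validated chemically).
example = "c1ccccc14.C45.N5"
rng = random.Random(0)
masked, index = remask_fragment(example, rng)
restored = fill_mask(masked, split_fragments(example)[index])
```

In an optimization loop, one would presumably score each regenerated molecule and keep the better candidates; that scoring step is outside this sketch.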
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Molecular Generation | parp1 | Top-Hit 5% Docking Score (kcal/mol) | -11.773 | 27 |
| Molecular Generation | fa7 | Top-Hit 5% Docking Score (kcal/mol) | -8.967 | 27 |
| Molecular Generation | 5ht1b | Top-Hit 5% Docking Score (kcal/mol) | -11.914 | 27 |
| Molecular Generation | jak2 | Top-Hit 5% Docking Score (kcal/mol) | -10.417 | 27 |
| Molecular Generation | braf | Top-Hit 5% Docking Score (kcal/mol) | -11.394 | 26 |
| De novo small molecule generation | SAFE (test) | Validity | 96.7 | 22 |
| Unconditional molecular generation | MOSES | Validity | 99.7 | 20 |
| De Novo Molecular Generation | ZINC Curated 22 (test) | Validity (%) | 0.999 | 17 |
| Goal-directed molecular optimization | PMO | Albuterol Similarity | 0.937 | 16 |
| Molecular Generation | fa7 | #Circles | 2.3 | 12 |