Controllable Molecular Generative Foundation Models
About
Despite the success of foundation models in language and vision, molecular graph generation still lacks a unified framework for heterogeneous design tasks with reliable controllability. While reinforcement learning (RL) offers a natural post-training mechanism for task-specific optimization, applying it to graph generative models is hindered by the vast atom-wise action spaces and chemically invalid intermediate states. We propose Controllable Molecular Generative Foundation Models (CoMole), built with a unified motif-aware graph diffusion pipeline. By learning a motif-aware graph space, CoMole transfers pretrained structural priors into controllable generation, where RL optimizes conditional reverse policies over chemically meaningful decisions. We theoretically characterize the bottleneck of atom-level RL and justify motif-aware policy optimization. Across three heterogeneous benchmarks spanning materials and drug discovery, CoMole ranks first in controllability on all nine targets, reduces MAE by up to 48.2% relative to the strongest baselines, and maintains validity above 0.94 without rule-based correction or post-hoc filtering. We further show that CoMole transfers controllability to unseen properties by optimizing only task embeddings with the generator frozen, achieving performance competitive with strong task-specific baselines.
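To make the controllability metric concrete, the sketch below shows one common way to compute the mean absolute error (MAE) between target property values and the properties of generated molecules, and the relative MAE reduction against a baseline. It is a minimal illustration, not the evaluation code of the paper: the function names are hypothetical and the numbers are made up purely to show the arithmetic.

```python
from typing import Sequence


def mean_absolute_error(targets: Sequence[float], achieved: Sequence[float]) -> float:
    """Average absolute gap between requested and achieved property values."""
    if len(targets) != len(achieved) or len(targets) == 0:
        raise ValueError("targets and achieved must be non-empty and of equal length")
    return sum(abs(t - a) for t, a in zip(targets, achieved)) / len(targets)


def relative_reduction(baseline_mae: float, model_mae: float) -> float:
    """Percentage by which model_mae is lower than baseline_mae."""
    if baseline_mae <= 0:
        raise ValueError("baseline_mae must be positive")
    return 100.0 * (baseline_mae - model_mae) / baseline_mae


# Illustrative values only (assumptions, not results from the paper).
targets = [1.0, 2.0, 3.0]
baseline_achieved = [1.5, 2.5, 3.5]   # hypothetical baseline generations
model_achieved = [1.25, 2.25, 3.25]   # hypothetical model generations

baseline_mae = mean_absolute_error(targets, baseline_achieved)
model_mae = mean_absolute_error(targets, model_achieved)
reduction = relative_reduction(baseline_mae, model_mae)
print(f"baseline MAE={baseline_mae}, model MAE={model_mae}, reduction={reduction}%")
```

Under this definition, a reported reduction of 48.2% would mean the model's MAE is a little over half of the strongest baseline's MAE on that target.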
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Controllable Molecular Generation | Molecular and Polymer properties 9 properties aggregation (test) | Average Rank | 1 | 27 |
| Conditional molecular generation | 10K Polymers (test) | Validity | 98.83 | 14 |
| Heterogeneous Conditional Molecular Generation | 10K Polymers | Validity | 96.88 | 14 |
| Heterogeneous Conditional Molecular Generation | 10K Molecules Drug-related task set | Validity | 96.68 | 14 |
| Molecule Generation | Polymer and Drug datasets (test) | Novelty | 93.9 | 14 |
| Controllable Molecular Generation | DFT unseen targets: Ei, EPS (test) | Validity | 91.82 | 5 |