Molecule Generation by Principal Subgraph Mining and Assembling
About
Molecule generation is central to a variety of applications. Recent work has approached the generation task as subgraph prediction and assembling. Nevertheless, these methods usually rely on hand-crafted or external subgraph construction, and the subgraph assembling depends solely on local arrangement. In this paper, we define a novel notion, the principal subgraph, which is closely related to the informative patterns within molecules. Our proposed merge-and-update subgraph extraction method can automatically discover frequent principal subgraphs from the dataset, which previous methods cannot do. Moreover, we develop a two-step subgraph assembling strategy, which first predicts a set of subgraphs in a sequence-wise manner and then assembles all generated subgraphs globally into the final output molecule. Built upon a graph variational auto-encoder, our model is shown to be effective and efficient across several evaluation metrics, compared with state-of-the-art methods on distribution learning and (constrained) property optimization tasks.
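The merge-and-update idea mentioned in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the function names (`fragment_key`, `mine_vocab`), the data layout, and the tie-breaking rule are all assumptions made for illustration. In particular, a fragment is identified here only by its sorted atom symbols, which ignores bond structure; a real system would need a canonical graph representation such as canonical SMILES.

```python
from collections import Counter


def fragment_key(atoms, labels):
    # Toy identity for a fragment: its sorted atom symbols.
    # ASSUMPTION: this ignores connectivity; it stands in for a canonical form.
    return "".join(sorted(labels[a] for a in atoms))


def mine_vocab(molecules, num_merges):
    """molecules: list of (labels, bonds), where labels is a list of atom
    symbols and bonds is a list of (i, j) atom-index pairs."""
    # Start with every atom as its own fragment.
    frags = [[frozenset([i]) for i in range(len(labels))]
             for labels, _ in molecules]
    vocab = sorted({sym for labels, _ in molecules for sym in labels})

    for _ in range(num_merges):
        # Merge step: count the union of every pair of bonded fragments.
        counts = Counter()
        for (labels, bonds), fs in zip(molecules, frags):
            owner = {a: f for f in fs for a in f}
            seen = set()
            for i, j in bonds:
                fi, fj = owner[i], owner[j]
                pair = frozenset([fi, fj])
                if fi == fj or pair in seen:
                    continue
                seen.add(pair)
                counts[fragment_key(fi | fj, labels)] += 1
        if not counts:
            break
        # Most frequent candidate; ties broken by key for determinism (assumed rule).
        best, _ = max(counts.items(), key=lambda kv: (kv[1], kv[0]))
        vocab.append(best)

        # Update step: rewrite each molecule using the newly added fragment.
        for idx, ((labels, bonds), fs) in enumerate(zip(molecules, frags)):
            merged = True
            while merged:
                merged = False
                owner = {a: f for f in fs for a in f}
                for i, j in bonds:
                    fi, fj = owner[i], owner[j]
                    if fi != fj and fragment_key(fi | fj, labels) == best:
                        fs = [f for f in fs if f not in (fi, fj)] + [fi | fj]
                        merged = True
                        break
            frags[idx] = fs
    return vocab, frags


# Three toy molecules: C-C-O, C-O, and C-C-C.
toy = [
    (["C", "C", "O"], [(0, 1), (1, 2)]),
    (["C", "O"], [(0, 1)]),
    (["C", "C", "C"], [(0, 1), (1, 2)]),
]
vocab, frags = mine_vocab(toy, num_merges=1)
print(vocab)
print(frags[0])
```

The loop resembles byte-pair encoding applied to graphs: each round adds the most frequent merged fragment to the vocabulary and then re-segments every molecule with it, so larger frequent patterns can emerge in later rounds.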
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Property Optimization | ZINC250k (test) | 1st Order Metric | 0.948 | 33 |
| Constrained Property Optimization | ZINC250K | Improvement | 6.42 | 27 |
| Molecular Generation | fa7 | Top-Hit 5% Docking Score (kcal/mol) | -8.028 | 27 |
| Molecular Generation | 5ht1b | Docking Score (Top-Hit 5%, kcal/mol) | -9.887 | 27 |
| Molecular Generation | jak2 | Top-Hit 5% Docking Score (kcal/mol) | -9.464 | 27 |
| Molecular Generation | parp1 | Top-Hit 5% Docking Score (kcal/mol) | -9.978 | 27 |
| Molecular Generation | braf | Top-Hit 5% Docking Score (kcal/mol) | -9.637 | 26 |
| Molecular Docking | fa7 | Mean Docking Score | -8.028 | 18 |
| Molecular Docking | 5ht1b | Mean Docking Score | -9.887 | 18 |
| Molecular Docking | parp1 | Mean Docking Score | -9.978 | 18 |