Navigating Chemical Space with Latent Flows
About
Recent progress of deep generative models in the vision and language domains has stimulated significant interest in generating more structured data, such as molecules. However, beyond generating new random molecules, efficient exploration and a comprehensive understanding of the vast chemical space are of great importance to molecular science and to applications in drug design and materials discovery. In this paper, we propose a new framework, ChemFlow, that traverses chemical space by using flows to navigate the latent space learned by molecule generative models. We introduce a dynamical-system perspective that formulates the problem as learning a vector field that transports the mass of the molecular distribution toward regions with desired molecular properties or structural diversity. Under this framework, we unify previous approaches to molecule latent space traversal and optimization, and propose alternative, competing methods that incorporate different physical priors. We validate the efficacy of ChemFlow on molecule manipulation and on single- and multi-objective molecule optimization tasks, under both supervised and unsupervised molecular discovery settings. Code and demos are publicly available on GitHub at https://github.com/garywei944/ChemFlow.
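To make the dynamical-system view concrete, here is a minimal sketch of its simplest instance, a gradient flow: every latent vector z follows dz/dt = ∇f(z), where f scores a molecular property, and the ODE is integrated with forward Euler steps. This is an illustration under stated assumptions, not ChemFlow's implementation. The quadratic surrogate `property_score`, the optimum `Z_TARGET`, and the helpers `vector_field` and `traverse` are hypothetical, and the 2-D Gaussian batch stands in for latent codes sampled from a generative model's prior. In a real pipeline f would be a learned property predictor and each traversed z would be decoded back into a molecule.

```python
import numpy as np

# Hypothetical toy surrogate for a property predictor f(z) (an assumption,
# not ChemFlow's model): f(z) = -||z - Z_TARGET||^2, maximized at Z_TARGET.
Z_TARGET = np.array([1.0, -2.0])


def property_score(z: np.ndarray) -> np.ndarray:
    """Score each latent vector in a batch of shape (n, d); higher is better."""
    return -np.sum((z - Z_TARGET) ** 2, axis=-1)


def vector_field(z: np.ndarray) -> np.ndarray:
    """Gradient of property_score w.r.t. z, used as the flow velocity v(z)."""
    return -2.0 * (z - Z_TARGET)


def traverse(z0: np.ndarray, step: float = 0.1, n_steps: int = 50) -> np.ndarray:
    """Forward-Euler integration of dz/dt = v(z) for the whole batch at once."""
    z = z0.copy()
    for _ in range(n_steps):
        z = z + step * vector_field(z)
    return z


rng = np.random.default_rng(0)
z0 = rng.standard_normal((128, 2))  # stand-in for samples from the latent prior N(0, I)
z1 = traverse(z0)
print("mean score before:", property_score(z0).mean())
print("mean score after: ", property_score(z1).mean())
```

The whole batch is moved together, which mirrors the idea of transporting the mass of a distribution rather than optimizing one molecule at a time; swapping `vector_field` for a different velocity (for example, one that adds noise or encodes a physical prior) changes the flow without changing the integrator.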
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Molecular Property Manipulation | 1,000 unseen molecules (test) | Ranking | 3 | 9 |
| Molecular Property Optimization | Unconstrained Molecular Optimization plogP | Mean plogP | 3.405 | 8 |
| Molecular Property Optimization | Unconstrained Molecular Optimization QED | Mean QED | 0.911 | 8 |
| Molecular Property Optimization | Unconstrained Molecular Optimization ESR1 Docking | Mean Docking Score | -9.63 | 8 |
| Molecular Property Optimization | Unconstrained Molecular Optimization ACAA1 Docking | Mean Docking Score | -8.813 | 8 |
| Unconstrained ACAA1 Docking Score Minimization | ZINC250k Latent Space | Docking Score (1st) | -10.48 | 8 |
| Unconstrained plogP Maximization | ZINC250k Latent Space | 1st Score | 5.3 | 8 |
| Unconstrained QED Maximization | ZINC250k Latent Space | Rank 1 Score | 0.947 | 8 |
| Unconstrained ESR1 Docking Score Minimization | ZINC250k Latent Space | ESR1 Docking Score (Run 1) | -11.05 | 8 |