Bayesian Flow Is All You Need to Sample Out-of-Distribution Chemical Spaces
About
Generating novel molecules with higher property values than those in the training data, namely out-of-distribution generation, is important for de novo drug design. However, this is difficult for distribution-learning-based models such as diffusion models, because these methods are designed to fit the training data distribution as closely as possible. In this paper, we show that Bayesian flow networks, in particular the ChemBFN model, can intrinsically generate high-quality out-of-distribution samples across several scenarios. We add a reinforcement learning strategy to ChemBFN and employ a controllable, ordinary-differential-equation-solver-like generation process that accelerates sampling. Most importantly, we introduce a semi-autoregressive strategy for training and inference that enhances model performance, allowing it to surpass state-of-the-art models. We also include a theoretical analysis of out-of-distribution generation in ChemBFN with the semi-autoregressive approach.
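To make the Bayesian-flow idea concrete, here is a minimal sketch, assuming the discrete-data formulation of the original Bayesian flow network: for a variable with K classes and true class x, a noisy "sender" sample y ~ N(α(K e_x − 1), αK I) is drawn at accuracy α, and the belief θ over classes is updated as θ'_k ∝ exp(y_k) θ_k. This is not the ChemBFN code. The function names (`sender_sample`, `bayesian_update`, `run_updates`) are hypothetical, and the neural network, accuracy schedule, reinforcement learning, ODE-like sampler and semi-autoregressive parts are all omitted.

```python
import math
import random


def sender_sample(x_index, num_classes, alpha, rng):
    """Draw y ~ N(alpha * (K * e_x - 1), alpha * K * I) for a token whose true class is x_index."""
    std = math.sqrt(alpha * num_classes)
    return [
        rng.gauss(alpha * (num_classes * (1.0 if k == x_index else 0.0) - 1.0), std)
        for k in range(num_classes)
    ]


def bayesian_update(log_theta, y):
    """One discrete Bayesian update in log space: theta'_k is proportional to exp(y_k) * theta_k."""
    logits = [yk + lt for yk, lt in zip(y, log_theta)]
    m = max(logits)  # subtract the max before exponentiating (log-sum-exp trick)
    log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_norm for z in logits]


def run_updates(x_index, num_classes, alphas, seed=0):
    """Start from a uniform prior and apply one update per accuracy value in alphas."""
    rng = random.Random(seed)
    log_theta = [-math.log(num_classes)] * num_classes  # uniform prior over the vocabulary
    for alpha in alphas:
        y = sender_sample(x_index, num_classes, alpha, rng)
        log_theta = bayesian_update(log_theta, y)
    return [math.exp(lt) for lt in log_theta]


theta = run_updates(x_index=2, num_classes=5, alphas=[1.0] * 20)
```

Working in log space keeps the update stable when the accumulated accuracy is large; as accuracy accumulates, the belief should concentrate on the true class, which is the information a BFN's network learns to predict from partially informed beliefs.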
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Molecular Generation | 5ht1b | Top-Hit 5% Docking Score (kcal/mol) | -12.609 | 27 |
| Molecular Generation | parp1 | Top-Hit 5% Docking Score (kcal/mol) | -12.455 | 27 |
| Molecular Generation | fa7 | Top-Hit 5% Docking Score (kcal/mol) | -9.527 | 27 |
| Molecular Generation | jak2 | Top-Hit 5% Docking Score (kcal/mol) | -11.69 | 27 |
| Molecular Generation | braf | Top-Hit 5% Docking Score (kcal/mol) | -12.061 | 26 |
| Molecular Generation | fa7 | Novel Hit Ratio | 585.3 | 10 |
| Molecular Generation | braf | Novel Hit Ratio | 534 | 10 |
| Molecular Generation | parp1 | Novel Hit Ratio | 559.3 | 10 |
| Molecular Generation | jak2 | Novel Hit Ratio | 526 | 10 |
| Molecular Generation | 5ht1b | Novel Hit Ratio | 4.587 | 10 |
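The table reports two docking-benchmark metrics. As a hedged sketch of how such numbers can be computed, assuming that the top-hit 5% docking score is the mean of the best (most negative) 5% of docking scores among molecules passing the benchmark's hit filter, and that the novel hit ratio counts unique novel hits relative to all generated molecules, the helpers below take the filter result as a precomputed boolean flag. The benchmark's exact hit criteria, deduplication rules and reporting scale are assumptions this sketch does not reproduce, and the function names are hypothetical.

```python
import math
from typing import Sequence


def top_hit_docking_score(
    scores: Sequence[float], is_hit: Sequence[bool], top_fraction: float = 0.05
) -> float:
    """Mean of the best (most negative) top_fraction of docking scores among hits."""
    hit_scores = sorted(s for s, ok in zip(scores, is_hit) if ok)  # most negative first
    if not hit_scores:
        return float("nan")
    # Small epsilon guards against float noise such as 0.05 * 20 landing just above 1.
    k = max(1, math.ceil(top_fraction * len(hit_scores) - 1e-9))
    return sum(hit_scores[:k]) / k


def novel_hit_fraction(smiles: Sequence[str], is_novel_hit: Sequence[bool]) -> float:
    """Fraction of generated molecules that are unique novel hits (duplicates counted once)."""
    seen = set()
    for s, ok in zip(smiles, is_novel_hit):
        if ok:
            seen.add(s)
    return len(seen) / len(smiles) if smiles else 0.0
```

Lower (more negative) docking scores indicate stronger predicted binding, which is why the sketch sorts ascending and averages the head of the list.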