Bayesian Flow Is All You Need to Sample Out-of-Distribution Chemical Spaces

About

Generating novel molecules whose properties exceed those found in the training space, known as out-of-distribution generation, is important for de novo drug design. However, this is difficult for distribution learning-based models such as diffusion models, because these methods are designed to fit the distribution of the training data as closely as possible. In this paper, we show that Bayesian flow networks, and the ChemBFN model in particular, can intrinsically generate high-quality out-of-distribution samples in several scenarios. A reinforcement learning strategy is added to ChemBFN, and a controllable, ordinary differential equation solver-like generating process is employed to accelerate sampling. Most importantly, we introduce a semi-autoregressive strategy during training and inference that enhances model performance and surpasses state-of-the-art models. A theoretical analysis of out-of-distribution generation in ChemBFN with the semi-autoregressive approach is also included.

Nianze Tao, Minori Abe • 2024

Related benchmarks

Task | Dataset | Result | Rank
Molecular Docking Score Optimization | Target proteins (PARP1, FA7, 5HT1B, BRAF, JAK2) (novel top 5% molecules) | -- | 38
Molecular Generation | 5ht1b | Docking Score (Top-Hit 5%, kcal/mol): -12.609 | 29
Molecular Generation | parp1 | Top-Hit 5% Docking Score (kcal/mol): -12.455 | 29
Molecular Generation | fa7 | Top-Hit 5% Docking Score (kcal/mol): -9.527 | 29
Molecular Generation | jak2 | Top-Hit 5% Docking Score (kcal/mol): -11.69 | 29
Molecular Generation | braf | Top-Hit 5% Docking Score (kcal/mol): -12.061 | 28
Molecular Generation | fa7 | Novel Hit Ratio: 585.3 | 21
Molecular Generation | jak2 | Novel Hit Ratio: 526 | 21
Molecular Generation | 5ht1b | Novel Hit Ratio: 4.587 | 21
Molecular Generation | braf | Novel Hit Ratio: 534 | 12
Showing 10 of 13 rows
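
As a rough illustration of how a "top 5%" docking score like those listed above might be computed, here is a minimal Python sketch. It assumes that a more negative docking score (kcal/mol) is better and that the reported value is the mean over the best 5% of generated molecules; the page itself does not state the exact definition. The function name `top_fraction_mean` and the sample scores are hypothetical.

```python
def top_fraction_mean(scores, fraction=0.05):
    """Mean of the best `fraction` of docking scores.

    Assumption: lower (more negative) scores are better, so the
    "top" molecules are those with the smallest values.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    # Keep at least one molecule, even for very small sample sets.
    k = max(1, round(len(scores) * fraction))
    best = sorted(scores)[:k]  # ascending order: most negative first
    return sum(best) / k


# Hypothetical docking scores for 40 generated molecules.
example_scores = [-12.0, -11.0, -10.0, -9.0] + [-5.0] * 36
top5_mean = top_fraction_mean(example_scores)
print(top5_mean)
```

With 40 molecules, the top 5% corresponds to the two best scores, so the sketch averages -12.0 and -11.0.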
