A Generalist Cross-Domain Molecular Learning Framework for Structure-Based Drug Discovery
About
Structure-based drug discovery (SBDD) is a systematic scientific process that develops new drugs by leveraging the detailed physical structure of the target protein. Recent advances in pre-trained models for biomolecules have demonstrated remarkable success across various biochemical applications, including drug discovery and protein engineering. However, most pre-trained models focus on the characteristics of either small molecules or proteins and do not model their binding interactions, the cross-domain relationships that are pivotal to SBDD. To fill this gap, we propose a general-purpose foundation model named BIT (Biomolecular Interaction Transformer), which can encode a range of biochemical entities, including small molecules, proteins, and protein-ligand complexes, in various data formats, encompassing both 2D and 3D structures. Specifically, we introduce Mixture-of-Domain-Experts (MoDE) to handle biomolecules from diverse biochemical domains and Mixture-of-Structure-Experts (MoSE) to capture positional dependencies in the molecular structures. The proposed mixture-of-experts approach enables BIT to achieve both deep fusion and domain-specific encoding, effectively capturing fine-grained molecular interactions within protein-ligand complexes. We then perform cross-domain pre-training on the shared Transformer backbone via several unified self-supervised denoising tasks. Experimental results on various benchmarks demonstrate that BIT achieves exceptional performance in downstream tasks, including binding affinity prediction, structure-based virtual screening, and molecular property prediction.
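The abstract does not spell out how MoDE assigns tokens to experts. As a rough illustration only, the sketch below assumes that each token carries a domain label and is sent deterministically to the expert registered for that domain, while everything else in the backbone is shared. The names `make_scaling_expert` and `route_by_domain`, the labels `"ligand"` and `"protein"`, and the toy affine experts are all hypothetical stand-ins, not taken from the BIT implementation.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]
Expert = Callable[[Vector], Vector]


def make_scaling_expert(scale: float, bias: float) -> Expert:
    """Toy stand-in for a feed-forward expert: an elementwise affine map."""
    def expert(x: Vector) -> Vector:
        return [scale * v + bias for v in x]
    return expert


def route_by_domain(
    tokens: Sequence[Vector],
    domains: Sequence[str],
    experts: Dict[str, Expert],
) -> List[Vector]:
    """Apply to each token the expert registered for its domain label.

    Assumption: routing is a hard lookup on the domain label,
    not a learned gate.
    """
    if len(tokens) != len(domains):
        raise ValueError("tokens and domains must have the same length")
    return [experts[domain](token) for token, domain in zip(tokens, domains)]


# Hypothetical experts: one per biochemical domain.
experts = {
    "ligand": make_scaling_expert(2.0, 0.0),
    "protein": make_scaling_expert(1.0, 1.0),
}

# A toy protein-ligand "complex": mixed tokens in one sequence.
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
domains = ["ligand", "protein", "ligand"]

outputs = route_by_domain(tokens, domains, experts)
print(outputs)
```

In this toy setup, ligand and protein tokens sit in the same sequence, so a shared attention layer could mix them, while the per-domain experts keep their own parameters. That is one plausible reading of "deep fusion and domain-specific encoding", not a confirmed description of the model.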
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Herb-Herb Interaction (HHI) Prediction | ITCM (TCMM) | Accuracy | 66.23 | 57 |
| Drug-Drug Interaction prediction | DDInter Target (Source: ZhangDDI) 2.0 | F1 Score | 62.35 | 38 |
| Drug-Drug Interaction prediction | DrugMap Target (Source: ZhangDDI) 2024 | F1 Score | 85.5 | 38 |
| Drug-Drug Interaction prediction | ZhangDDI | Accuracy | 49.63 | 36 |
| Molecular Interaction Prediction | CombiSolv | RMSE | 0.582 | 29 |
| Drug-Drug Interaction prediction | DrugMap 2024 | Accuracy | 60.97 | 19 |
| Human-Herb Interaction | TCMM (Target) | F1 Score | 50.26 | 19 |
| Drug-Drug Interaction prediction | DDInter 2.0 | Accuracy | 53.36 | 19 |
| Human-Herb Interaction | ITCM (Target) | F1 Score | 59.63 | 19 |
| Drug-Drug Interaction | ZhangDDI Target | F1 Score | 56.02 | 19 |