BioT5: Enriching Cross-modal Integration in Biology with Chemical Knowledge and Natural Language Associations

About

Recent advancements in biological research leverage the integration of molecules, proteins, and natural language to enhance drug discovery. However, current models exhibit several limitations, such as the generation of invalid molecular SMILES, underutilization of contextual information, and equal treatment of structured and unstructured knowledge. To address these issues, we propose BioT5, a comprehensive pre-training framework that enriches cross-modal integration in biology with chemical knowledge and natural language associations. BioT5 utilizes SELFIES for 100% robust molecular representations and extracts knowledge from the surrounding context of bio-entities in unstructured biological literature. Furthermore, BioT5 distinguishes between structured and unstructured knowledge, leading to more effective utilization of information. After fine-tuning, BioT5 shows superior performance across a wide range of tasks, demonstrating its strong capability of capturing underlying relations and properties of bio-entities. Our code is available on GitHub: https://github.com/QizhiPei/BioT5.
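
The first limitation named above, invalid generated SMILES, can be made concrete. The sketch below is our own illustration, not code from BioT5: a stdlib-only Python check for two purely syntactic failure modes that a token-by-token generator can produce, namely unbalanced branch parentheses and unpaired ring-closure labels. The function name `smiles_syntax_ok` is hypothetical, and the check ignores valence, aromaticity and element validity, so it is much weaker than a full SMILES parser. SELFIES is designed so that every token sequence decodes to some molecule, which rules out these failures by construction.

```python
def smiles_syntax_ok(smiles: str) -> bool:
    """Hypothetical, syntax-only SMILES check (illustration only).

    Returns True if branch parentheses are balanced and every ring-closure
    label is opened and closed. It does NOT check valence, aromaticity or
    element symbols, so True does not mean the string is a real molecule.
    """
    depth = 0           # branch nesting depth: '(' opens, ')' closes
    open_rings = set()  # ring-closure labels opened but not yet closed
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            # Bracket atom such as [NH4+] or [13CH4]: skip to the closing ']'
            # so digits inside it are not mistaken for ring-closure labels.
            end = smiles.find("]", i + 1)
            if end == -1:
                return False
            i = end + 1
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # closing a branch that was never opened
                return False
        elif ch.isdigit() or ch == "%":
            # Ring-closure label: a single digit, or '%' plus two digits.
            if ch == "%":
                label = smiles[i + 1 : i + 3]
                if len(label) != 2 or not label.isdigit():
                    return False
                i += 2
            else:
                label = ch
            # First occurrence opens the ring bond, the next one closes it.
            if label in open_rings:
                open_rings.remove(label)
            else:
                open_rings.add(label)
        i += 1
    return depth == 0 and not open_rings


print(smiles_syntax_ok("c1ccccc1"))        # benzene            -> True
print(smiles_syntax_ok("CC(C"))            # unclosed branch    -> False
print(smiles_syntax_ok("C1CCC"))           # unclosed ring      -> False
print(smiles_syntax_ok("C(C)(C)(C)(C)C"))  # 5-bonded carbon    -> True
```

The last example shows the limit of a syntax check: a pentavalent carbon passes even though it is chemically invalid, which is why generated SMILES usually have to be validated with a full cheminformatics parser, while SELFIES sidesteps the problem at the representation level.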

Qizhi Pei, Wei Zhang, Jinhua Zhu, Kehan Wu, Kaiyuan Gao, Lijun Wu, Yingce Xia, Rui Yan · 2023

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Molecule captioning | ChEBI-20 (test) | BLEU-4 | 0.556 | 107 |
| Molecular property prediction | BACE (test) | ROC-AUC | 89.4 | 65 |
| Molecule property prediction | MoleculeNet (scaffold split) | BBBP | 77.7 | 58 |
| Text-guided molecule generation | ChEBI-20 (test) | MACCS FTS similarity | 88.6 | 48 |
| Molecule description generation | ChEBI-20 (test) | BLEU-2 | 66.6 | 34 |
| Classification | MoleculeNet BBBP (test) | ROC-AUC | 0.777 | 30 |
| Description-guided molecule design | ChEBI-20 2022 (test) | Exact match accuracy | 41.3 | 26 |
| Molecule property prediction | HIV MoleculeNet (test) | AUROC | 81 | 24 |
| Molecule description generation | ChEBI-20 2022 (test) | BLEU-2 | 0.635 | 20 |
| Protein-protein interaction prediction | Human PPI | Accuracy | 86.22 | 18 |

Showing 10 of 24 rows.
