Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset for Large Language Models

About

Large Language Models (LLMs), with their remarkable task-handling capabilities and innovative outputs, have catalyzed significant advancements across a spectrum of fields. However, their proficiency within specialized domains such as biomolecular studies remains limited. To address this challenge, we introduce Mol-Instructions, a comprehensive instruction dataset designed for the biomolecular domain. Mol-Instructions encompasses three key components: molecule-oriented instructions, protein-oriented instructions, and biomolecular text instructions. Each component aims to improve the understanding and prediction capabilities of LLMs concerning biomolecular features and behaviors. Through extensive instruction tuning experiments on LLMs, we demonstrate the effectiveness of Mol-Instructions in enhancing large models' performance in the intricate realm of biomolecular studies, thus fostering progress in the biomolecular research community. Mol-Instructions is publicly available for ongoing research and will undergo regular updates to enhance its applicability.

Yin Fang, Xiaozhuan Liang, Ningyu Zhang, Kangwei Liu, Rui Huang, Zhuo Chen, Xiaohui Fan, Huajun Chen • 2023

Related benchmarks

Task                               Dataset              Result                     Rank
Molecular property prediction      QM9 (test)           --                         174
Molecule Captioning                ChEBI-20 (test)      BLEU-4 0.1995              107
Text-guided molecule generation    ChEBI-20 (test)      MACCS FTS Similarity 41.2  48
Molecular Property Classification  MoleculeNet BBBP     ROC AUC 58                 41
Molecular Property Classification  MoleculeNet BACE     ROC AUC 41.7               36
Molecule Description Generation    ChEBI-20 (test)      BLEU-2 0.249               34
Molecular Property Classification  MoleculeNet ClinTox  ROC AUC 47.8               27
Retrosynthesis                     Mol-Instructions     Exact Match 6.9            24
Reagent Prediction                 Mol-Instructions     Exact Match 4.4            24
Forward reaction prediction        Mol-Instructions     Exact Match 5.2            24
Showing 10 of 57 rows
