MolReFlect: Towards In-Context Fine-grained Alignments between Molecules and Texts

About

Molecule discovery is a pivotal research field, impacting everything from medicine to materials. Recently, Large Language Models (LLMs) have been widely adopted in molecular understanding and generation, serving as a bridge between the molecular space and the natural language space, yet the alignment between molecules and their corresponding captions remains a significant challenge. Previous endeavors typically treat molecules as monolithic inputs, lacking an intermediate reasoning process and sacrificing explainability. In this work, we define fine-grained alignments as the precise correspondence between a molecule's sub-structures and the textual phrases that explain their properties. These alignments are crucial for LLMs to understand molecules in a more accurate and explainable manner. Normally, such fine-grained alignments require expert annotation, which is both costly and time-consuming. To allow LLMs to automatically label and learn the fine-grained alignments, we propose MolReFlect, a novel teacher-student framework, where a teacher LLM first generates and refines mappings between caption phrases and SMILES substructures and then explicitly teaches these detailed alignments to a student LLM. Experimental results demonstrate that MolReFlect enables LLMs to significantly outperform previous baselines, achieving state-of-the-art performance in the molecule-caption translation task. Our code is available at: https://github.com/phenixace/MolReFlect.
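
To make the idea of a fine-grained alignment concrete, here is a minimal sketch of how one such mapping between a SMILES substructure and a caption phrase could be represented and rendered as context for a student model. It is an illustration only, not the MolReFlect implementation: the `Alignment` class, the `format_student_context` helper, the text layout, and the hand-written acetic acid annotations are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    """One fine-grained alignment (hypothetical structure for illustration)."""
    substructure: str  # SMILES fragment of the molecule
    phrase: str        # caption phrase that describes this fragment


def format_student_context(smiles: str, alignments: list[Alignment]) -> str:
    """Render a molecule and its alignments as plain text (assumed layout)."""
    lines = [f"Molecule: {smiles}", "Alignments:"]
    for a in alignments:
        lines.append(f"- {a.substructure} -> {a.phrase}")
    return "\n".join(lines)


# Illustrative data: acetic acid, annotated by hand for this sketch.
example = format_student_context(
    "CC(=O)O",
    [
        Alignment("C(=O)O", "carboxylic acid group"),
        Alignment("C", "methyl group"),
    ],
)
print(example)
```

In the framework described above, a teacher LLM would produce and refine such mappings automatically; the sketch only shows one possible shape for the resulting data.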

Jiatong Li, Yunqing Liu, Wei Liu, Jingdi Le, Di Zhang, Wenqi Fan, Dongzhan Zhou, Yuqiang Li, Qing Li • 2024

Related benchmarks

Task                               Dataset           Metric       Result  Rank
Molecular Property Classification  MoleculeNet BBBP  ROC AUC      89.25   59
Caption-to-molecule generation     ChEBI-20          Exact Match  51      19
Mol2Cap                            ChEBI-20          BLEU-2       67.6    6
Cap2Mol                            PubChem           BLEU         76.32   3
molecule property prediction       MoleculeNet BACE  ROC-AUC      0.8795  3
Molecule-to-Caption Translation    PubChem           BLEU-2       0.414   3