
Speak-to-Structure: Evaluating LLMs in Open-domain Natural Language-Driven Molecule Generation

About

Recently, Large Language Models (LLMs) have demonstrated great potential in natural language-driven molecule discovery. However, existing datasets and benchmarks for molecule-text alignment are predominantly built on one-to-one mappings. They measure an LLM's ability to retrieve a single, predefined answer rather than its creative potential to generate diverse, yet equally valid, molecular candidates. To address this critical gap, we propose Speak-to-Structure (S^2-Bench), the first benchmark to evaluate LLMs in open-domain natural language-driven molecule generation. S^2-Bench is specifically designed for one-to-many relationships, challenging LLMs to exhibit genuine molecular understanding and open-ended generation capabilities. Our benchmark includes three key tasks: molecule editing (MolEdit), molecule optimization (MolOpt), and customized molecule generation (MolCustom), each probing a different aspect of molecule discovery. We also introduce OpenMolIns, a large-scale instruction-tuning dataset that enables Llama3.1-8B to surpass the most powerful LLMs, such as GPT-4o and Claude-3.5, on S^2-Bench. Our comprehensive evaluation of 31 LLMs shifts the focus from simple pattern recall to realistic molecular design, paving the way for more capable LLMs in natural language-driven molecule discovery. Our code and datasets are fully accessible through the GitHub repository: https://github.com/phenixace/S2-TOMG-Bench and Hugging Face Datasets: https://huggingface.co/datasets/phenixace/S2-TOMG-Bench.

Jiatong Li, Junxian Li, Weida Wang, Yunqing Liu, Changmeng Zheng, Yatao Bian, Dongzhan Zhou, Xiao-yong Wei, Qing Li• 2024
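The one-to-many framing changes how outputs are scored. Instead of exact-matching one reference molecule, any candidate that satisfies the instruction counts. The sketch below illustrates this for an optimization-style success rate. It rests on our own assumption that a success means a valid molecule whose property moves in the requested direction, and S^2-Bench's exact scoring rules may differ. `OptTask`, `is_success`, `success_rate`, and `toy_prop` are hypothetical names. `toy_prop` is a placeholder oracle, not a chemical property. For real molecules, one could instead pass a function that parses SMILES with RDKit's `Chem.MolFromSmiles` (which returns `None` for unparsable input) and applies a descriptor such as `Crippen.MolLogP`, `Crippen.MolMR`, or `QED.qed`.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# Hypothetical oracle type: maps a SMILES string to a property value,
# or to None when the string is not a valid molecule.
PropertyFn = Callable[[str], Optional[float]]


@dataclass
class OptTask:
    """One optimization instruction: start from `source`, push the property up or down."""
    source: str
    increase: bool  # True: "make it higher"; False: "make it lower"


def is_success(task: OptTask, generated: str, prop: PropertyFn) -> bool:
    """Accept ANY valid molecule that moves the property in the requested direction.

    There is no single reference answer, so many different outputs can all succeed.
    """
    before = prop(task.source)
    after = prop(generated)
    if before is None or after is None:
        return False  # invalid source or unparsable model output
    return after > before if task.increase else after < before


def success_rate(tasks: Sequence[OptTask], outputs: Sequence[str], prop: PropertyFn) -> float:
    """Percentage of tasks whose generated molecule satisfies its instruction."""
    if len(tasks) != len(outputs):
        raise ValueError("need exactly one output per task")
    if not tasks:
        return 0.0
    hits = sum(is_success(t, g, prop) for t, g in zip(tasks, outputs))
    return 100.0 * hits / len(tasks)


def toy_prop(smiles: str) -> Optional[float]:
    """Placeholder oracle for illustration only: counts uppercase 'C' characters.

    Empty strings are treated as invalid. Swap in a real descriptor for actual use.
    """
    if not smiles:
        return None
    return float(smiles.count("C"))


tasks = [
    OptTask("CCO", increase=True),        # toy value 2 -> output "CCCO" has 3: success
    OptTask("CCCC", increase=False),      # toy value 4 -> output "CC" has 2: success
    OptTask("c1ccccc1", increase=True),   # output "" is invalid: failure
    OptTask("CCO", increase=False),       # unchanged output: failure
]
outputs = ["CCCO", "CC", "", "CCO"]
rate = success_rate(tasks, outputs, toy_prop)
print(f"{rate:.1f}")  # prints 50.0
```

Passing the property oracle in as a parameter keeps the scoring logic independent of any chemistry toolkit, so the same `success_rate` could be reused for different target properties.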

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Molecular Optimization (QED) | TOMG-Bench | Success Rate (SR) | 57.86 | 39 |
| Molecular Optimization (LogP) | TOMG-Bench | Success Rate (SR) | 80.54 | 39 |
| Molecular Optimization (MR) | TOMG-Bench | Success Rate (SR) | 78.76 | 39 |
| Single Property Optimization | Single Property Optimization (test) | Average Score | 68 | 9 |
| Molecular Component Editing | Molecular Component Editing | Average Success Rate | 54.5 | 9 |
