
LDMol: A Text-to-Molecule Diffusion Model with Structurally Informative Latent Space Surpasses AR Models

About

With the emergence of diffusion models as a leading class of generative models, many researchers have proposed molecule generation techniques based on conditional diffusion models. However, the inherent discreteness of molecular data makes it difficult for a diffusion model to connect raw data with highly complex conditions such as natural language. To address this, we present LDMol, a novel latent diffusion model for text-conditioned molecule generation. Recognizing that a suitable latent-space design is key to diffusion-model performance, we employ a contrastive learning strategy to extract a novel feature space from text data that embeds the unique characteristics of molecular structure. Experiments show that LDMol outperforms existing autoregressive baselines on the text-to-molecule generation benchmark, making it one of the first diffusion models to surpass autoregressive models on textual data generation through a better choice of latent domain. Furthermore, we show that LDMol can be applied to downstream tasks such as molecule-to-text retrieval and text-guided molecule editing, demonstrating its versatility as a diffusion model.
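The contrastive learning strategy described above can be sketched with a symmetric InfoNCE-style objective that pulls paired molecule and text embeddings together while pushing mismatched pairs apart. This is a minimal illustration only: the function name, the temperature value, and the random stand-in embeddings are assumptions, not LDMol's actual encoders or training code.

```python
import numpy as np

def info_nce_loss(mol_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    mol_emb, text_emb: (batch, dim) arrays of L2-normalized vectors from a
    molecule encoder and a text encoder (hypothetical stand-ins here).
    Matched pairs share the same row index.
    """
    # Scaled cosine similarity between every molecule/text pair.
    logits = mol_emb @ text_emb.T / temperature
    batch = logits.shape[0]
    labels = np.arange(batch)  # matched pairs sit on the diagonal

    def xent(l):
        # Numerically stable cross-entropy against the diagonal labels.
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # Average both directions: molecule->text and text->molecule.
    return 0.5 * (xent(logits) + xent(logits.T))

# Toy usage with random normalized embeddings for a batch of 4 pairs.
rng = np.random.default_rng(0)
m = rng.normal(size=(4, 8)); m /= np.linalg.norm(m, axis=1, keepdims=True)
t = rng.normal(size=(4, 8)); t /= np.linalg.norm(t, axis=1, keepdims=True)
loss = info_nce_loss(m, t)
print(loss)
```

Minimizing such a loss encourages embeddings that encode the structural features a text description refers to, which is the property the latent diffusion model then exploits.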

Jinho Chang, Jong Chul Ye · 2024

Related benchmarks

Task                             Dataset                              Metric                        Result   Rank
Text-guided molecule generation  ChEBI-20 (test)                      MACCS FTS Similarity          97.3     48
Molecular Optimization           Molecular Pharmacology Optimization  Overall Optimization Score    0.1326   13
Molecule-to-Text Retrieval       PCDes (test)                         Accuracy (Paragraph, 64-way)  90.3     8
Molecule-to-Text Retrieval       MoMu (test)                          Paragraph 64-way Accuracy     87.1     8
Text-to-molecule generation      PCDes (test)                         Validity                      94.4     4

Other info

Code
