
MolDA: Molecular Understanding and Generation via Large Language Diffusion Model

About

Large Language Models (LLMs) have significantly advanced molecular discovery, but existing multimodal molecular architectures rely fundamentally on autoregressive (AR) backbones. This strict left-to-right inductive bias is suboptimal for generating chemically valid molecules: it struggles to satisfy non-local global constraints (e.g., ring closures) and accumulates structural errors during sequential generation. To address these limitations, we propose MolDA (Molecular language model with masked Diffusion with mAsking), a novel multimodal framework that replaces the conventional AR backbone with a discrete Large Language Diffusion Model. MolDA extracts comprehensive structural representations using a hybrid graph encoder, which captures both local and global topology, and aligns them with the language token space via a Q-Former. Furthermore, we mathematically reformulate Molecular Structure Preference Optimization for the masked diffusion setting. Through bidirectional iterative denoising, MolDA achieves global structural coherence, chemical validity, and robust reasoning across molecule generation, captioning, and property prediction.
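The bidirectional iterative denoising described above can be illustrated with a toy decoding loop: start from an all-masked sequence and, at each reverse-diffusion step, unmask the most confident positions anywhere in the sequence rather than strictly left to right. Everything here (the token set, the confidence heuristic, and the stand-in model) is an illustrative assumption, not the paper's implementation.

```python
# Toy sketch of bidirectional masked-diffusion decoding, the kind of backbone
# MolDA uses in place of left-to-right AR decoding. The "model" below is a
# hypothetical stand-in that simply knows one target SMILES token sequence.

MASK = "[MASK]"


def toy_model(tokens):
    """Return {position: (predicted_token, confidence)} for masked positions.

    Confidence is a toy heuristic: higher when adjacent tokens are already
    revealed, mimicking a model that is surer near unmasked context.
    """
    target = ["C", "1", "C", "C", "C", "C", "C", "1"]  # cyclohexane tokens
    preds = {}
    for i, tok in enumerate(tokens):
        if tok == MASK:
            context = sum(
                tokens[j] != MASK for j in (i - 1, i + 1) if 0 <= j < len(tokens)
            )
            preds[i] = (target[i], 0.5 + 0.25 * context)
    return preds


def denoise(length=8, steps=4):
    """Iteratively unmask the most confident positions (bidirectional)."""
    tokens = [MASK] * length
    per_step = max(1, length // steps)
    while MASK in tokens:
        preds = toy_model(tokens)
        # One reverse-diffusion step: commit the top-confidence predictions,
        # regardless of their position in the sequence.
        best = sorted(preds.items(), key=lambda kv: -kv[1][1])[:per_step]
        for i, (tok, _conf) in best:
            tokens[i] = tok
    return "".join(tokens)


print(denoise())
```

Because every position can attend to revealed context on both sides, a ring-closure pair like the two `1` digits can be resolved jointly instead of the second digit depending on an error-free left prefix, which is the failure mode the abstract attributes to AR decoding.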

Seohyeon Shin, HanJun Choi, Jun-Hyung Park, Hong Kook Kim, Mansu Kim • 2026

Related benchmarks

Task                                    Dataset            Metric        Result   Rank
Molecule Captioning                     ChEBI-20 (test)    METEOR        0.239    114
Forward Reaction Prediction             Mol-Instructions   Exact Match   66.2     30
Retrosynthesis                          Mol-Instructions   Exact Match   23.6     30
Reagent Prediction                      Mol-Instructions   Exact Match   2.7      30
Property Prediction (Classification)    MoleculeNet        HIV AUC       76.1     6
Property Prediction (Regression)        MoleculeNet        LogD          1.923    6
Molecular Generation                    ChEBI-20           Exact Match   6.8      6
