
SciCore-Mol: Augmenting Large Language Models with Pluggable Molecular Cognition Modules

About

Large Language Models (LLMs) are central to the one-for-all intelligent paradigm, but they face a fundamental challenge when dealing with heterogeneous scientific data such as molecules: the inherent gap between discrete linguistic symbols and topological molecular or continuous reaction data leads to significant information loss and semantic noise in text-based reasoning. We propose SciCore-Mol, a modular framework that bridges this gap through three deeply integrated, pluggable cognitive modules: a topology-aware perception module, a latent diffusion-based molecular generation module, and a reaction-aware reasoning module. Each module is coupled to the LLM backbone through learned representation interfaces, enabling richer information exchange than is possible with text-only tool feedback. Our experiments on diverse chemical tasks demonstrate that SciCore-Mol achieves strong overall performance across molecular understanding, generation, reaction prediction, and general chemistry knowledge, with an 8B-parameter open-source system that is competitive with, and in several dimensions surpasses, proprietary large models. This work provides a systematic blueprint for equipping LLMs with scientific expertise through decoupled, pluggable, and flexibly orchestrated modules, with direct implications for drug design, chemical synthesis, and broader scientific discovery.

Yuxuan Chen, Changwei Lv, Yunduo Xiao, Zhongjing Du, Daquan Zhou, Yukun Yan, Zheni Zeng, Zhiyuan Liu • 2026

Related benchmarks

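| Task                              | Dataset              | Metric           | Result | Rank |
|-----------------------------------|----------------------|------------------|--------|------|
| Molecular Generation              | SMolInstruct         | RDK-FTS (%)      | 70     | 14   |
| Property Prediction               | SMolInstruct ESOL    | ESOL RMSE        | 1.73   | 7    |
| Retrosynthesis                    | SMolInstruct         | RDK-FTS (%)      | 63.7   | 7    |
| Yield Prediction                  | ORD (test)           | MAE              | 0.27   | 7    |
| Captioning                        | SMolInstruct         | METEOR           | 37.9   | 7    |
| Chemical reasoning and prediction | ChemBench4K          | Product          | 92     | 7    |
| Name Conversion                   | SMolInstruct         | I2S FTS Score    | 71.9   | 7    |
| Product + Yield Prediction        | ORD (test)           | Valid            | 97.6   | 7    |
| Property Prediction               | SMolInstruct Lipo    | Lipo RMSE        | 1.27   | 7    |
| Property Prediction               | SMolInstruct ClinTox | ClinTox Accuracy | 71.5   | 7    |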
Showing 10 of 15 rows
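
For readers unfamiliar with the metrics in the table, the following is a minimal sketch of how three of them are commonly computed. It assumes that RMSE and MAE carry their standard definitions and that RDK-FTS denotes Tanimoto similarity between RDKit fingerprints of the predicted and reference molecules; the page itself does not define these metrics. The function names are hypothetical, and fingerprints are represented as plain sets of on-bit indices rather than computed with RDKit, so this illustrates the arithmetic only and is not the evaluation code behind the reported numbers.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error (lower is better), e.g. for ESOL or Lipo."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mae(y_true, y_pred):
    """Mean absolute error (lower is better), e.g. for yield prediction."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def tanimoto(bits_a, bits_b):
    """Tanimoto similarity between two fingerprints given as on-bit indices.

    Assumption: each fingerprint is an iterable of integer bit positions.
    Two empty fingerprints return 0.0, a convention chosen for this sketch.
    """
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


if __name__ == "__main__":
    # Toy values for illustration only; not data from the benchmarks above.
    print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
    print(mae([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
    print(tanimoto({1, 2, 3}, {2, 3, 4}))
```

Under this reading, an FTS score reported as a percentage would be the mean Tanimoto similarity over all test pairs multiplied by 100, while RMSE and MAE are in the units of the predicted property.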
