SciCore-Mol: Augmenting Large Language Models with Pluggable Molecular Cognition Modules
About
Large Language Models (LLMs) are central to the one-for-all intelligent paradigm, but they face a fundamental challenge when dealing with heterogeneous scientific data such as molecules: the inherent gap between discrete linguistic symbols and topological molecular data or continuous reaction data leads to significant information loss and semantic noise in text-based reasoning. We propose SciCore-Mol, a modular framework that bridges this gap through three deeply integrated, pluggable cognitive modules: a topology-aware perception module, a latent diffusion-based molecular generation module, and a reaction-aware reasoning module. Each module is coupled to the LLM backbone through learned representation interfaces, enabling richer information exchange than is possible with text-only tool feedback. Our experiments on diverse chemical tasks demonstrate that SciCore-Mol achieves strong overall performance across molecular understanding, generation, reaction prediction, and general chemistry knowledge, with an 8B-parameter open-source system that is competitive with, and in several dimensions surpasses, proprietary large models. This work provides a systematic blueprint for equipping LLMs with scientific expertise through decoupled, pluggable, and flexibly orchestrated modules, with direct implications for drug design, chemical synthesis, and broader scientific discovery.
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Molecular Generation | SMolInstruct | RDK-FTS (%) | 70 | 14 |
| Property Prediction | SMolInstruct ESOL | ESOL RMSE | 1.73 | 7 |
| Retrosynthesis | SMolInstruct | RDK-FTS (%) | 63.7 | 7 |
| Yield Prediction | ORD (test) | MAE | 0.27 | 7 |
| Captioning | SMolInstruct | METEOR | 37.9 | 7 |
| Chemical reasoning and prediction | ChemBench4K | Product | 92 | 7 |
| Name Conversion | SMolInstruct | I2S FTS Score | 71.9 | 7 |
| Product + Yield Prediction | ORD (test) | Valid | 97.6 | 7 |
| Property Prediction | SMolInstruct Lipo | Lipo RMSE | 1.27 | 7 |
| Property Prediction | SMolInstruct ClinTox | ClinTox Accuracy | 71.5 | 7 |
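
For readers unfamiliar with the metric names in the table, the following is a minimal sketch of three of them: fingerprint Tanimoto similarity (the quantity behind the FTS scores), RMSE, and MAE. It is not the benchmark's evaluation code. It assumes fingerprints are already available as sets of on-bit indices; an actual RDK-FTS evaluation would presumably derive them from RDKit fingerprints of the predicted and reference molecules. The function names and the handling of empty fingerprints are choices made for this sketch.

```python
import math


def tanimoto(bits_a, bits_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        # Convention chosen for this sketch: two empty fingerprints score 0.
        return 0.0
    return len(a & b) / union


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical toy inputs, not data from the benchmarks above.
    print(tanimoto({1, 2, 3}, {2, 3, 4}))  # 2 shared bits out of 4 distinct: 0.5
    print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
    print(mae([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

Higher is better for similarity scores such as FTS, while lower is better for the error metrics RMSE and MAE.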