MolLingo: Molecule-Native Representations for LLM-Powered Scientific Agents
About
We present MolLingo, a multi-agent system that emulates the reasoning process of a chemist to automate molecular design. Existing LLM-based approaches either operate as standalone generative models without access to external tools or lack the multi-agent coordination and shared memory needed for iterative, evidence-driven reasoning across the molecular design pipeline. MolLingo addresses this by coordinating a Literature Agent, a Chemist Agent, and an Orchestrator through a shared memory module, with each agent equipped with domain-specific tools. To enable effective molecular reasoning, we introduce BRICS-based Fragment Enumeration (BFE), a synthesis-aware molecular fragmentation method that decomposes molecules into chemically meaningful building blocks represented as block-based SMILES paired with common chemical names. This representation bridges molecular structure and LLM semantic space, enabling block-level reasoning and editing that is difficult with raw SMILES alone. As a case study in early-stage therapeutic design, MolLingo further grounds the Chemist Agent's reasoning in binding site geometry and residue-level protein context derived from molecular docking to optimize molecules for stronger target binding. Across four benchmarks, MolLingo consistently outperforms frontier LLMs and specialized baselines, including a fourfold docking score improvement over GPT-5.4 despite using the same underlying model, consistent drug property optimization gains across multiple LLM backbones, and state-of-the-art results on TOMG-Bench, surpassing both frontier LLMs and the RL-based optimization method RePO. Our results suggest that LLMs are already capable molecular design assistants when guided through chemically meaningful representations and biologically grounded structural context. Code is available at: https://anonymous.4open.science/status/MolLingo-7450.
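The coordination pattern described above (agents exchanging evidence and proposals through a shared memory, with molecules held as named blocks) can be illustrated with a minimal sketch. This is not MolLingo's implementation: the names `SharedMemory`, `Block`, `literature_agent`, `chemist_agent`, and `orchestrator` are hypothetical, the literature tool is a stub, and the block edit is a hard-coded swap rather than an LLM decision informed by BFE or docking.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SharedMemory:
    """Append-only log that every agent reads from and writes to (assumed design)."""
    entries: List[Tuple[str, str, str]] = field(default_factory=list)

    def write(self, author: str, kind: str, content: str) -> None:
        self.entries.append((author, kind, content))

    def read(self, kind: Optional[str] = None) -> List[Tuple[str, str, str]]:
        # Return all entries, or only those of the requested kind.
        return [e for e in self.entries if kind is None or e[1] == kind]


@dataclass(frozen=True)
class Block:
    """A building block: fragment SMILES paired with a common chemical name."""
    smiles: str
    name: str


def literature_agent(memory: SharedMemory, target: str) -> None:
    # Stub standing in for a literature-search tool call.
    memory.write("literature", "evidence", f"(stub) notes retrieved for target {target}")


def chemist_agent(memory: SharedMemory, blocks: List[Block], replacement: Block) -> List[Block]:
    # Read the evidence gathered so far, then propose a block-level edit.
    # Here the edit simply swaps the last block; a real agent would choose it.
    evidence = memory.read("evidence")
    edited = blocks[:-1] + [replacement]
    summary = " + ".join(b.name for b in edited)
    memory.write("chemist", "proposal", f"{summary} (using {len(evidence)} evidence entries)")
    return edited


def orchestrator(target: str, blocks: List[Block], replacement: Block):
    # Run one literature-then-chemist round over a fresh shared memory.
    memory = SharedMemory()
    literature_agent(memory, target)
    edited = chemist_agent(memory, blocks, replacement)
    return edited, memory


start = [Block("c1ccccc1", "benzene"), Block("C(=O)O", "carboxylic acid")]
edited, memory = orchestrator("CDK2", start, Block("C(N)=O", "primary amide"))
for author, kind, content in memory.read():
    print(f"[{author}/{kind}] {content}")
```

Because each block carries a name alongside its SMILES, the proposal written to memory is readable as plain chemistry text, which is the property the block-based representation is meant to give an LLM.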
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Molecular Optimization (LogP) | TOMG-Bench | Success Rate (SR) | 96.8 | 39 |
| Molecular Optimization (MR) | TOMG-Bench | Success Rate (SR) | 99.8 | 39 |
| Molecular Optimization (QED) | TOMG-Bench | Success Rate (SR) | 76.5 | 39 |
| Molecular Optimization | DILI TDC (test) | Improvement (%) | 18.451 | 8 |
| Molecular Optimization | hERG withdrawn drugs | Improvement (%) | 5.37 | 8 |
| Hit-to-lead progression | Hit-to-lead 30 protein targets (test) | Validity | 100 | 7 |
| Hit Identification | AKT1 (top 30 hits) | Docking Score | -6.368 | 2 |
| Hit Identification | CDK2 (top 30 hits) | Docking Score | -9.308 | 2 |