ChemATP: A Training-Free Chemical Reasoning Framework for Large Language Models

About

Large Language Models (LLMs) exhibit strong general reasoning but struggle in molecular science due to the lack of explicit chemical priors in standard string representations. Current solutions face a fundamental dilemma. Training-based methods inject priors into parameters, but this static coupling hinders rapid knowledge updates and often compromises the model's general reasoning capabilities. Conversely, existing training-free methods avoid these issues but rely on surface-level prompting, failing to provide the fine-grained atom-level priors essential for precise chemical reasoning. To address this issue, we introduce ChemATP, a framework that decouples chemical knowledge from the reasoning engine. By constructing the first atom-level textual knowledge base, ChemATP enables frozen LLMs to explicitly retrieve and reason over this information dynamically. This architecture ensures interpretability and adaptability while preserving the LLM's intrinsic general intelligence. Experiments show that ChemATP significantly outperforms training-free baselines and rivals state-of-the-art training-based models, demonstrating that explicit prior injection is a competitive alternative to implicit parameter updates.

Mingxu Zhang, Dazhong Shen, Qi Zhang, Ying Sun • 2025

Related benchmarks

Task	Dataset	Result
Molecular property prediction	BACE	ROC-AUC 89.7	73
Molecular property prediction	BBBP	ROC-AUC 0.8731	59
Molecular property prediction	ClinTox	ROC-AUC 88.56	58
Molecular property prediction	Tox21	ROC-AUC 80.23	47
Regression	FreeSolv	RMSE 1.0177	45
Molecular Classification	HIV	ROC-AUC 72.7	42
Molecular property prediction	SIDER	ROC-AUC 0.7179	32
Regression	MoleculeNet LIPO	RMSE 0.7143	30
Regression	ESOL	RMSE 0.6504	9
Regression	Caco2	RMSE 0.5041	7

Showing 10 of 12 rows
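
For readers unfamiliar with the two metrics in the table, the following is a minimal sketch of how ROC-AUC (higher is better) and RMSE (lower is better) are conventionally computed. It is not taken from the ChemATP paper or its evaluation code; the function names `roc_auc` and `rmse` and the sample inputs are illustrative assumptions. Note that the ROC-AUC values listed above appear on two scales (for example 0.8731 and 89.7); a percentage is simply the fraction returned here multiplied by 100.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between targets and predictions (lower is better)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    This pairwise form is quadratic in the number of samples, which is
    fine for illustration but slow for large datasets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Illustrative inputs only; these are not benchmark data.
example_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
example_rmse = rmse([1.0, 2.0], [1.0, 4.0])
```

In practice a library routine such as scikit-learn's `roc_auc_score` would usually replace the pairwise loop; the explicit version is shown only to make the definition concrete.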
