Deep Learning Foundation Models from Classical Molecular Descriptors

About

Fast and accurate data-driven prediction of molecular properties is pivotal to scientific advancements across myriad chemical domains. Deep learning methods have recently garnered much attention, despite their inability to outperform classical machine learning methods when tested on practical, real-world benchmarks with limited training data. This study seeks to bridge this gap with CheMeleon, an O(10M)-parameter foundation model that enables directed message-passing neural networks to finally exceed the performance of classical methods. Evaluated on 58 benchmark datasets from Polaris and MoleculeACE, CheMeleon achieves a win rate of 75% on Polaris tasks, outperforming baselines like Random Forest (68%), fastprop (36%), and Chemprop (32%), and a 97% win rate on MoleculeACE assays, surpassing Random Forest (50%) and other foundation models. Unlike conventional pre-training approaches that rely on noisy experimental data or biased quantum mechanical simulations, CheMeleon utilizes low-noise molecular descriptors to learn rich and highly transferable molecular representations, suggesting a new avenue for foundation model pre-training.
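The abstract reports win rates without stating how a "win" is decided. The Polaris figures sum to more than 100%, which suggests that several models can be credited with a win on the same task. The sketch below shows one simple way such a rate could be computed, assuming a model wins a task when its score is within a tolerance of the best score on that task. The function name `win_rates`, the `tol` parameter, and the toy scores are all hypothetical and are not taken from the paper, whose actual criterion may differ (for example, it may rely on a statistical test).

```python
from typing import Dict, List


def win_rates(
    scores: Dict[str, List[float]],
    higher_is_better: List[bool],
    tol: float = 0.0,
) -> Dict[str, float]:
    """Fraction of tasks on which each model is within `tol` of the best score.

    Assumption: ties count as a win for every tied model, so the rates
    across models can sum to more than 1.
    """
    n_tasks = len(higher_is_better)
    wins = {model: 0 for model in scores}
    for t in range(n_tasks):
        # Flip the sign of error-type metrics so that larger is always better.
        sign = 1.0 if higher_is_better[t] else -1.0
        task_scores = {m: sign * s[t] for m, s in scores.items()}
        best = max(task_scores.values())
        for m, v in task_scores.items():
            if best - v <= tol:
                wins[m] += 1
    return {m: wins[m] / n_tasks for m in scores}


# Hypothetical scores for illustration only (not results from the paper).
toy_scores = {
    "model_a": [0.90, 1.20, 0.75, 0.60],
    "model_b": [0.85, 1.20, 0.80, 0.40],
}
# The second task uses an error metric, where lower is better.
toy_higher_is_better = [True, False, True, True]

toy_rates = win_rates(toy_scores, toy_higher_is_better)
print(toy_rates)  # {'model_a': 0.75, 'model_b': 0.5}
```

In the toy data both models tie on the second task, so both are credited with a win there and the two rates sum to 1.25.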

Jackson W. Burns, Akshat Shirish Zalte, Charlles R. A. Abreu, Jochen Sieg, Christian Feldmann, Miriam Mathea, William H. Green • 2025

Related benchmarks

Task	Dataset	Result
Molecular property prediction	Polaris & MoleculeACE Aggregate (58 tasks)	Win Count: 24	52
Chemical Property Prediction	Polymers (5-fold cross-val)	Eea R2 Score: 0.91	50
Chemical Property Prediction	Fuels (10-fold cross-val)	RMSE: 7.35	48
Polymer-Solvent Interaction Prediction	PolySolv (10-fold cross-validation)	R2 Score (χ): 0.85	46
Molecular Property Prediction (Regression)	MoleculeNet (test)	ESOL Error: 2.417	30
Molecular activity prediction	MoleculeACE (Aggregate)	Win Count: 11	27
Fuel property prediction	Fuel RON (10-fold cross-validation)	RMSE: 9.48	24
Chemical Property Prediction	11 Engineering Datasets	Win Count: 10	22
Molecular property prediction	TDC and MoleculeNet	AMES Score: 0.76	13

