MMAI Gym for Science: Training Liquid Foundation Models for Drug Discovery

About

General-purpose large language models (LLMs) that rely on in-context learning do not reliably deliver the scientific understanding and performance required for drug discovery tasks. Simply increasing model size or adding reasoning tokens does not yield significant performance gains. To address this gap, we introduce the MMAI Gym for Science, a one-stop shop for molecular data formats and modalities, together with task-specific reasoning, training, and benchmarking recipes designed to teach foundation models the 'language of molecules' so that they can solve practical drug discovery problems. We use MMAI Gym to train an efficient Liquid Foundation Model (LFM) for these applications, demonstrating that smaller, purpose-trained foundation models can outperform substantially larger general-purpose or specialist models on molecular benchmarks. Across essential drug discovery tasks, including molecular optimization, ADMET property prediction, retrosynthesis, drug-target activity prediction, and functional group reasoning, the resulting model achieves near specialist-level performance and, in the majority of settings, surpasses larger models, while remaining more efficient and more broadly applicable across the domain.

Maksim Kuznetsov, Zulfat Miftahutdinov, Rim Shayakhmetov, Mikolaj Mizera, Roman Schutski, Bogdan Zagribelnyy, Ivan Ilin, Nikita Bondarev, Thomas MacDougall, Mathieu Reymond, Mihir Bafna, Kaeli Kaymak-Loveless, Eugene Babin, Maxim Malkov, Mathias Lechner, Ramin Hasani, Alexander Amini, Vladimir Aladinskiy, Alex Aliper, Alex Zhavoronkov • 2026

Related benchmarks

Task	Dataset	Metric	Result
Single-step retrosynthesis	USPTO-50k (test)	--	--	33
Single-step retrosynthesis	URSA expert 2026	Unique Rate	94	21
ADMET Properties Prediction	TDC AMES	AUROC	0.805	20
Functional Group Reasoning (Binary Classification)	FGBench Single	Accuracy	84.1	16
Functional Group Reasoning (Binary Classification)	FGBench Interaction	Accuracy	81.9	16
Functional Group Reasoning (Binary Classification)	FGBench Comparison	Accuracy	81	16
Functional Group Reasoning (Numeric Regression)	FGBench Single	RMSE	55.954	16
Functional Group Reasoning (Numeric Regression)	FGBench Interaction	RMSE	25.046	16
Functional Group Reasoning (Numeric Regression)	FGBench Comparison	RMSE	48.344	16
ADMET Properties Prediction	TDC Bioavailability Ma	AUROC	75.2	15

Showing 10 of 30 rows
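The results table mixes binary-classification metrics (AUROC, accuracy) with a regression metric (RMSE). As a quick reference, here is a minimal Python sketch of the standard textbook definitions. It is not the evaluation code behind these numbers: the function names `auroc`, `accuracy`, and `rmse` are hypothetical, and it is an assumption that TDC and FGBench use these conventional forms. Their official scripts may differ in thresholds, score scaling, or averaging.

```python
import math
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the Mann-Whitney probability that a randomly chosen positive
    is scored higher than a randomly chosen negative (ties count as 1/2)."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have equal length")
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels: Sequence[int], preds: Sequence[int]) -> float:
    """Fraction of binary predictions that match the labels."""
    if len(labels) != len(preds) or not labels:
        raise ValueError("labels and preds must be non-empty and of equal length")
    return sum(int(y == p) for y, p in zip(labels, preds)) / len(labels)


def rmse(targets: Sequence[float], preds: Sequence[float]) -> float:
    """Root-mean-square error for numeric regression targets."""
    if len(targets) != len(preds) or not targets:
        raise ValueError("targets and preds must be non-empty and of equal length")
    mse = sum((t - p) ** 2 for t, p in zip(targets, preds)) / len(targets)
    return math.sqrt(mse)


if __name__ == "__main__":
    # Toy numbers for illustration only, not benchmark data.
    labels = [1, 0, 1, 0]
    scores = [0.9, 0.3, 0.4, 0.5]
    print(auroc(labels, scores))  # 0.75
    preds = [int(s >= 0.5) for s in scores]  # hypothetical 0.5 decision threshold
    print(accuracy(labels, preds))  # 0.5
    print(rmse([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0]))  # 1.0
```

The pairwise AUROC loop costs O(P·N) for P positives and N negatives, which keeps the definition transparent; for large test sets a sort-based rank computation gives the same value more efficiently. Note that AUROC is threshold-free, whereas accuracy depends on the chosen decision threshold.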
