Are LLMs Ready for Neural-integrated Mechanistic Modeling? A Benchmark and Agentic Framework

About

Large language models (LLMs) have shown promise in constructing mechanistic models from data. However, existing evaluations largely focus on simplified settings and fail to capture the complexity of real-world scientific modeling. In practice, such modeling often involves neural-integrated formulations, where a mechanistic model component and a neural network component are jointly constructed, leading to a significantly more complex search space. Motivated by this gap, we introduce the Neural-Integrated Mechanistic Modeling (NIMM) benchmark, which evaluates LLM-generated neural-integrated mechanistic models across three scientific domains. Experiments on NIMM reveal that existing LLM-based approaches struggle to effectively explore this complex space, resulting in limited search stability and solution quality. To address this challenge, we propose NIMMGen, a tree-guided agentic framework that enables diversified exploration via branch-level search and improves solutions through atomic model refinement. Extensive experiments demonstrate that NIMMGen achieves state-of-the-art performance on NIMM, significantly improving search stability and solution quality.

Zihan Guan, Rituparna Datta, Mengxuan Hu, Shunshun Liu, Aiying Zhang, Prasanna Balachandran, Sheng Li, Anil Vullikanti • 2026

Related benchmarks

Task                          Dataset                                               Result       Rank
Spatial-temporal Forecasting  COVID-Bogota                                          RMSE 333.9   9
Spatial-temporal Forecasting  COVID-Medellin                                        RMSE 524.3   9
Spatial-temporal Forecasting  Influenza-USA                                         RMSE 4.81    9
Spatial-temporal Forecasting  MRSA-Virginia                                         RMSE 39.83   9
Clinical health forecasting   Lung cancer                                           RMSE 1.41    3
Clinical health forecasting   Lung Cancer w/ Chemo.                                 RMSE 0.08    3
Clinical health forecasting   Lung Cancer w/ Chemo. & Radio.                        RMSE 0.06    3
Yield strength prediction     FCC High-Entropy Alloys (HEAs) room temperature       RMSE 139.2   2
Yield strength prediction     BCC High-Entropy Alloys (HEAs) temperature-dependent  RMSE 180.1   2
